Friday, 17 October 2014

Your own personal genome project

I was chatting with a colleague at work who'd asked me if I knew anywhere they could get their genome or exome sequenced. My genome has been sat in the freezer for over five years waiting to go onto a flow cell, but I've never been comfortable putting it on our own machines. I did get 23andMe'd a few years ago, but they've closed their exome service for now.

Today there are many sequencing service providers across the world. Would any of them be open to a consumer-led project? How many genomes/exomes would we need to sequence to get a price consumers were willing to pay? To test the market we've used AllSeq, "the global sequencing marketplace", and a couple of replies have now come in!

Thursday, 9 October 2014

Twitterbots for NGS

Casey Bergman at the University of Manchester inspired me to create three Twitterbots for NGS papers: @RNA-seq, @ChIP-seq and @Exome-seq. I'd not come across Casey's Twitter account (I don't actually use Twitter that much) or his lab website and blog, but I was directed there by a piece on the Nature website: "How to tame the flood of literature".

What Casey has done is pretty simple, and it is very well explained in his blog post, or in the instructions Rob Lanfear has posted on GitHub. There are three simple steps for PubMed and Twitter (and more for arXiv, PeerJ, etc.):
  1. Set up a twitter account
  2. Set up a pubmed search
  3. Set up your account
The feeds I created only went live this evening, and I'll follow Rob's advice to refine them over the next few weeks. Let me know if you like them, and why not create your own feed?
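Casey's and Rob's write-ups describe the actual setup. Purely as an alternative illustration, here is a minimal Python sketch of the PubMed half of such a bot using NCBI's E-utilities `esearch` endpoint. The helper names, the 7-day window and the 140-character limit (Twitter's limit at the time of writing) are my assumptions, and posting to Twitter itself is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, days=7, retmax=20):
    """Build an E-utilities esearch URL for PubMed records added in the last `days` days."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
        "datetype": "edat",  # Entrez date: when the record was added to PubMed
        "reldate": days,
    }
    return ESEARCH + "?" + urlencode(params)


def parse_ids(json_text):
    """Extract the list of PMIDs from an esearch JSON response."""
    return json.loads(json_text)["esearchresult"]["idlist"]


def fetch_new_pmids(term, days=7):
    """Network call (not run here): return PMIDs matching `term` from the last `days` days."""
    with urlopen(esearch_url(term, days)) as response:
        return parse_ids(response.read().decode("utf-8"))


def format_tweet(title, pmid, limit=140):
    """Fit a paper title plus its PubMed link into `limit` characters, truncating the title if needed."""
    url = f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/"
    room = limit - len(url) - 1  # one character for the separating space
    if len(title) > room:
        title = title[: room - 3].rstrip() + "..."
    return f"{title} {url}"
```

A bot would then call something like `fetch_new_pmids('"RNA-seq"[Title/Abstract]')`, look up the titles (e.g. with `esummary`, not shown), and post `format_tweet(...)` for each new PMID. NCBI's usage guidelines (rate limits, `tool`/`email` parameters) apply to any real deployment.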

PS: Casey has a great post on how to host a custom UCSC genome browser track with Dropbox.



Tuesday, 7 October 2014

Count-down to AGBT 2015

Registration is open! The registration process for AGBT has been overhauled in the last few years; yes, there are still a lot of people who don't get in, but it seems to be pretty fair.

I hope to see you there. I'm going to be blogging and Tweeting again (assuming I get in). Say Hi if you read CoreGenomics and I'll buy you a beer (or rather grab one from Illumina, Agilent, etc)!

Thursday, 2 October 2014

Whole exomes from single cells: Fluidigm C1 update

This was not a planned post, but it follows on nicely from today's other post about exomes. This time I'm writing about Fluidigm's new single-cell exome-seq protocol. Yup, that's right: whole exomes from 96 single cells! The C1 is an amazing piece of kit (I wish I had one) and we've used it a little for mRNA-seq. The ability to sequence single-cell genomes and exomes means you can do pretty much whatever you want with a single cell now. So how do the exomes look?

C1 on YouTube: Fluidigm have a video presentation in which their R&D scientist Keith Szulwach walks through the data. They prepared exomes from the breast cancer/normal CRL-2338/2339 cell lines. These are part of the ICGC-TCGA DREAM Genomic Mutation Calling Challenge, an international effort to create standard methods for identifying cancer-induced mutations in whole-genome sequencing data. This global competition aims to find the most accurate mutation-calling techniques and, hopefully, allow other groups to adopt standardised methods.

The exomes were sequenced to 27x coverage, and it looked like about 70% of the exome was covered (they say 90-95%, but that's not how the graph looks to me!). SNV concordance was 92% and allelic dropout was 14%; both seem pretty good considering there is only one genome in the cell. I'd say it's pretty likely that not all of the genome gets captured in the library, and even more will be lost during amplification and exome hybridisation.
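I don't know Fluidigm's exact definitions, so this is only a sketch of how these two numbers are commonly computed, assuming you have genotypes at the same sites from bulk DNA (the reference) and from a single cell. Allelic dropout is taken here as the fraction of bulk heterozygous sites that look homozygous in the cell, and concordance as the fraction of bulk variant sites where the cell's genotype matches; the dictionary-of-tuples input format is hypothetical.

```python
import math


def _covered(bulk, cell, keep):
    """Sites genotyped in both samples whose bulk genotype satisfies `keep`."""
    return [site for site, gt in bulk.items() if site in cell and keep(gt)]


def allelic_dropout_rate(bulk, cell):
    """Fraction of bulk heterozygous sites that appear homozygous in the single cell."""
    het_sites = _covered(bulk, cell, lambda gt: gt[0] != gt[1])
    if not het_sites:
        return math.nan
    dropped = sum(1 for s in het_sites if cell[s][0] == cell[s][1])
    return dropped / len(het_sites)


def snv_concordance(bulk, cell):
    """Fraction of bulk variant (non-0/0) sites where the cell genotype matches the bulk genotype."""
    variant_sites = _covered(bulk, cell, lambda gt: gt != (0, 0))
    if not variant_sites:
        return math.nan
    matches = sum(1 for s in variant_sites if sorted(cell[s]) == sorted(bulk[s]))
    return matches / len(variant_sites)


# Genotypes as allele-index tuples: (0, 1) = het, (1, 1) = hom-alt. Example data is made up.
bulk = {"chr1:100": (0, 1), "chr1:200": (0, 1), "chr2:300": (1, 1), "chr3:400": (0, 1)}
cell = {"chr1:100": (0, 1), "chr1:200": (0, 0), "chr2:300": (1, 1)}
```

Here `allelic_dropout_rate(bulk, cell)` is 0.5 and `snv_concordance(bulk, cell)` is 2/3. Sites with no data in the cell (chr3:400) are excluded, which keeps coverage loss separate from dropout at covered sites.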

Fluidigm claim that you can "reduce exome enrichment time by 12x", but this does not make sense to me. Our current workflow takes 24 hours, and with most hybridisation capture systems taking 2-4 days, I'd say the reduction is more like 2-4x.
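The arithmetic behind that 2-4x, comparing our 24-hour workflow with a 2-4 day (48-96 hour) capture, plus the enrichment time a 12x reduction from the same 2-4 day baseline would imply:

```latex
\frac{48\,\mathrm{h}}{24\,\mathrm{h}} = 2, \qquad
\frac{96\,\mathrm{h}}{24\,\mathrm{h}} = 4, \qquad
\frac{48\,\mathrm{h}}{12} = 4\,\mathrm{h}, \qquad
\frac{96\,\mathrm{h}}{12} = 8\,\mathrm{h}
```

So a 12x claim against conventional capture would need the enrichment to finish in roughly 4 to 8 hours. That is my inference; I don't know which baseline Fluidigm used.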

Cell line heterogeneity: Fluidigm demonstrate the ability to detect mutations in individual cells within a population, and can easily separate tumour from normal cells by clustering. The data may shed light on cell line heterogeneity. In fact it opens up the question "how heterogeneous are cell lines?" I wonder if it is possible to use the 50x coverage data on the ICGC data portal for the NCI60 and CCLE cell lines to interrogate the heterogeneity of each line? We've recently been working with Horizon Diagnostics, who produce single-cell-derived homogeneous cell lines for their genome-engineered controls. They've gone to a lot of effort to get isogenic lines, but I've not seen any published work showing whether the rest of us have a significant problem or not. Could we use whole-genome data to look for heterogeneity in cell lines?
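One crude way to start on that question, sketched under loud assumptions: in a clonal, diploid line a heterozygous variant should sit near 50% variant allele fraction (VAF), so variants with significantly lower VAF hint at subclones. The binomial test, the 0.5 expectation and the alpha threshold are my choices, not an established method. Copy-number changes and aneuploidy (common in cancer lines) shift VAFs and will confound this, and at 50x the test has limited power for small subclones.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly with math.comb."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def looks_subclonal(alt_reads, depth, expected_vaf=0.5, alpha=0.01):
    """True if the alt read count is significantly below what a clonal heterozygous variant would give."""
    return binom_cdf(alt_reads, depth, expected_vaf) < alpha


def subclonal_fraction(variants, **kwargs):
    """Fraction of (alt_reads, depth) pairs flagged as possibly subclonal."""
    if not variants:
        return math.nan
    flagged = sum(1 for alt, depth in variants if looks_subclonal(alt, depth, **kwargs))
    return flagged / len(variants)
```

For example, `subclonal_fraction([(25, 50), (5, 50)])` returns 0.5: 25 alt reads out of 50 is consistent with a clonal heterozygous variant, while 5 out of 50 is not.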

More on exomes

I've been finding out more about exomes, specifically QC analysis using HS Metrics in Picard. There are loads of useful metrics, and I'm hoping to get to the point where I can explain them to users here and also use the results to troubleshoot an experiment. I'm also trying to understand what sort of read length we should be using for exome analysis. An earlier post discussed my thoughts on moving to PE125, or switching to SE125 and running more lanes. In a follow-up post (watch this space) I'll try to consider the impact of different run modes: will users and reviewers accept any kind of read for an exome, or will they baulk at seeing something different from the paired-end norm?
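As a starting point for the QC side, here is a minimal sketch that pulls the metrics row out of a Picard HS metrics file (a `## METRICS CLASS` line, then a tab-separated header, then values) and flags a few commonly watched columns. The thresholds are hypothetical placeholders, not recommendations, and column names vary a little between Picard versions, so check them against your own output.

```python
def read_picard_metrics(text):
    """Return the rows of the first METRICS CLASS section of a Picard metrics file as dicts."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip() or row.startswith("#"):
                    break  # a blank line or comment ends the metrics table
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' section found")


# Hypothetical thresholds: (column, comparison, limit). Picard reports PCT_* columns as fractions.
CHECKS = [
    ("MEAN_TARGET_COVERAGE", "min", 50.0),
    ("PCT_TARGET_BASES_30X", "min", 0.80),
    ("FOLD_80_BASE_PENALTY", "max", 2.0),
]


def flag_exome(row, checks=CHECKS):
    """Return human-readable warnings for metrics outside the given limits."""
    warnings = []
    for column, kind, limit in checks:
        if column not in row or row[column] in ("", "?"):
            continue  # column absent in this Picard version, or not computed
        value = float(row[column])
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            warnings.append(f"{column}={value} outside {kind} {limit}")
    return warnings
```

Typical use would be `flag_exome(read_picard_metrics(open(path).read())[0])` on one sample's file; an empty list means nothing tripped the (made-up) limits.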

PS: Your comments on this would be greatly appreciated!

Tuesday, 30 September 2014

Blogger's spell checker tries to fix "blog"

Why does Blogger's spell check try to correct 'blog', and why does it suggest 'bog' or 'blag'? Are they telling me I'm writing s**t or trying to get something for free?

Please fix this one

Monday, 29 September 2014

Thanks for reading

This morning someone made the 500,000th page view on the CoreGenomics blog. It amazes me that so many people are reading this, and the last couple of years of writing have been really good fun. I've met many readers and some fellow bloggers, and received lots of feedback in the form of comments on posts, as well as at meetings. I've even had people recognise my name because of my blogging; surreal! But the last few years have seen some big changes in how we all use social media like blogs, Twitter, etc. I don't think there is a K-index for scientific bloggers; perhaps Neil can look at that one next ;-)

Question: What do you see?

Sunday, 28 September 2014

Making BaseSpace Apps in Bangalore

I'm speaking at the BaseSpace Apps developers conference in Bangalore tomorrow. It's my first App and my first time in India, so I'm pretty excited about the whole thing.

Tuesday, 23 September 2014

Welcome to a new company built around ctDNA analysis: Inivata

Inivata is a new company spun out of Nitzan Rosenfeld's research group at the CRUK Cambridge Institute (where I work). His group developed and published the TAm-seq method for circulating tumour DNA amplicon sequencing. The spin-out aims to develop blood tests measuring circulating tumour DNA (ctDNA) for use as a "liquid biopsy" in cancer treatment. Inivata has been funded by Cancer Research UK's technology arm, Cancer Research Technology (CRT), Imperial Innovations, Cambridge Innovation Capital and Johnson & Johnson Development Corporation; the initial funding round raised £4 million.

Inivata is currently based at the Cambridge Institute, and the start-up team includes the developers of the TAm-seq method: Nitzan Rosenfeld (CRUK-CI), Tim Forshew (now at UCL Cancer Institute), James Brenton (CRUK-CI) and Davina Gale (CRUK-CI).

The research community has really taken hold of cell-free DNA and developed methods that are surpassing expectations. Outside of cancer, cell-free DNA is having its largest impact in the prenatal diagnostics market, and it has been shown to be useful in many types of cancer. The use of ctDNA to follow tumour evolution is one of the best examples I've seen so far of what's possible, and it's been exciting to be involved in some of this work. Inivata are poised to capitalise on the experience of the founding team, and I'll certainly be following how they get on over the next couple of years.

If you fancy working in this field then they are currently hiring: molecular biologist, and computational biologist posts.

This is likely to become a crowded market as people pick up the available tools and deploy them in different settings. ctDNA is floating around in blood plasma and is ripe for analysis; I expect there is still lots of room to develop new methods, and ultimately I hope we'll be able to use ctDNA as a screening tool for the early detection of cancer.

If we can enrich for mutant alleles using technologies like Boreal or Ice-Cold PCR then detection (not quantitation) may be possible far earlier than can be achieved today.
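A simple sampling argument shows why input matters as well as enrichment: enrichment can raise the fraction of mutant reads, but only if at least one mutant molecule made it into the tube. This sketch assumes binomial sampling of genome equivalents (GE) at a single locus; the input amounts and allele fractions used below are illustrative, not measured values.

```python
import math


def prob_mutant_present(genome_equivalents, allele_fraction):
    """P(at least one mutant copy of a locus) when sampling `genome_equivalents` copies at `allele_fraction`."""
    return 1 - (1 - allele_fraction) ** genome_equivalents


def prob_at_least_k(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), e.g. the chance of >= k mutant molecules in the sample."""
    return 1 - sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
```

With a hypothetical 1,000 GE at a 0.1% allele fraction, `prob_mutant_present(1000, 0.001)` is about 0.63, so roughly a third of such samples would contain no mutant copy at all, however good the enrichment; requiring two supporting molecules (`prob_at_least_k(1000, 0.001, 2)`) drops this to about 0.26. Enrichment also skews the observed allele fraction, which is why it helps detection rather than quantitation.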

Monday, 15 September 2014

Are PCR-free exomes the answer?

I'm continuing my exome posts with a quick observation. I've seen several talks recently where people presented genome and exome data and highlighted the drop-out of genomic regions, primarily due to PCR amplification and hybridisation artefacts. They make a compelling case for avoiding PCR when possible, and for sequencing a genome to get the very best quality exome.

A flaw in this argument is that we often want to sequence an exome not simply to reduce sequencing costs but, more importantly, to increase coverage to a level that would not be economical for a genome, even on an X Ten! For studies of heterogeneous cancers we may want to sequence the exome to 100x or even 1000x coverage to look for rare mutant alleles. Unfortunately, this is exactly the kind of analysis that can be messed up by those same PCR artefacts, namely PCR duplication (introducing allele bias) and base misincorporation (introducing artefactual variants).
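To put rough numbers on "not economical for a genome", here is the raw-output arithmetic. The ~50 Mb exome target, ~3.1 Gb genome, 60% on-target rate and duplicate rates are assumptions for illustration, not figures from our runs.

```python
def bases_required(coverage, target_bp, on_target=1.0, dup_rate=0.0):
    """Raw sequenced bases needed to reach `coverage` over `target_bp`.

    on_target: fraction of bases landing on target (1.0 for a whole genome).
    dup_rate: fraction of reads discarded as PCR duplicates.
    """
    usable_fraction = on_target * (1 - dup_rate)
    return coverage * target_bp / usable_fraction


EXOME_BP = 50e6    # assumed capture target size
GENOME_BP = 3.1e9  # approximate haploid human genome size

exome_100x = bases_required(100, EXOME_BP, on_target=0.6)
genome_100x = bases_required(100, GENOME_BP)
print(f"100x exome:  {exome_100x / 1e9:.1f} Gb")
print(f"100x genome: {genome_100x / 1e9:.1f} Gb")
print(f"ratio: {genome_100x / exome_100x:.0f}x")
```

Under these assumptions that is roughly 8 Gb versus 310 Gb, about a 37-fold difference, before duplicates are counted. Raising `dup_rate`, as tends to happen when a library is sequenced towards 1000x, pushes the exome figure up further, which is exactly where PCR artefacts start to bite.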

PCR-free exomes: In my lab we are running Illumina's rapid exomes, so PCR is required to complete the Nextera library prep. If we were to use another method, PCR-free exomes would in theory be possible. Even if we stick with Nextera (or Agilent QXT), we could aim for very low-cycle PCR libraries. The amount of exome library we are getting is huge, often hundreds of nanomolar, when we only need picomolar concentrations for sequencing.
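That headroom can be turned into a rough number of PCR cycles we could drop. This sketch uses the standard ng/µl to nM conversion for double-stranded DNA (about 660 g/mol per base pair) and assumes each cycle multiplies the library by (1 + efficiency), with perfect doubling by default. The example concentration, fragment size, volume and amount needed are hypothetical.

```python
import math

DS_DNA_G_PER_MOL_PER_BP = 660  # average mass of one double-stranded base pair


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/ul to nM."""
    return conc_ng_per_ul * 1e6 / (DS_DNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def spare_cycles(yield_fmol, needed_fmol, efficiency=1.0):
    """Whole PCR cycles that could be dropped while still producing `needed_fmol`.

    Assumes exponential amplification: each cycle multiplies the library by (1 + efficiency).
    """
    if yield_fmol <= needed_fmol:
        return 0
    return math.floor(math.log(yield_fmol / needed_fmol) / math.log(1 + efficiency))


# Hypothetical library: 20 ng/ul, 350 bp mean size, 20 ul eluate (1 nM x 1 ul = 1 fmol).
conc_nM = ng_per_ul_to_nM(20, 350)
library_fmol = conc_nM * 20
print(f"{conc_nM:.1f} nM, {library_fmol:.0f} fmol, spare cycles: {spare_cycles(library_fmol, 10)}")
```

With these made-up numbers the library is about 87 nM and, if 10 fmol were enough, around 7 cycles could in principle go. Real amplification is less than perfectly efficient and plateaus, so treat the answer as a starting point for a titration, not a protocol.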

Something we might try testing is a PCR-free or PCR-lite (pardon the American spelling) exome, to see if we can reduce exome artefacts and improve variant calling. If anyone else is doing this, please let me know how you are getting along and how far we can push this.