I had a wonderful time at the Functional Annotation of Animal Genomes workshop held last week at the National Academy of Sciences.  I am very grateful to the RCN for the opportunity to attend, and certainly learned a lot about the F.A.A.N.G. community’s goals as well as some of the challenges they are facing in their consortium (and had a blast with Carl and Fiona of course).  I’ll do my best to summarize some of the main highlights of the meeting, but because of unavoidable personal preferences, my ‘highlights’ might not be as impartial (or brief) as one would hope.
The three plenaries were very insightful, and evoked much discussion later on in the breakout sessions.  John Stamatoyannopoulos (UW) spoke about the regulatory genome, and how the human genome community is getting very close to a complete lexicon of transcription factor recognition sequences.  He ended the plenary by laying out the next 5 years in the field: he spoke about the transition from discovery to detection in the clinical setting, and predicted that by 2018 about half of the regulatory DNA will be mapped to the genes it regulates.  He also answered questions from the group regarding SNP location, and how important it will be to identify SNPs in the regulatory regions, and not just in protein-coding regions.  He urged the F.A.A.N.G. community to gather as much data as possible, on as many different cell types and tissues as possible, even if analysis is lagging behind, in order to have the depth needed to understand the regulatory genome across animals.
Christine Wells (Univ. of Glasgow) spoke about functional annotation of mammalian genomes (mainly highlighting Atlas projects).  She organized her talk with three overall bullets:
  1. EMERGENCE: gaining insight from omics data
  2. MODULARITY: finding networks and pathways and
  3. ROBUSTNESS: understanding network properties.
She made the main point that none of these can really be answered with just one species and urged the community to think about
  1. generating quality data (metadata standards)
  2. tools for visualizing (she spoke about how this is one of the biggest challenges but did not offer solutions; obviously this is still a hard problem in the community)
  3. collaboration with bioinformatics groups, and how that collaboration is organized (she stressed that this is imperative)
  4. a new ontology for RNA (she brought up the fact that noncoding/coding is just not cutting it).
Paul Flicek (EMBL-EBI) spoke a lot about the bioinformatic challenges the community faces (infrastructure beyond hardware and software) and the role EBI has been playing in the 1000 Genomes Project.  He outlined four stages of how a project ‘might go’, which made me giggle quite a bit, because he’s dead on.
  1. It’s so easy, but wait, how much is it going to cost?
  2. Time generation: way more time than you thought will go into the nitty gritty (quality, assembly, downstream analysis)
  3. It works! (mostly) — some agreement within the members of a given community on the ‘right’ way to do it.
  4. “See, that wasn’t so hard.”
But unrealistic time goals are then set for the next phase because the technology changes, and the cycle repeats itself.  He drove home the point that in this field you need to make things (i.e. analysis pipelines) as good as possible, but not wait until they are perfect.
Funding agencies gave short talks (NSF, Canadian Genome Enterprise, USDA-NIFA, BBSRC, and European Commission).  Below are some highlights that I think matter most to our group:
Rob Miller (NSF) spoke about two directorate-wide initiatives, (1) Understanding the Rules of Life (URL) and (2) Genome to Phenome (G2P), and mentioned EDGE (Enabling Discovery through GEnomic Tools) as a resource to make the link between the transcriptome and phenome, but I was unclear about its use in non-model organisms at this time.  He of course mentioned the G2P RCN.  Following Christine Wells’ plenary, Carl started an evocative discussion about the level to which we as a community are educating those who might sit on the review panels.  Jeremy Taylor from the Univ. of Missouri also brought up conflicts with reviewing USDA grants because of his network of associations, which led to Steve Ellis speaking about the NSF’s experimental review process, whereby each submitter is required to review 7 other proposals and some of the scoring is based on the quality of the reviews given for other proposals.  Parag Chitnis (USDA-NIFA) gave some numbers about NIFA’s overall extramural funding budget ($1.5 billion), with the aquaculture programs seeing about $200–300K of that.  Gulp.
The breakout sessions were a little lopsided regarding productivity (i.e. community agreement).  The breakout session dealing with sample collection standards (and how many cell/tissue types are needed for the F.A.A.N.G. goals), assay protocols, and sample storage did not come to a crisp agreement, but it did appear that a general direction was achieved.  I attended the data analysis breakout session (as did Carl and Fiona), and the group seemed to reach several agreements.  The analysis group decided it would agree to hard pipelines for analysis, with periodic data freezes planned for when a new pipeline needs to be rolled out; at that point previous data will be re-analyzed, and a new hard pipeline put in place.  There was some stimulating conversation regarding how there is little reward for being a part of the F.A.A.N.G. initiative, and writing a community paper on the pipeline was suggested.  Student/post-doc ‘swapping’ for short bits of time in order to gain better analysis experience was also suggested (something I think our RCN could really do).  Laura Clarke (EMBL) also gave a talk earlier that day driving home the importance of metadata standards and sharing scripts via GitHub for the F.A.A.N.G. community.
Carl was a champion for the decapods at this meeting, and because of that, at several points various folks used ‘decapods’ as their example of where the community needs to go to expand the idea of ‘animal’ genomes past livestock.  I will say, though, that I got several questions regarding what a decapod was, and whether they were insects (insert a chuckle here).  Carl also presented some slides on the RCN and spoke about the proposed meeting for next summer.
Finally, Jim Reecy (Iowa State), Jiuzhou Song (U. of Maryland), and Laura Clarke (EMBL) summarized ‘the way forward’ for F.A.A.N.G. and specifically spoke about needing clear experimental treatments to understand function (which I think is a clear advantage that the decapod community has, even if we don’t have genomes!).  They also cautioned the community about using students to generate the F.A.A.N.G. data, and said that this is a better role for technicians.  Students should be carving out other projects within the F.A.A.N.G. datasets, but should not be in the pipeline per se.
I think the F.A.A.N.G. community has a lot to offer our RCN in several areas, including guidelines for metadata (in order to move towards cross-species/lab comparisons) and possible data-sharing mechanisms.