Viruses are present in every environment and have major effects on human health and disease. They can devastate agricultural production [1,2] and affect aquatic and terrestrial environments [3–7]. They are implicated in many human diseases, including inflammatory bowel diseases [8,9] and complications associated with cystic fibrosis [10,11], yet they have also shown promise for combating the rise of antimicrobial resistance [12,13]. Understanding viruses as core components of microbiomes is therefore crucial to addressing many diseases and environmental problems.
Analysis of viral diversity using modern sequencing technologies presents several unique challenges. Viruses are underrepresented in reference databases despite being the most diverse and abundant biological entities. This vast viral "dark matter" is a serious hurdle: search strategies need to be sensitive enough to identify novel viruses that may be only distantly related to any known virus. Virus detection also requires a non-targeted, random (shotgun) sequencing approach, so genetic material from non-viral sources is sequenced along with the viruses. This non-viral contamination can make up most of the sequenced material, even in virus-enriched samples. In addition, viral sequences can share homology with sequences from cellular organisms in all domains of life, which has led to false-positive identifications and incorrect conclusions in many viral metagenomics studies. Search strategies must therefore be highly specific as well as sensitive.
We introduce Hecatomb, a bioinformatics platform for viral metagenomics. Hecatomb supports both read-based and contig-based analysis of either short-read or long-read sequencing data, and it integrates query results from both protein and nucleotide databases. Hecatomb prioritises the integration of data collected throughout the workflow with external viral data sources, producing a rich, high-dimensional dataset that lets researchers evaluate their results quickly and easily. We apply Hecatomb to a previously studied dataset of gut viral metagenome samples from a cohort of SIV-infected rhesus macaques. We show how Hecatomb expedites filtering and statistical interrogation, and how it provides a far more complete picture of the viral component of microbiomes than was previously possible. Hecatomb is available on GitHub (github.com/shandley/hecatomb).