Please, don’t use CO1 barcodes alone for spider phylogeny

We systematists seeking to understand the phylogenetic relationships of spiders need all the data we can get, as easily as we can get it. There are so few of us doing this work, and so many species to consider, that we could use a lot of help. It’s therefore tempting to look at the growing hoard of data from CO1 barcoding as a ready solution. But don’t, please don’t, use CO1 alone to reconstruct spider phylogenetic relationships.

CO1 (cytochrome c oxidase subunit I) is a mitochondrial gene that was chosen as the standard gene in animals to provide a natural “barcode” for identifying species. I won’t comment on how well it fulfills that task, except to say that I think it provides useful data, and that sometimes it works to distinguish species and sometimes it doesn’t. I will comment instead on how well it can be adapted to a different task: figuring out how species are genealogically related, looking back in time along the evolutionary tree. In the jumping spiders I have studied, CO1 is frequently highly misleading.

I’m provoked to say this because CO1 came into my consciousness, coincidentally, from four different directions this past week. Two colleagues independently mentioned that they were planning, or had planned, to focus on CO1 for spider phylogenetics projects. I learned of a published paper that uses database-mined CO1 data, alone, to reconstruct the phylogeny of some spiders. Finally, while pulling together a phylogenetic analysis, I looked to GenBank for additional data and found many CO1 sequences from barcoding efforts. I will use those, but only because I have enough other data (both other genes and morphology) to overwhelm CO1’s flaws and provide the primary signal of evolutionary history.

What’s the problem with CO1? It frequently yields phylogenetic trees so bonkers, crazy, and goofy that it can’t be trusted to stand on its own. How do I know that? By comparing its trees with those from all other genes and from morphology. The other genes we’ve studied well, namely 16SND1 (also mitochondrial!), 28S, Actin 5C, and wingless, each produce phylogenetic trees on their own that largely make sense morphologically and largely agree with one another.

CO1, on the other hand, is psychedelic. You can see this in Fig. 26 of Maddison et al. 2014 (ZooKeys 440: 57–87), shown below: the scattered pale blue lineages are all euophryines, as is clear from morphology and from all other genes. Yes, there are moments of sanity (the purple hasariines hold together), but then CO1 simply loses it. You might rightly criticize this as a very sparse taxon sample for such a big group (the figure shows the major clade of more than 5000 species), but we’ve seen the same problem in denser samples, the other genes do not suffer so, and the clade is probably rather young (less than 50 million years). In that paper we cite previous results from 2003 and 2012 showing that “CO1 struggles through both shallow and deep levels”. With a much denser taxon sample in the tribe Euophryini, Junxia Zhang and I (2013, Molec. Phyl. & Evol. 68: 81–92) found that the CO1 tree had regions of sanity but also some wildly broken parts. Not only were members of other tribes and subfamilies scattered into the midst of the euophryines (Plexippus, Aelurillus, Heliophanus, Neon, Philaeus, spartaeines), but members of a single genus (clearly a single genus, by all other data and by geography) were split far apart in the tree (Popcornella, Thyenula).

Salticinae portion of tree from CO1 (from Maddison et al. 2014)

I could focus my criticism on the use of a single gene alone, which is well behind the times now that phylogenomics gives us 500-gene results. Yes, we should have more data than a single gene provides, but it’s a matter of acceptable error. With CO1, all signs are that the errors go beyond acceptable. CO1 is peculiarly misleading. Its mitochondrial colleague 16SND1 is not so bad; while I’d still hesitate to base conclusions solely on it, in my experience it makes only small mistakes, not huge ones. And, as difficult as it may be to interpret, morphology alone is more reliable than CO1, because its basis is distributed throughout the genome rather than concentrated in a single locus. If one has only a single gene, morphological support for the results should be sought to improve their credibility. If that gene is CO1, I’d want to see a lot of morphological support.

Oh, had the barcoders only chosen a different gene!
