The official blog of University of Missouri Skeptics, Atheists, Secular Humanists, & Agnostics

DIY Science: Build Your Own Molecular Phylogeny

Greetings, friends. My name is James. Today, I’m going to turn everybody who reads this blog into an evolutionary biologist in a few easy steps. How will I do that? Simple: I’m going to teach you how to make your very own evolutionary tree using DNA sequences. I’ll start by answering some basic questions.

“What is an evolutionary tree?”

An evolutionary tree, or phylogeny, is a graphical representation of the relatedness of organisms. A phylogeny is made up of bifurcating (splitting into two) branches joined together by nodes. Phylogenies can be built using physical characteristics (morphology) or molecular sequences (DNA, RNA, or protein). These trees can compare different species, different individuals or populations within a single species, or even different genes within the same organism.

“Okay, but what does it mean?”

The layout of branches on a phylogeny shows how closely two organisms are related. Two organisms joined together at a single node are more closely related than two organisms separated by several nodes. A chunk of the tree containing a node and all the branches that arise from it is called a clade.
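
If you think better in code, here is a minimal sketch of the same idea in Python. It writes a tree as nested tuples and lists every clade in it. The four animals and their grouping are a hand-picked example, and the function names (tips, clades) are my own inventions for illustration, not part of any standard tool.

```python
# Toy tree written as nested tuples: each tuple is a node, each string is a tip.
# The grouping is a hand-picked example for illustration only.
tree = ((("tiger", "lion"), "wolf"), "chicken")

def tips(node):
    """Return the list of tip names found under a node."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(tips(child))
    return names

def clades(node):
    """Return every clade (a node plus everything arising from it) as a sorted tip list."""
    if isinstance(node, str):
        return []  # a lone tip has no node of its own
    found = [sorted(tips(node))]
    for child in node:
        found.extend(clades(child))
    return found

for clade in clades(tree):
    print(clade)
```

The tiger and lion share the innermost node, so they form the smallest clade; the chicken only joins at the outermost node.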

“I still don’t understand. I’m scared.”

That’s fine; I basically summarized an entire year’s worth of biology coursework into two paragraphs. There are better, more detailed descriptions of phylogenies at your local library.

“Do I still have to make my own tree?”

Yes. Next question.

“What if I don’t believe in evolution?”

Prepare to be surprised.

Making your tree

Step 1: Pick your organisms

Make a list of 10 of your favorite critters. They can be animals, plants, fungi, or protozoa (bacteria will work, but I’d advise against it on your first tree). The list can even be a mix of any of the above.

Step 2: Pick your gene

Any gene will work in theory, but some are better than others. The best are so-called “bar-coding” genes, which are relatively conserved between species. The only important thing is that all of the species you want to analyze have the gene. I recommend the COX1 gene for this experiment, but feel free to find others.

Step 3: Find your sequences

Go to the National Center for Biotechnology Information (NCBI) website and do a gene search for your organism and gene. For example, if you’re looking for the COX1 gene from a tiger, search for “tiger COX1.” It helps if you search using the scientific species name (Panthera tigris COX1). Scientific names are easy to find on Wikipedia. The NCBI database is very large but doesn’t have sequences for every organism (yet), so if you can’t find your species, just search for something similar. Click on the link for the matching gene in the search results.

Step 4: Copy the sequence

Go to the middle of the gene page and find a sentence that says “Go to nucleotide.” To the right, you’ll see the words Graphics, FASTA, and GenBank. Click on FASTA.

Highlight the sequence starting at the “>” character, then copy and paste it into a plain-text editor (Notepad or TextWrangler work fine). Make sure you have something like this, with the header line on top and the sequence letters below it:

>gi|187250362:6280-7824 Panthera tigris mitochondrion, complete genome



Repeat for all 10 of your species.
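
Once everything is pasted into one file, you can sanity-check it with a few lines of Python. This is a minimal sketch, assuming the usual FASTA layout of a “>” header line followed by one or more lines of sequence. The function name parse_fasta is hypothetical, and the two records in the example are invented placeholders, not real NCBI data.

```python
def parse_fasta(text):
    """Split FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()  # join wrapped sequence lines
    return records

# Placeholder records: these headers and bases are invented, not real gene data.
example = """>species_A made-up header
ACGTAC
GTTA
>species_B made-up header
ACGTTC
"""

records = parse_fasta(example)
for header, seq in records.items():
    print(header, len(seq))
```

If the number of records printed matches the number of species on your list, you copied everything.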

Step 5: Science!

This is where things get interesting. There are a mind-boggling number of options for making your tree. Basically, you need to line up your sequences, chop out the nonsensical bits, build the tree, and render it. These processes are collectively known as bioinformatics. Bioinformatics is a hellish Pandora’s Box of suffering, so if you value your sanity, you’ll just have to trust me on this step.
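
To give a taste of the “build the tree” part, here is a toy sketch in Python. It assumes the sequences are already lined up and trimmed, measures how different each pair is, and then repeatedly joins the closest groups (a simple method called UPGMA). This is a swapped-in teaching example and almost certainly not what real pipelines do under the hood; the four sequences are invented, and the function names are my own.

```python
from itertools import combinations

# Invented, already-aligned toy sequences (equal length); not real gene data.
aligned = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACGTTT",
    "D": "TCGAACCTTT",
}

def p_distance(s1, s2):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def upgma(seqs):
    """Join the closest clusters until one remains; return a Newick string (no branch lengths)."""
    names = list(seqs)
    # Each cluster maps its Newick label to the names of its members.
    clusters = {name: [name] for name in names}
    dist = {
        frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(names, 2)
    }

    def average(c1, c2):
        """Mean distance over all member pairs drawn from two clusters."""
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        merged = clusters.pop(c1) + clusters.pop(c2)
        clusters[f"({c1},{c2})"] = merged
    return next(iter(clusters)) + ";"

print(upgma(aligned))
```

The output is written in Newick format, the parentheses-and-commas notation that most tree-drawing programs read.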

By far the simplest method I’ve found uses a website backed by a giant server that runs the whole software pipeline for you. All you need to do is copy and paste your list of sequences into the “one-click” box. The server does the rest.

Step 6: Magic

Wait for the website to finish. If you get an error message, make sure your sequences are formatted right. They should all start with a “>” and not contain any funky Unicode characters. Check out the example sequence on the website if you still have issues.
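
If you’d rather check the formatting before you paste, here is a minimal Python sketch of the two checks above. The function name check_fasta and its messages are hypothetical, and the website may well reject input for other reasons that this does not catch.

```python
def check_fasta(text):
    """Return a list of human-readable problems found in pasted FASTA text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["no sequences found"]
    problems = []
    if not lines[0].startswith(">"):
        problems.append("first line does not start with '>'")
    for number, line in enumerate(lines, start=1):
        if not line.isascii():  # catches curly quotes and other Unicode characters
            problems.append(f"non-ASCII character on non-blank line {number}")
    return problems

# Invented examples: one clean record and one with two deliberate mistakes.
good = ">species_A\nACGT\n"
bad = "species_A\nACG\u2019T\n"

print(check_fasta(good))
print(check_fasta(bad))
```

An empty list means the text passed both checks.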

Step 7: You’re Done!

Behold the wonders of evolution! Does the tree make sense? Try again with more species, longer genes, and different settings. Now you’re cooking… with Science!

James is a graduate of the University of Missouri, Columbia. He is a research biologist specializing in the molecular evolution of invertebrates. If you would like to pay James to do science for you or your laboratory, please post in the comments. Also, feel free to visit his subpar research blog.


This entry was posted on August 3, 2011 in Author: James Pflug.