December 8th, 2011

Pfish

Advent Science: Day Eight

Your Genome And You

Your genome contains approximately 3.2 billion (US billion) base pairs of DNA. If the DNA in a single cell, which carries two copies of the genome, could be laid out end to end, it would be about two metres long and 2.4 nanometres wide.
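
The length figure can be sanity-checked with some quick arithmetic. The sketch below assumes a spacing of roughly 0.34 nanometres per base pair, a commonly quoted value for ordinary double-helix DNA that is not given in this post; the function and variable names are made up for illustration.

```python
# Assumption (not from the post): each base pair adds about 0.34 nm of length.
RISE_NM_PER_BP = 0.34
GENOME_BP = 3.2e9  # base pairs in one copy of the genome, as quoted above


def length_metres(base_pairs, rise_nm=RISE_NM_PER_BP):
    """Convert a number of base pairs into a stretched-out length in metres."""
    return base_pairs * rise_nm * 1e-9  # 1 nm = 1e-9 m


one_copy = length_metres(GENOME_BP)        # a single copy of the genome
two_copies = length_metres(2 * GENOME_BP)  # most cells carry two copies

print(f"one copy:   {one_copy:.2f} m")
print(f"two copies: {two_copies:.2f} m")
```

Under that assumption a single copy comes out at a little over a metre, so the two-metre figure corresponds to the two copies found in most cells.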

We still don't know exactly how many genes there are. Before the Human Genome Project, the number was estimated at around 100,000 to 150,000, based on numbers estimated for smaller and less complex organisms. Now that we have an approximately complete human genome sequence, the estimates are getting much smaller, with the first suggestions after the Genome Project falling between 30,000 and 40,000. We currently know of around 20,000 protein-coding sequences in the genome, with another couple of thousand sequences that look like protein-coding sequences but that haven't been confirmed yet...

...wait. Look like a protein-coding sequence? Well, these are sequences that start with a start codon, end with a stop codon, and have a decent length of coding DNA in between (bits of the genome that do not code for proteins tend to have stop codons all over the place). However, they are sequences for which we have yet to find the corresponding protein in the human body, or evidence that the gene is actually expressed anywhere. There are also some genes whose RNA is never translated into protein, for instance those for the transfer RNA molecules mentioned on Day Five.
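
To make "looks like a protein-coding sequence" a bit more concrete, here is a minimal sketch of that kind of scan: look for a start codon, then read three letters at a time until a stop codon turns up, and keep the stretch if it is long enough. The function name `find_orfs`, the `min_codons` cut-off and the toy sequence are all invented for illustration. Real gene finders do a great deal more (they check both strands and cope with genes that are split into pieces, for a start), so treat this as a cartoon rather than a working gene predictor.

```python
START = "ATG"                  # the usual start codon
STOPS = {"TAA", "TAG", "TGA"}  # the three stop codons


def find_orfs(dna, min_codons=4):
    """Return (start, end, sequence) for each start-to-stop stretch on one strand.

    min_codons is an arbitrary, hypothetical cut-off for "a decent length".
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # three possible ways to group letters into codons
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START:
                j = i + 3
                while j + 3 <= len(dna):
                    if dna[j:j + 3] in STOPS:
                        n_codons = (j - i) // 3  # codons before the stop
                        if n_codons >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
                    j += 3
                else:
                    break  # ran off the end without meeting a stop codon
            i += 3
    return orfs


toy = "CCATGAAACCCGGGTTTTAGCC"  # made-up sequence, not real human DNA
print(find_orfs(toy))
```

Raising `min_codons` is the crude equivalent of insisting on "a decent length": random sequence hits a stop codon fairly quickly, so long uninterrupted stretches are the ones worth a second look.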

It's thought that somewhere between 1.5% and 2% of the human genome actually codes for proteins. So, barring the tiny fraction that, as mentioned above, codes for functional RNA... what is all the rest?

The answer to a lot of it is "we don't know". It's commonly referred to as "junk DNA", although many scientists are hesitant to use such a term until we're absolutely certain it doesn't do anything. Especially because some of it *does*.

Some non-coding DNA *actually* doesn't do anything. The genome contains various nonsense repeats, random bits which are often thought to be the remnants of viruses that have incorporated into the host DNA and been passed on, and pseudogenes: things that used to be genes, but have mutated away from a functional form, or have been accidentally duplicated without the surrounding sequences needed to be transcribed. Some of these sequences are used for DNA fingerprinting in criminal and paternity cases: the lengths of some known pieces of "junk" DNA vary so much between individuals that, if you take enough of them, you have a unique genetic "fingerprint".
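
Why does "take enough of them" work? A rough way to see it is to multiply together the chance of two unrelated people matching at each piece. The sketch below does exactly that. The per-marker chances are invented numbers purely for illustration, and the multiplication assumes the markers are inherited independently of one another, which is a simplification.

```python
from math import prod

# Hypothetical values: the chance that two unrelated people happen to have
# the same lengths at each of five "junk" DNA markers.
match_chance_per_marker = [0.1, 0.08, 0.12, 0.05, 0.1]


def random_match_probability(chances):
    """Multiply per-marker match chances, assuming the markers are independent."""
    return prod(chances)


p = random_match_probability(match_chance_per_marker)
print(f"chance of a coincidental match: about 1 in {1 / p:,.0f}")
```

Each extra marker multiplies the chance of a coincidental match by another small number, which is why a modest handful of them is enough to tell people apart.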

However, another major sort of non-coding DNA is regulatory sequences: bits of DNA which can regulate the expression of the genes they surround. This is clearly non-coding DNA with a function, which leads us neatly onto tomorrow's question: how does the body know when and where to express a gene?