Making sense of a huge conglomeration of base pairs
The genome sequence of the moss Physcomitrella patens was deciphered around two years ago, making research on this biotechnologically interesting plant easier and quicker. But has all the work already been done? Prof. Dr. Ralf Reski and his team at the University of Freiburg have made considerable contributions to the analysis of the moss genome. However, the researchers are aware that the base pair sequence still has many gaps and errors. And the "text" encoded by the base pairs is, apart from a few segments, still largely not understood. So how do the Freiburg researchers navigate the chaos of millions of letters (base pairs) and sentence fragments (DNA fragments)?
Why is the genome sequence of Physcomitrella patens so important? The inconspicuous plant belongs to the mosses, a group of plants that moved from water to land about 450 million years ago. It is thus perfect for studying the evolution of modern land plants. Its cells still use many of the old metabolic pathways which modern seed plants have lost. For example, mosses produce long-chain unsaturated fatty acids such as arachidonic acid, which are important for human health. In addition, mosses can survive long periods of drought or extremely high salt concentrations in the soil. "Once we are able to understand these mechanisms on the molecular level, then we will be able to incorporate them into seed plants commonly used in agriculture," said Prof. Dr. Ralf Reski of the Department of Plant Biotechnology at the Institute of Biology II at the University of Freiburg.
Bioinformatics and in-depth understanding of the organism are a prerequisite
Besides application-oriented aspects, Reski’s model organism is also compelling due to its ability to accurately incorporate foreign DNA into its genome. This phenomenon, known as homologous recombination, makes genetic manipulations in the laboratory particularly safe and efficient. The biotechnologist and his research colleagues began harnessing this principle by using the plant to produce pharmaceutical proteins or antibodies that are relevant for humans. Nowadays, the production of proteins in moss, which is also referred to as molecular farming, is carried out on an industrial scale by the company greenovation Biotech GmbH, which was established by Reski and the cell biology professor Gunther Neuhaus in 1999.
“Knowing that the moss Physcomitrella patens has all these useful properties, we decided to start a genome project in 2004,” said Reski. The group of researchers was supported in their efforts by American, British and Japanese colleagues, as well as by the German Research Foundation (DFG). The Joint Genome Institute (JGI), based in California, provided the sequences of the Physcomitrella genome, and the Freiburg researchers contributed above all their biological and bioinformatic know-how. Such know-how is indispensable for deciphering a plant or animal genome. The pure base pair sequence is no more than an agglomeration of letters (editor's note: G, A, T and C stand for the nucleotides that make up DNA: guanine, adenine, thymine and cytosine). It can only be assembled into a meaningful overall picture with specialist background knowledge of the organism and specially designed computer algorithms.
The genome as a puzzle
“Although the sequence of the moss genome became available in 2007, in principle we are still in the initial phase of the genome project,” said Dr. Daniel Lang, one of the bioinformaticians in Reski’s group. The sequence was deciphered with what is known as the shotgun method, in which ultrasound is used to shred the DNA randomly into pieces around one thousand base pairs long, whose sequences are then determined. With a genome of around 500 megabase pairs and the need to sequence the genome about eight times over for statistical reasons, such a procedure produces a pure “base pair salad”. The bioinformaticians have since been searching the millions of sequence fragments for overlapping sequences, as such overlaps suggest with high probability that the fragments were originally located at adjacent positions in the genome. Using computers, the researchers combine the fragments into sequences of increasing length, which is arduous and time-consuming work. But the researchers are steadily piecing together the genome puzzle, and before very long only a few gaps will remain.
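The overlap idea described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the software used in the genome project: the function names are hypothetical, the fragments are error-free, and a simple greedy strategy merges the pair with the longest exact overlap first. The first function also redoes the arithmetic from the text: 500 megabase pairs sequenced about eight times over in pieces of about one thousand base pairs comes to roughly four million fragments.

```python
def estimated_read_count(genome_bp, coverage, read_bp):
    """Number of fragments needed to cover the genome `coverage` times on average."""
    return genome_bp * coverage // read_bp


def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Toy assembler: repeatedly merge the two fragments with the longest overlap.

    Assumes error-free fragments from one strand; real assemblers must also
    handle sequencing errors, repeats and fragments from the opposite strand.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs  # no remaining pair overlaps by at least min_overlap
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Figures quoted in the article: 500 Mbp genome, eightfold sequencing, 1,000 bp pieces.
print(estimated_read_count(500_000_000, 8, 1_000))

# Three made-up fragments that overlap by four letters each.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

With the three made-up fragments, the sketch reconstructs the single sequence ATGGCGTACCTGA. Comparing every pair of fragments, as done here, becomes far too slow for millions of fragments, which is one reason why specially designed algorithms are needed.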
"We have now reduced the original number of several million small fragments to about two thousand longer sequences," explains Lang. "In the best case, we hope to be able to close the gaps between these sequences, eventually resulting in 27 long sequences, which corresponds to the number of chromosomes in Physcomitrella." In the process of assembling 27 sequences, the researchers make use of information about the relative distances between genes, obtained by conducting classical crossing experiments between different wild strains. Such experiments provide the researchers with a rough genetic map which is then used by the bioinformaticians to arrange their sequences.
Unusual basic research?
But what can this work achieve? Is there more to it than an endless chain of letters for which nobody knows the rules for assembling the “words”? Or put differently: where in this chaos are the genes and what is their function? A search for similar sequences in other known genomes (e.g., Arabidopsis) can only partially help. The moss and Arabidopsis are distant evolutionary relatives, and hence have genomes that differ considerably from each other. Another possibility for the researchers is to reverse-transcribe mRNAs (the templates for protein synthesis) into DNA. Comparing these copies with the genome sequence can help the researchers identify the genomic regions where the genes for the corresponding proteins are located. That is exactly what Reski and his team are currently working on in cooperation with Japanese colleagues. “In order to get the full picture, it is important for the international moss community to work closely together,” said Lang. “Researchers investigating certain moss genes are in constant contact with us, which enables us to optimise our databases and eliminate potential errors or gaps.”
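The comparison of a reverse-transcribed mRNA with the genome sequence can be pictured with the following Python sketch. It is a simplified illustration with invented sequences and a hypothetical function name: short "words" of k letters from the DNA copy are looked up in the genome, and the matching stretches are reported as candidate gene segments. Real analyses rely on alignment methods that tolerate sequencing errors; exact word matching is an assumption made here for brevity.

```python
def locate_transcript(genome, cdna, k=6):
    """Return half-open (start, end) genome intervals covered by exact k-mers of the cDNA."""
    # Index every k-letter word of the genome by its start position.
    index = {}
    for pos in range(len(genome) - k + 1):
        index.setdefault(genome[pos:pos + k], []).append(pos)

    # Collect the genome positions covered by words that occur exactly once.
    hits = set()
    for offset in range(len(cdna) - k + 1):
        positions = index.get(cdna[offset:offset + k], [])
        if len(positions) == 1:  # skip words that are repeated in the genome
            hits.update(range(positions[0], positions[0] + k))

    # Merge neighbouring positions into contiguous blocks (candidate gene segments).
    blocks = []
    for pos in sorted(hits):
        if blocks and pos == blocks[-1][1]:
            blocks[-1][1] = pos + 1
        else:
            blocks.append([pos, pos + 1])
    return [tuple(block) for block in blocks]


# Invented example: a gene in two coding segments, interrupted by a non-coding stretch.
segment1, spacer, segment2 = "ATGGCTAGCAAT", "GTAAGTTTTTTTTCAG", "CACCTGCATTGA"
genome = "CCCCCC" + segment1 + spacer + segment2 + "CCCCCC"
cdna = segment1 + segment2  # the mRNA copy no longer contains the spacer
print(locate_transcript(genome, cdna))
```

In this invented example the two coding segments are found at genome positions 6 to 18 and 34 to 46, and the gap between them corresponds to the non-coding stretch that is absent from the mRNA.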
Although the sequences are not yet complete, they can already be used to facilitate the researchers' work. Reski's team has already been able to discover new genes that enable the moss to survive long periods of drought. In addition, they have succeeded in identifying the fifth gene of a gene family that is involved in the proliferation of the energy production sites (chloroplasts) of moss cells. "The work on the genome sequence of Physcomitrella is a permanent construction site, but it has already catapulted us into a completely different league. When I started to work with the moss a few years ago, some colleagues called my research 'unusual basic research'. However, the plant has now become an important model organism that is also of great interest for the biotechnology and pharmaceutical industries."
Professor Dr. Ralf Reski
Department of Plant Biotechnology
Faculty of Biology
University of Freiburg