UNIVERSITY OF BUCHAREST FACULTY OF PHYSICS

Conference: Bucharest University Faculty of Physics 2007 Meeting
Section: Electricity and Biophysics
Title: DNA as digital data
Authors: Radu Mutihac
Affiliation: University of Bucharest - Physics Department
E-mail: mutihac@astralnet.ro
Keywords: Information theory, DNA, gene, digital communication, coding theory.
Abstract: Protocols of information transmission in molecular systems are revisited in the light of digital communication theory applied to molecular biology. The hypothesis that faithful communication of genetic information over geological time depends on error-correcting codes can be invoked to explain the evolutionary emergence of discrete species and of the taxonomic hierarchy, as well as the trend towards increased complexity of organisms during evolution.
When the nucleotide bases, which act as quaternary symbols, are expressed as four-digit binary numbers, nucleic acid replication can be formulated in terms of coding theory, a formulation that explains the selection of A, C, G, and T as the optimal alphabet for encoding genetic information. Potential coding properties of genomic sequences are investigated by detecting linear dependencies and repetitive structures in DNA. Large domains of similarly expressed genes are identified by employing a minimum description length (MDL) strategy.
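The binary representation of the bases can be illustrated with a minimal sketch. The abstract does not give the bit patterns, so the assignment below is an assumption made for illustration only: three bits stand for a hydrogen-bond donor/acceptor pattern and a fourth bit marks purine (0) or pyrimidine (1). The names `CODEWORDS`, `COMPLEMENT`, `parity`, and `hamming` are hypothetical.

```python
from itertools import combinations

# Hypothetical 4-bit codewords (an assumption, not taken from the abstract):
# three donor/acceptor bits followed by one purine (0) / pyrimidine (1) bit.
CODEWORDS = {"A": 0b1010, "C": 0b1001, "G": 0b0110, "T": 0b0101}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def parity(word: int) -> int:
    """Return 0 if the number of set bits is even, 1 if it is odd."""
    return bin(word).count("1") % 2


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codewords differ."""
    return bin(a ^ b).count("1")


# Every codeword in this assignment has even parity.
all_even = all(parity(w) == 0 for w in CODEWORDS.values())

# Smallest Hamming distance between any two distinct codewords.
min_distance = min(
    hamming(a, b) for a, b in combinations(CODEWORDS.values(), 2)
)

# Watson-Crick partners are bitwise complements of each other.
pairs_complementary = all(
    CODEWORDS[base] ^ CODEWORDS[COMPLEMENT[base]] == 0b1111
    for base in CODEWORDS
)

print(all_even, min_distance, pairs_complementary)
```

Under this assumed assignment the four bases form an even-parity code with minimum distance 2, so any single-bit error produces a word outside the alphabet and is therefore detectable, though not correctable.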
The case for error-control mechanisms in genetic processes is made by encoding the exons of a gene with a mathematical coding strategy that transforms them into binary parity strings. The encoded sequence is analyzed for dependency structures; if present, they support the hypothesis of deterministic error control within genetic sequences. A Bayesian classifier is built by modeling the messenger RNA as a noisy encoded sequence and the ribosome as an error-control detector that distinguishes between valid and invalid ribosome binding sites.
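One way to look for dependency structures in an encoded sequence is to test whether its binary blocks are linearly dependent over GF(2), as the codewords of a linear block code would be. The sketch below is only an illustration of that idea, not the coding strategy used in the work: the 4-bit base mapping `BASE_BITS`, the block length, and the helper names `encode_blocks`, `gf2_rank`, and `rank_deficit` are all assumptions.

```python
# Hypothetical 4-bit mapping of bases to bit patterns (an assumption).
BASE_BITS = {"A": 0b1010, "C": 0b1001, "G": 0b0110, "T": 0b0101}
BLOCK_LEN = 4  # bits per block in this toy example


def encode_blocks(sequence: str) -> list[int]:
    """Turn a DNA string into a list of 4-bit blocks, one per base."""
    return [BASE_BITS[base] for base in sequence.upper()]


def gf2_rank(rows: list[int]) -> int:
    """Rank over GF(2) of integers interpreted as bit vectors."""
    basis: list[int] = []
    for row in rows:
        # XOR with a basis vector only when that clears its leading bit.
        for b in basis:
            row = min(row, row ^ b)
        if row:
            basis.append(row)
    return len(basis)


def rank_deficit(blocks: list[int], block_len: int = BLOCK_LEN) -> int:
    """Number of independent linear constraints satisfied by all blocks."""
    return block_len - gf2_rank(blocks)


structured = encode_blocks("ACGTTGCA")
# Same blocks plus one arbitrary block that violates the parity constraint.
unstructured = structured + [0b0001]

print(rank_deficit(structured), rank_deficit(unstructured))
```

A positive deficit means the blocks span only a proper subspace, which is the signature of a parity-check constraint; in this toy case the structured blocks give a deficit of 1 and adding an odd-parity block removes it. Real exon data would need longer blocks and a statistical comparison against shuffled sequences, neither of which is attempted here.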
Recent advances in biological information and coding theory shed new light on diseases such as various forms of cancer, AIDS, and geriatric maladies, which might be quantified in terms of failures in the genetic error-control system. Consequently, a major benefit of the intersection of Shannon’s 1948 information theory and Watson and Crick’s 1953 discovery of the DNA double helix potentially lies in a quantitative framework for designing fault-tolerant genes, proteins, and genomes that approach the communication capacity of a living organism.