Contact information:

Email: ibalabwww@ibalab, iba@ibalab
ibalab=iba.k.u-tokyo.ac.jp


Telephone Numbers:
7H1:+81-4-7136-3873
7H4:+81-4-7136-3874
7E4:+81-4-7136-3875

Bioinformatics Supplement
(detailed experimental results of some papers)


Some Problems of Computational Biology and Bioinformatics

1. Selection of informative genes from Microarray data for classification of diseases:
Given the gene expression values of different types of diseases, the objective of this research is to select those genes that are responsible for the distinction of different patient samples.
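One simple way to illustrate gene selection is a filter-style ranking, sketched below. This is only an assumed example, not the method used in the research above: the signal-to-noise score, the function names and the toy expression values are all hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(class_a, class_b):
    """Difference of class means divided by the sum of class standard deviations."""
    spread = stdev(class_a) + stdev(class_b)
    if spread == 0:
        return 0.0
    return (mean(class_a) - mean(class_b)) / spread

def rank_genes(expression, labels, top_k):
    """Return the top_k genes whose expression best separates two classes.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(signal_to_noise(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy data (hypothetical): three genes measured in six samples, two classes.
toy_expression = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
    "geneB": [2.0, 3.5, 2.9, 3.0, 2.1, 3.4],
    "geneC": [4.0, 4.1, 3.9, 1.0, 1.2, 0.8],
}
toy_labels = [0, 0, 0, 1, 1, 1]
informative = rank_genes(toy_expression, toy_labels, top_k=2)
```

In this toy setting geneB varies about as much within each class as between them, so it is ranked last. Real microarray data has thousands of genes and few samples, so a score like this is usually only a first filtering step.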


2. Inference of gene regulatory network:
The objective is to predict the regulatory network structure of the interacting genes from the observed outcome, i.e., the expression patterns. The task consists of modeling the rules of regulation and inferring the network structure from observed data.


3. Identification of coding regions in DNA sequences:
DNA sequences commonly contain both coding regions (exons) and non-coding regions (introns and intergenic spacer DNA). Within genes, exons are defined as the coding material for protein products, whereas introns are generally associated with no particular purpose other than to separate exons and are spliced out of the mRNA prior to translation into protein. Given a DNA sequence, the objective of this research is to classify it as either a coding or a non-coding sequence.


4. Multiple sequence alignment:
Given a set of homologous sequences, multiple sequence alignments are used to predict the secondary or tertiary structure of new sequences, to demonstrate homology between new sequences and existing families, to find diagnostic patterns for families and to reconstruct phylogenetic trees. Example: consider two sequences, one newly discovered and the other previously stored in a database:

  ATCTCTGGCA
   TACTCGGCA
An alignment of these two sequences could be as follows:
  ATCTCT-GGCA
  -T-ACTCGGCA
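
A pairwise alignment like the one above can be computed by dynamic programming. The sketch below uses the Needleman-Wunsch global alignment algorithm with an assumed scoring scheme (match +1, mismatch -1, gap -1); these scores are illustrative choices, and with them the gaps may be placed differently from the alignment shown above.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: best score for aligning a[:i] with b[:j].
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[len(a)][len(b)]

aligned_a, aligned_b, best_score = needleman_wunsch("ATCTCTGGCA", "TACTCGGCA")
print(aligned_a)
print(aligned_b)
```

Aligning more than two sequences is much harder than the pairwise case; practical tools typically build a multiple alignment progressively from pairwise alignments such as this one.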


5. Determination of genome sequences from experimental data:
The objective of this research is to reconstruct an original DNA sequence of length n on the basis of a spectrum which consists of words of length l over the nucleotides and may contain positive and negative errors.
Example: Suppose the original sequence is ACTCTGG, so its ideal spectrum for l = 3 is {ACT, CTC, TCT, CTG, TGG}. Due to positive errors (spurious words, here CAA and TTG) and negative errors (missing words, here CTC), suppose the observed spectrum is {ACT, CAA, CTG, TCT, TGG, TTG}. From this spectrum, we have to find 3-letter elements whose ordering produces a DNA sequence of length 7. Two possible orderings of oligonucleotides from the spectrum are (ACT, TCT, CTG, TGG) and (CAA, ACT, CTG, TGG). The first one produces the original DNA sequence ACTCTGG; the second one produces CAACTGG, which is erroneous.
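
The search over orderings in this example can be sketched as a small exhaustive search. This is an illustrative brute-force approach under stated assumptions, not the algorithm used in the research: consecutive oligonucleotides are joined at their longest overlap (at least one letter), each spectrum element is used at most once, and the names `overlap` and `reconstruct` are hypothetical.

```python
def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def reconstruct(spectrum, n):
    """Map each length-n sequence to the orderings of spectrum words that build it."""
    results = {}

    def extend(seq, last, used):
        if len(seq) == n:
            results.setdefault(seq, []).append(used)
            return
        for oligo in spectrum:
            if oligo in used:
                continue
            k = overlap(last, oligo)
            if k == 0:
                continue                  # words must share at least one letter
            longer = seq + oligo[k:]
            if len(longer) <= n:
                extend(longer, oligo, used + [oligo])

    for start in spectrum:
        extend(start, start, [start])
    return results

observed = ["ACT", "CAA", "CTG", "TCT", "TGG", "TTG"]
candidates = reconstruct(observed, 7)
for sequence, orderings in sorted(candidates.items()):
    print(sequence, orderings)
```

The search returns every length-7 candidate, including ACTCTGG and CAACTGG from the example. Choosing among candidates needs an extra criterion, for instance preferring orderings that use the most spectrum elements. Exhaustive search grows exponentially with the size of the spectrum, which is why heuristic methods such as evolutionary computation are applied to this problem.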


6. 3-D Visualization of a Gene Regulatory Network:
The objective of this research is to identify and show the collaborating genes in the network on different layers, which will facilitate the behavioral study of the groups and the network as a whole.


Suggested book for those who are new to evolutionary computation and bioinformatics:
Gary B. Fogel and David W. Corne, "Evolutionary Computation in Bioinformatics", Morgan Kaufmann Publishers, 2003.

Bioinformatics/Genome Informatics/Computational Biology

Q. What is Bioinformatics?
Bioinformatics is an interdisciplinary field bringing together biology, computer science, mathematics, statistics, and information theory to analyze biological data for interpretation and prediction. This field is also known as Computational Biology and Genome Informatics.

Q. What is the central dogma of molecular biology?
The central dogma of molecular biology states that genetic information stored in DNA is transcribed into mRNA and then translated into protein.

Q. What is the structure of DNA?
DNA (deoxyribonucleic acid) consists of two long strands running in an antiparallel orientation to form a double helix. The two strands are complementary to each other, and each strand is a chain of nucleotides, each made of a phosphate group, a deoxyribose sugar and one of four bases (adenine [A], guanine [G], cytosine [C], and thymine [T]). The bases on the two strands follow the base pairing rules: A pairs with T and C pairs with G.
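
Because of the base pairing rules and the antiparallel orientation, one strand fully determines the other. A minimal sketch (the function names are illustrative) that derives the partner strand, read in its own 5' to 3' direction:

```python
# Base pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base partner of the strand, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """Partner strand read in its own direction (the strands are antiparallel)."""
    return complement(strand)[::-1]

partner = reverse_complement("ATCTCTGGCA")
print(partner)  # TGCCAGAGAT
```

Applying `reverse_complement` twice returns the original strand.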

Q. How is protein made from a DNA sequence?

  1. First the DNA sequence is transcribed into an mRNA sequence using the nucleotides A,G,C and U (uracil which is used in place of thymine).
  2. Then this RNA sequence is exported out of the nucleus (in eukaryotes only) to the cytoplasm for translation into a protein primary sequence.
  3. Then the RNA sequence is deciphered as a series of three-letter codons, where each codon corresponds to an amino acid.
Example:
1. DNA sequence: AGTCTCGTTACTTCTTCAAAT
2. mRNA sequence: AGUCUCGUUACUUCUUCAAAU
3. 3-letter codons:
   AGU CUC GUU ACU UCU UCA AAU
4. Amino acid sequence: SLVTSSN
where S=Serine (AGC/AGU/UCA/UCC/UCG/UCU), L=Leucine (CUA/CUC/CUG/CUU/UUA/UUG), V=Valine (GUA/GUC/GUG/GUU), T=Threonine (ACA/ACC/ACG/ACU), N=Asparagine (AAC/AAU).
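
The transcription and translation steps can be sketched in a few lines. The code below assumes the standard genetic code and treats the input as the coding strand, so transcription is simply replacing T with U; the function names are illustrative.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order for each
# of the three positions. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def transcribe(dna):
    """Coding-strand DNA to mRNA: thymine (T) is replaced by uracil (U)."""
    return dna.upper().replace("T", "U")

def split_codons(mrna):
    """Split mRNA into consecutive 3-letter codons, dropping any incomplete tail."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

def translate(mrna):
    """Translate every codon to its one-letter amino acid code."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(mrna))

mrna = transcribe("AGTCTCGTTACTTCTTCAAAT")
print(mrna)
print(" ".join(split_codons(mrna)))
print(translate(mrna))
```

This sketch translates the whole sequence from its first letter and does not look for start or stop codons.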

Q. What are start and stop codons?
A start codon signals to the cell machinery where translation of the mRNA sequence into amino acids begins. The standard start codon is ATG (AUG in mRNA), which codes for methionine.

Any one of the three stop codons, TGA, TAG and TAA (UGA, UAG and UAA in mRNA), tells the cell machinery to stop translating codons into amino acids.
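
Start and stop codons together delimit an open reading frame (ORF). The sketch below is a simplified illustration: it looks only at the first ATG on the given strand and reads forward in that frame until the first stop codon. The function name `first_orf` is hypothetical, and real gene finders consider all frames and both strands.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_orf(dna):
    """Return the stretch from the first ATG to the first in-frame stop codon.

    Returns None if there is no ATG, or no stop codon in the same frame.
    """
    start = dna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence codon by codon, staying in the ATG's frame.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return None

orf = first_orf("CCATGGCTTGATAA")
print(orf)  # ATGGCTTGA
```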

Q. What is gene expression level?
Gene expression is the process by which messenger RNA (mRNA) and eventually protein are made from the DNA template of each gene. The portion of each gene that is represented as mRNA is known as the coding sequence for that gene. Since mRNA is a copy of the DNA coding regions, genomic analysis at the mRNA level is used as a measure of gene expression.

Gene expression level indicates the amount of mRNA produced in a cell during protein synthesis, and is thought to be correlated with the amount of the corresponding protein made.

Q. What is the hypothesis behind gene expression based classification of patient samples?
The hypothesis behind gene expression based classification is that expression levels are affected by a large number of environmental factors, including temperature, stress, light, and other signals that lead to changes in the level of hormones and other signaling substances, and that many or all human diseases may be accompanied by specific changes in gene expression levels.


Get Java source code of some Bioinformatics problems.

Read online biomedical books at http://www.ncbi.nlm.nih.gov/