Welcome to the Kokocinski-net

LINKS

[organizations etc.]
O bioinformatics.org
    an international initiative
O bioinformatik.de
    a German initiative
O NCBI
    The US National Center for
    Biotechnology Information
O EBI
    The European Bioinformatics
    Institute
     services & resources
[bioinformatics journals]
O Bioinformatics
O In Silico Biology
O Computational Biology
    and Chemistry

O Nucleic Acids Res.
     Database Issue

O Genome Biology (partly)
O Genome Research (partly)

[programming languages]
O development of the various
    programming languages

    [german]
O Perl
   the Practical Extraction and Report
   Language, useful for everything
    - perl.com
    - BioPerl, extensions for
      sequence handling etc.
    - CPAN, Perl resources
    - BioPerl tutorial
O Java
   the nice-to-code & strictly
   object-oriented networking
   language
    - a manual from Sun
O C/C++
   still one of the most powerful
   languages for larger projects
    - Microsoft's pages for
      Visual C++

O PHP
    a nice web scripting language
    - the official manual or in German
    - more resources
O HTML & Co.
    Webpage essentials
    - SELFHTML tutorial: HTML,
      JavaScript, CSS


[useful resources]
O MySQL database
O phpMyAdmin
   (MySQL administration)
O Apache web server
O Apache Friends: pre-packaged
   XAMPP/WAMPP (Apache, MySQL,
   Perl, PHP, ...): your own
   server, ready to install.
[educational links]
O Linux
     - Das wahre Linux Anwenderhandbuch
       (a Linux user handbook, in German)

     - General Linux information
O Statistics
     - in-depth statistics from
       the makers of MATLAB

O more tutorials
     - web tutorials
       at w3schools.com

                        introduction and overview
[in short] "Bioinformatics - the development and application of computational methods to acquire, store, organize, archive and visualize biological data - is one of the fastest-growing technologies." [1]
[definition] "Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned" [2]
[definition] "Bioinformatics is the application of computer technology to the management and analysis of biological data. The result is that computers are being used to gather, store, analyse and merge biological data." [8]
[definition] "Bioinformatics is conceptualising biology in terms of molecules and applying informatics techniques (derived from disciplines such as applied maths, computer science and statistics) to understand and organise the information associated with these molecules, on a large scale. In short, bioinformatics is a management information system for molecular biology and has many practical applications." [3]
   
[some typical applications]
  • Data management with databases
    How can the large amounts of data produced by, e.g., high-throughput experiments or the sequencing of the human genome be handled?
    more on this topic
     
  • Sequence analysis
    One of the oldest areas in which bioinformatics developed into a discipline of its own is the analysis of the information encoded in our genes.
     
  • Data analysis
    All experiments need (and should get) computational support at some point. From a few calculations in an Excel spreadsheet to trained classifiers used in data mining, there are many ways to gain a better understanding of experimental data.
     
  • Pathway reconstruction
    The analysis of various types of experiments should ideally lead to insights into the underlying biological mechanisms. Discovering how the individual parts (molecules or other entities) act together in a coordinated network is one of the most exciting areas of research.
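To give a flavour of the most basic sequence analysis tasks, here is a minimal sketch in plain Perl (no BioPerl required) that computes the GC content and the reverse complement of a DNA string. The sub names `gc_content` and `revcomp` are illustrative only and are not part of any library.

```perl
use strict;
use warnings;

# fraction of G and C bases in a DNA sequence (case-insensitive);
# tr/// with an empty replacement list only counts, it does not modify
sub gc_content {
    my ($seq) = @_;
    return 0 unless length $seq;
    my $gc = ( $seq =~ tr/GCgc// );
    return $gc / length($seq);
}

# reverse complement of a DNA sequence (A<->T, C<->G, case preserved)
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;    # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna = "ATGGCGTACGCTTAG";
printf "GC content: %.2f\n", gc_content($dna);
print "reverse complement: ", revcomp($dna), "\n";
```

For real projects, BioPerl (listed in the links) offers sequence objects with this kind of functionality built in.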
[further readings & sources]
   
  ENSEMBL
the leading genome annotation and browsing system
www.ensembl.org
 
 
EnsEMBL is a joint project of the Wellcome Trust Sanger Institute and the European Bioinformatics Institute (EBI).
The main goal is the automated annotation of genomes (eukaryotes and model organisms).
Some information is imported from other resources, but most data on the location of genes, transcripts and many
other features within the genome is derived by analysing sequence data with both standard and novel programs.
All data and software are freely available.
 
The EnsEMBL project not only provides genome annotation data, it also maintains a powerful interface for working
with these data programmatically (API). Its large set of Perl modules can, for example, be used to
  • fetch information on your favorite genomic region
  • get data on specific genes in different organisms
  • annotate the clones on your microarray
There is also a Java API, but it is no longer being developed.
More information can be found on the EnsEMBL pages. For specific coding questions, subscribe to the mailing list.

Code examples to get you started

         
1.  connect to the database
# connect explicitly using the DBAdaptor
# (used in examples 2-8 and 10)
# the database name encodes the Ensembl release (42) and the
# genome assembly (36); replace it with the release you need

use strict;
use warnings;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -host   => 'ensembldb.ensembl.org',
    -dbname => 'homo_sapiens_core_42_36b',
    -user   => 'anonymous',
);

# OR connect automatically using the Registry
# (used in example 9)
# please see http://www.ensembl.org/info/software/registry/index.html

use Bio::EnsEMBL::Compara::DBSQL::DBAdaptor;
use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

2.  fetch a specific chromosome
# [connect (1)]

my $chrom  = "X";
my $slice_adaptor = $db->get_SliceAdaptor;
my $slice = $slice_adaptor->fetch_by_region("chromosome", $chrom);
print "have slice of ".$slice->seq_region_name." ".
      $slice->start."-".$slice->end."\n";

3.  fetch a specific genomic region
# [connect (1)]

my $chrom  = "X";
my $start  = 100000;
my $end    = 200000;
my $strand = 1;
my $slice_adaptor = $db->get_SliceAdaptor;
my $slice = $slice_adaptor->fetch_by_region(
				"chromosome",
				$chrom,
				$start,
				$end,
				$strand);
print "have slice of ".$slice->seq_region_name." ".
    $slice->start."-".$slice->end."\n";
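EnsEMBL coordinates are 1-based and inclusive, so the region 100000-200000 above spans 100001 bp. The following minimal pure-Perl sketch shows the arithmetic you typically need when comparing features against such a region. The helpers `region_length` and `overlap_length` are hypothetical and are not part of the EnsEMBL API.

```perl
use strict;
use warnings;

# length of a 1-based, inclusive region
sub region_length {
    my ( $start, $end ) = @_;
    return $end - $start + 1;
}

# number of bases shared by two inclusive regions (0 if they do not overlap)
sub overlap_length {
    my ( $s1, $e1, $s2, $e2 ) = @_;
    my $s = $s1 > $s2 ? $s1 : $s2;    # the later of the two starts
    my $e = $e1 < $e2 ? $e1 : $e2;    # the earlier of the two ends
    return $e >= $s ? $e - $s + 1 : 0;
}

print region_length( 100000, 200000 ), "\n";                     # 100001
print overlap_length( 100000, 200000, 199990, 200500 ), "\n";    # 11
```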

4.  fetch all chromosomes
# [connect (1)]

my $slice_adaptor = $db->get_SliceAdaptor;
my @chromosomes;
foreach my $chr ( @{ $slice_adaptor->fetch_all('chromosome') } ) {
   #print out information
   print $chr->seq_region_name.", ".$chr->start." - ".$chr->end."\n";
   #store the names
   push @chromosomes, $chr->seq_region_name;
   #or work with the chromosome...
}

5.  fetch genes
# [connect (1)]

# all genes from a slice, e.g. a chromosome
# [get_slice (2 or 3)]
my @genes = @{ $slice->get_all_Genes() };

# specific gene, using its EnsEMBL stable ID
my $gene_adaptor = $db->get_GeneAdaptor;
my $gene = $gene_adaptor->fetch_by_stable_id("ENSG00000147892");

# specific gene(s), using the gene symbol (short name);
# a symbol can match more than one gene, so a list is returned
my @genes_by_name = @{ $gene_adaptor->fetch_all_by_external_name("ADAMTSL1") };

6.  get more information on the gene
# [connect (1)]

# [get_gene (5)]

# print basic information
print "genomic localization:\t".$gene->seq_region_name.
      "\t".$gene->start."\t".$gene->end.
      "\t".$gene->strand."\n";
print "description:\t".( $gene->description || "" )."\n";
# print further information: gene symbol and other IDs
my $symbol;
foreach my $dbEntry ( @{ $gene->get_all_DBEntries } ) {
   if( $dbEntry->database eq "HUGO" or
       $dbEntry->database eq ""){
      $symbol = $dbEntry->display_id;
   }
   else{
      print "ID:\t".$dbEntry->database.":\t".
            $dbEntry->display_id."\n";
   }
}
print "symbol:\t$symbol\n" if defined $symbol;

7.  work with the gene
# [connect (1)]

# [get_gene (5)]

foreach my $transcript ( @{ $gene->get_all_Transcripts } ) {
   print $transcript->dbID."\t".$transcript->start."\t".
   	 $transcript->end."\t".$transcript->strand."\n";
   foreach my $exon ( @{ $transcript->get_all_Exons } ) {
      print "\t".$exon->dbID."\t".$exon->start."\t".
      	    $exon->end."\n";
   }
}
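Every exon is itself a 1-based, inclusive region, so the length of a transcript's spliced sequence is the sum of its exon lengths. The minimal sketch below illustrates this. It assumes a hypothetical in-memory structure (transcript ID mapped to `[start, end]` exon pairs with made-up coordinates) in place of the objects returned by `get_all_Transcripts` and `get_all_Exons`, so it runs without a database connection.

```perl
use strict;
use warnings;

# hypothetical stand-in for the transcript/exon objects of a gene:
# transcript ID -> list of [start, end] exon coordinates (1-based, inclusive)
my %exons_of = (
    transcript_1 => [ [ 100, 199 ], [ 300, 349 ] ],
    transcript_2 => [ [ 100, 199 ] ],
);

# spliced length = sum of the inclusive lengths of all exons
sub spliced_length {
    my ($exons) = @_;
    my $total = 0;
    $total += $_->[1] - $_->[0] + 1 foreach @$exons;
    return $total;
}

foreach my $id ( sort keys %exons_of ) {
    print "$id\t", spliced_length( $exons_of{$id} ), "\n";
}
```

With the real API, the same loop would read `$exon->start` and `$exon->end` inside the transcript loop of example 7.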

8.  use SQL to fetch information
# [connect (1)]

# plain SQL through the database connection of the adaptor
# (no trailing semicolon inside the statement string)
my $query = "SELECT gene_id, seq_region_id, seq_region_start,
          seq_region_end, seq_region_strand FROM gene LIMIT 5";
my $sth = $db->dbc->prepare($query);
$sth->execute();
while ( my ($id, $region, $start, $end, $strand) = $sth->fetchrow_array ) {
   print "gene $id: $region, $start, $end, $strand\n";
}
$sth->finish();

9.  get homologous genes using EnsEMBL::Compara
use Bio::EnsEMBL::DBSQL::DBAdaptor;
use Bio::EnsEMBL::Compara::DBSQL::DBAdaptor;

# use the Registry for a simple connection setup,
# please see http://www.ensembl.org/info/software/registry/index.html
use Bio::EnsEMBL::Registry;
Bio::EnsEMBL::Registry->load_registry_from_db(
		-host    => 'ensembldb.ensembl.org',
		-user    => 'anonymous'
		);

#get compara adaptors
my $ma =  Bio::EnsEMBL::Registry->get_adaptor(
			'compara', 'compara', 'Member')
   or die "\n$@\ncan't get adaptor 1.\n";
my $ha =  Bio::EnsEMBL::Registry->get_adaptor(
			'compara', 'compara', 'Homology')
   or die "\n$@\ncan't get adaptor 2.\n";

# the human gene to start from (EnsEMBL stable ID)
my $query_species = "Homo_sapiens";
my $gene_id       = "ENSG00000147892";

#fetch source gene
my $member = $ma->fetch_by_source_stable_id(
		"ENSEMBLGENE", $gene_id)
   or die "\ncan't find gene $gene_id in Compara.\n";
my $sourceGenome = $member->genome_db->dbID;
print "\nsource gene ($query_species): ".$member->stable_id;

#get all homologues from other species
my $other_species = "Mus_musculus";
my $homologies = $ha->fetch_by_Member_paired_species(
			$member, $other_species);

#or from all species
#my $homologies = $ha->fetch_by_Member($member);

#display all results
foreach my $homology (@$homologies) {
  foreach my $member_attrib (@{$homology->get_all_Member_Attribute}) {
    my ($newmember, $attrib) = @$member_attrib;

    if ($newmember->genome_db->dbID != $sourceGenome) {
      print "\nhomologue: ".$newmember->stable_id.
      	    " / ".$newmember->taxon_id.
            ": ".$newmember->chr_name.
            " ".$newmember->chr_start.
            "-".$newmember->chr_end;
    }
  }
}

10. get Gene Ontology terms for a gene using EnsEMBL & GO::AppHandle
use strict;
use warnings;
use Bio::EnsEMBL::DBSQL::DBAdaptor;
# [connect (1)]

# use GO::AppHandle for the GO logic where possible!
use GO::AppHandle;
my %args = (
      -dbhost => 'sin.lbl.gov',
      -dbname => 'go',
);
my $apph = GO::AppHandle->connect( \%args );

# [get_gene (5)]

# get GO information
if ( $gene->is_known ) {
  foreach my $link ( @{ $gene->get_all_DBLinks } ) {
    if ( $link->database eq "GO" ) {

      # show the GO accession
      print $link->display_id, "\n";

      # look up the GO term and print its ancestor terms
      my $term = $apph->get_term( { acc => $link->display_id } );
      get_parent($term) if $term;
      print "\n";
    }
  }
}

# print all parent terms recursively (root categories are skipped)
sub get_parent {
  my $term = shift;

  foreach my $parent_term ( @{ $apph->get_parent_terms($term) } ) {
    get_parent($parent_term);
  }
  my $name = $term->name();
  if (    ( $name ne "Gene_Ontology" )
       && ( $name ne "molecular_function" )
       && ( $name ne "cellular_component" )
       && ( $name ne "biological_process" ) ) {
    print $name . "(" . $term->type . "), ";
  }
}
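GO is a directed acyclic graph, so a term can reach the same ancestor along several paths, and the recursive `get_parent` above then prints that ancestor more than once. The minimal pure-Perl sketch below runs the same depth-first traversal but keeps a `%seen` hash so each ancestor is reported only once. It uses a small hypothetical in-memory graph (the term names are placeholders, not real GO terms) in place of `$apph->get_parent_terms`.

```perl
use strict;
use warnings;

# hypothetical child -> parents map standing in for $apph->get_parent_terms
my %parents = (
    leaf  => [ 'mid_a', 'mid_b' ],
    mid_a => ['top'],
    mid_b => ['top'],
    top   => [],
);

# collect all ancestors of $term exactly once, depth-first,
# listing each ancestor after its own ancestors
sub ancestors {
    my ( $term, $seen ) = @_;
    $seen ||= {};
    my @result;
    foreach my $parent ( @{ $parents{$term} || [] } ) {
        next if $seen->{$parent}++;    # skip ancestors already visited
        push @result, ancestors( $parent, $seen ), $parent;
    }
    return @result;
}

print join( ", ", ancestors('leaf') ), "\n";    # top, mid_a, mid_b
```

Passing the same `$seen` hash reference down the recursion is what prevents the duplicate visit of `top` via `mid_b`.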


© F. Kokocinski, 1999 - 2007
No warranty or liability for the content of this page or of linked pages.
