User:Darked~enwiki/ABRF 2005

ABRF 2005 Savannah Feb 05-09th

Main topics:

Proteomics/ Mass spectrometry
Microarrays
DNA sequencing
Bioinformatics

Tutorials (Feb 05):

Mascot (David Wishart, UAlberta, Edmonton)
Global Proteomic Machine / X!Tandem
Sequest (Aaron Lucas)
Spectrum Mill (David Horn, Agilent)

Mascot

kinds of analyses:

PMF (peptide mass fingerprint)
Sequence tag querying
MS/MS Ion searches

Price: 7K$ for 1 CPU, 12.5K$ for 2 CPUs, down to 4K$/CPU with large purchases

Requirements: Linux (Windows) cluster, 2GB RAM per node recommended

Other components:

Mascot Distiller
Mascot Daemon

Hints:

knowing the estimated mass or isoelectric point helps
with protein fingerprinting do not use Swiss-Prot -> use NR

Example files: http://gchelpdesk.ualberta.ca/ABRF2005/

Algorithm: Mowse scoring
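To make the peptide mass fingerprint idea concrete, here is a minimal sketch that simply counts how many observed masses match the theoretical peptide masses of each candidate protein. This is a simplified stand-in, not the Mowse scoring that Mascot uses. The function names, the 50 ppm tolerance and the toy peptides are assumptions made for illustration; the residue masses are the standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, peptides, tol_ppm=50.0):
    """Count observed masses within tol_ppm of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(
        1
        for m in observed
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


def rank_proteins(observed, digests, tol_ppm=50.0):
    """Rank proteins (name -> list of peptides) by number of matched masses."""
    scores = {
        name: count_matches(observed, peps, tol_ppm)
        for name, peps in digests.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two proteins that share one peptide.
digests = {"protA": ["GAK", "SVR"], "protB": ["GAK", "WWK"]}
observed = [peptide_mass("GAK"), peptide_mass("SVR")]
ranking = rank_proteins(observed, digests)
```

A plain match count ignores how common a given peptide mass is in the database, which is the part a Mowse-style score is meant to account for.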

Global Proteomic Machine / X!Tandem

thegpm.com

+ comments from Sunday Ron Beavis

Function:

IDs proteins from MS/MS data
permits point mutations!

Open source, Perl, Knoppix distro exists
Multithreaded, but there is also a version running on a cluster (Linux) in Kentucky
uses Open Mass Spectrometry Search Algorithm

Open Mass Spectrometry Search Algorithm. Lewis Y. Geer, Sanford P. Markey, Jeffrey A. Kowalak, Lukas Wagner, Ming Xu, Dawn M. Maynard, Xiaoyu Yang, Wenyao Shi, and Stephen H. Bryant. J. Proteome Res. 2004, 3(5), pp 958-964. DOI: 10.1021/pr0499491

Uses a database of reversed protein sequences to indicate when search results are getting into the "bad matches" area
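A minimal sketch of the reversed-sequence idea, assuming hits are available as (protein name, score) pairs and that decoy entries carry a "REV_" name prefix. Both conventions, the function names and the scores below are hypothetical and are not taken from the GPM or X!Tandem code. The decoy-to-target ratio above a score threshold gives a rough estimate of how many accepted matches are bad.

```python
def make_decoys(proteins, prefix="REV_"):
    """Build decoy entries by reversing each protein sequence."""
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


def fdr_at_threshold(hits, threshold, prefix="REV_"):
    """Estimate the false discovery rate as decoys / targets among hits
    scoring at or above the threshold."""
    accepted = [name for name, score in hits if score >= threshold]
    decoys = sum(1 for name in accepted if name.startswith(prefix))
    targets = len(accepted) - decoys
    if targets == 0:
        # No target hits: everything accepted (if anything) is a decoy.
        return 1.0 if decoys else 0.0
    return decoys / targets


# Hypothetical search results against a combined target + decoy database.
decoys = make_decoys({"P1": "MKTAYR"})
hits = [("P1", 60), ("P2", 45), ("REV_P3", 44), ("P4", 30), ("REV_P1", 20)]
fdr_loose = fdr_at_threshold(hits, 40)
fdr_strict = fdr_at_threshold(hits, 50)
```

Raising the threshold until the estimate drops below a chosen level marks where the "bad matches" area begins.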

Stores a database of real mass spec spectra (50 million donated so far!); one can compare these with actual results or with a given peptide (if present)

Output of spectra as scalable vector graphics
Common XML input/output format

Sequest

In a standard release:

requires Win2000 on the head node, even though the cluster itself can be a Windows head node with Linux slaves using PVM
Head node: 4-6 GB RAM
5TB disks on head node
32 CPUs in total

big SRF files do not work on cluster

General impression: works, but it is a kludge. Creates tens of thousands of small files (1-4kb) in a single directory, which makes backups, maintenance etc. very hard on the OS.

Linux box containing an FPGA (Sequest Sorcerer) -> rewritten algorithm / fast, from THERMAL

Data output per run: LCQ 19MB, LTQ 100MB, LTQ-F > 250MB

Other notes:

75% of peptides after trypsin / 77% with a perfect chymotrypsin digest are unique
some people claim that the Sequest algorithm is more accurate than Mascot on anything longer than 9 AA
exports XML/Excel files
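The uniqueness figure for digested peptides can be checked on any sequence set with a short in-silico digest. The sketch below assumes the common trypsin rule (cleave after K or R unless the next residue is P) and defines "unique" as a peptide that occurs in exactly one protein; the talk did not state which definition or database was behind the 75% figure, and the function names and toy sequences are made up for illustration.

```python
from collections import Counter


def tryptic_digest(sequence, min_length=1):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        nxt = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing piece after the last cleavage site.
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def unique_fraction(proteins, min_length=1):
    """Fraction of distinct peptides that occur in exactly one protein."""
    owners = Counter()
    for seq in proteins.values():
        for pep in set(tryptic_digest(seq, min_length)):
            owners[pep] += 1
    if not owners:
        return 0.0
    return sum(1 for n in owners.values() if n == 1) / len(owners)


# Hypothetical toy proteins sharing the peptide "AAK".
toy = {"a": "AAKCCR", "b": "AAKDDR"}
frac = unique_fraction(toy)
```

On a real database the result depends strongly on the minimum peptide length, so min_length should be set to whatever the search engine actually considers.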

Spectrum Mill

works only on Internet Explorer/ Server on Windows.
used mostly for de novo sequencing
not much comparative data

Mascot Integra: lab LIMS system based on Lab Vantage and Oracle 9, using Phoreiix exchange format

Requirements:

dual 3.2 Xeons
Win Server 2003

Pricing: depends on the number of concurrent users / number of CPUs (29K$ entry level)

BIND database from Blueprint.org

Manually curated (27 Toronto + 12 Singapore) protein-protein interaction database (Java + MySQL)
BIND IDs standardised across Science, Nature and Cell journals
introducing the idea of "ontoglyphs", a set of 84 squiggly characters used to represent major GO terms

Tutorial on protein alignment

by Kimmen Sjolander, Berkeley

programs to try and use:

[Satchmo]
[Gtree]

DAPHNE -> no link so far
[BETE]

Machine of the Year: 454 sequencer, http://www.454.com/

sequencing on microbeads thousands of small pieces at the same time
cost: 500K$ /5K per kit or ca 37K per service run
performance: up to 35MB of raw sequence per run!

Cons:

very short reads so far (100-160bp)
intensograms instead of chromatograms
no phred-compatible quality values / a different assembler is needed
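For reference on the last point: a phred quality value is defined as Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. The sketch below only illustrates that standard definition; it says nothing about how the 454 software derives its own values, and the function names are chosen for this example.

```python
import math


def phred_quality(p_error):
    """Phred quality value for a base-call error probability."""
    return -10.0 * math.log10(p_error)


def error_probability(quality):
    """Inverse mapping: error probability for a phred quality value."""
    return 10.0 ** (-quality / 10.0)


q = phred_quality(0.001)     # 1 error in 1000 calls
p = error_probability(20.0)  # Q20
```

Assemblers built around phred output expect per-base values on this scale, which is why data without them needs a different assembler.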