User:Darked~enwiki/ABRF 2005
ABRF 2005, Savannah, Feb 5-9
Main topics:
- Proteomics/ Mass spectrometry
- Microarrays
- DNA sequencing
- Bioinformatics
Tutorials (Feb 05):
- Mascot (David Wishart, UAlberta, Edmonton)
- Global Proteome Machine / X!Tandem
- Sequest (Aaron Lucas)
- Spectrum Mill (David Horn, Agilent)
Mascot
kinds of analyses:
- PMF (peptide mass fingerprint)
- Sequence tag querying
- MS/MS Ion searches
Price: $7K for 1 CPU, $12.5K for 2 CPUs, down to $4K per CPU with large purchases
Requirements: Linux (or Windows) cluster, 2 GB RAM per node recommended
Other components:
- Mascot Distiller
- Mascot Daemon
Hints:
- knowing the estimated mass or isoelectric point helps
- for protein fingerprinting do not use Swiss-Prot; use NR instead
Example files: http://gchelpdesk.ualberta.ca/ABRF2005/
Algorithm: Mowse scoring
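The idea behind a peptide mass fingerprint search can be sketched as below. This is a deliberately simplified illustration that only counts how many observed masses fall within a tolerance of a protein's theoretical peptide masses; it is not the Mowse scoring that Mascot uses. The function names, the 0.5 Da tolerance, and the tiny example database are all assumptions made for the sketch.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, peptides, tol=0.5):
    """Number of observed masses within `tol` Da of any theoretical peptide."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(
        any(abs(obs - t) <= tol for t in theoretical) for obs in observed_masses
    )


def rank_proteins(observed_masses, database, tol=0.5):
    """Rank proteins (name -> list of digest peptides) by match count."""
    scores = {
        name: count_matches(observed_masses, peptides, tol)
        for name, peptides in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical two-protein database, already digested into peptides.
database = {"protA": ["GASK", "PVTR"], "protB": ["LLNK"]}
observed = [peptide_mass("GASK"), peptide_mass("PVTR") + 0.1]
ranking = rank_proteins(observed, database)
```

A real search engine weights matches by how likely they are to occur by chance, which is why a plain match count like this one favours large proteins.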
Global Proteome Machine / X!Tandem
- thegpm.com
+ comments from Ron Beavis on Sunday
Function:
- IDs proteins from MS/MS data
- permits point mutations!
- Open source, Perl, Knoppix distro exists
- Multithreaded, but there is also a version running on a (Linux) cluster in Kentucky
- uses Open Mass Spectrometry Search Algorithm
Reference: Lewis Y. Geer, Sanford P. Markey, Jeffrey A. Kowalak, Lukas Wagner, Ming Xu, Dawn M. Maynard, Xiaoyu Yang, Wenyao Shi, and Stephen H. Bryant. "Open Mass Spectrometry Search Algorithm." J. Proteome Res. 2004; 3(5), pp 958-964. DOI: 10.1021/pr0499491
- Uses a database of reversed protein sequences to indicate when results are getting into the "bad matches" area
- stores a database of real mass spec spectra (50 million donated so far!) and one can compare these with actual results or with a given peptide (if present)
Output of spectra as scalable vector graphics (SVG)
Common XML input/output format
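The reversed-sequence trick mentioned above can be sketched roughly as follows. This is a generic target/decoy illustration, not the GPM's actual implementation: the `REV_` prefix, the function names, and the example scores are assumptions. Any hit on a reversed sequence must be a false match, so the number of decoy hits above a score threshold gives an estimate of how many target hits at that threshold are also wrong.

```python
def reverse_decoys(proteins):
    """Build decoy entries by reversing each protein sequence."""
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(hits, threshold, decoy_prefix="REV_"):
    """Estimate the false discovery rate among hits scoring >= threshold.

    `hits` is a list of (protein_name, score) pairs from a search against
    the combined target + decoy database.
    """
    accepted = [name for name, score in hits if score >= threshold]
    if not accepted:
        return 0.0
    decoys = sum(name.startswith(decoy_prefix) for name in accepted)
    targets = len(accepted) - decoys
    if targets == 0:
        return 1.0
    return min(1.0, decoys / targets)


# Hypothetical search results: (protein name, score).
hits = [("P1", 60), ("P2", 45), ("REV_P3", 40), ("P4", 30), ("REV_P1", 12)]
fdr_loose = estimate_fdr(hits, threshold=35)   # 1 decoy vs 2 targets
fdr_strict = estimate_fdr(hits, threshold=50)  # no decoys accepted
```

Lowering the threshold until decoy hits start to appear is one way to see where the "bad matches" area begins.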
Sequest
In a standard release:
- requires Win2000 on the head node, even though the cluster can be a Windows head node with Linux slaves using PVM
- Head node: 4-6 GB RAM
- 5 TB of disk on the head node
- 32 CPUs in total
Big SRF files do not work on the cluster.
General impression: works, but it is a kludge. It creates tens of thousands of small files (1-4 KB) in a single directory, which makes backups, maintenance, etc. very hard on the OS.
Linux box containing an FPGA (Sequest Sorcerer) -> rewritten algorithm / fast, from THERMAL
Data output per run: LCQ 19 MB, LTQ 100 MB, LTQ-F > 250 MB
Other notes:
- 75% of peptides after a trypsin digest / 77% with a perfect chymotrypsin digest are unique
- some people claim that the Sequest algorithm is more accurate than Mascot on anything longer than 9 AA
- exports XML/Excel files
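The peptide uniqueness figure quoted above could be checked on a sequence database with something like the sketch below. It assumes the usual trypsin rule (cleave after K or R unless the next residue is P) and assumes "unique" means a peptide occurs in only one protein; both are assumptions of this sketch, as are the function names and the toy sequences.

```python
from collections import Counter


def tryptic_digest(seq):
    """In silico trypsin digest: cut after K or R, but not before P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal remainder
    return peptides


def fraction_unique(proteins):
    """Fraction of distinct peptides that occur in exactly one protein."""
    counts = Counter()
    for seq in proteins.values():
        counts.update(set(tryptic_digest(seq)))
    if not counts:
        return 0.0
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)


# Toy database: the peptide AAK is shared, CCR and DDR are unique.
toy = {"A": "AAKCCR", "B": "AAKDDR"}
share = fraction_unique(toy)
```

Missed cleavages and a minimum peptide length are ignored here; both would change the result on a real database.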
Spectrum Mill
- works only in Internet Explorer; the server runs on Windows
- used mostly for de novo sequencing
- not much comparative data
Mascot Integra
- lab LIMS system based on LabVantage and Oracle 9, using the Phoreiix exchange format
Requirements:
- dual 3.2 Xeons
- Win Server 2003
Pricing: depends on the number of concurrent users / number of CPUs ($29K entry level)
BIND database from Blueprint.org
- Manually curated (27 Toronto + 12 Singapore) protein-protein interaction database (Java + MySQL)
- BIND IDs standardised across the Science, Nature and Cell journals
- introduces the idea of "ontoglyphs", a set of 84 squiggly characters used to represent major GO terms
Tutorial on protein alignment
by Kimmen Sjolander, Berkeley
programs to try and use:
- DAPHNE -> no link so far
- BETE
Machine of the Year
454 sequencer:
http://www.454.com/
- sequences thousands of small fragments on microbeads at the same time
- cost: $500K for the machine / $5K per kit, or ca. $37K per service run
- performance: up to 35 MB of raw sequence per run!
Cons:
- very short reads so far: 100-160 bp
- intensograms instead of chromatograms
- no phred-compatible quality values; a different assembler is needed
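A quick back-of-the-envelope calculation puts the figures above in perspective. It assumes that "35 MB" means 35 million bases and that every piece is the full 100-160 bp long; the function names are made up for this sketch.

```python
def reads_per_run(total_bases, read_length):
    """How many reads of a given length add up to the raw yield."""
    return total_bases // read_length


def cost_per_megabase(run_cost, total_bases):
    """Cost in dollars per million raw bases."""
    return run_cost / (total_bases / 1e6)


RAW_BASES = 35_000_000  # assumed: 35 MB = 35 million bases

reads_short = reads_per_run(RAW_BASES, 100)   # 350000 reads at 100 bp
reads_long = reads_per_run(RAW_BASES, 160)    # 218750 reads at 160 bp
service_cost = cost_per_megabase(37_000, RAW_BASES)  # service run
kit_cost = cost_per_megabase(5_000, RAW_BASES)       # kit only
```

So a single run corresponds to a few hundred thousand short pieces, at very roughly $1K per raw megabase as a service run or under $150 per megabase for the kit alone, before any coverage redundancy is taken into account.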