uk.ac.ebi.intact.util
Class ProteinFastaDownload

java.lang.Object
  |
  +--uk.ac.ebi.intact.util.ProteinFastaDownload

public class ProteinFastaDownload
extends java.lang.Object

This utility class retrieves protein sequences from IntAct. It downloads all IntAct sequences in FASTA format into a file named "proteinFastaDownload":
  * the FASTA header contains the IntAct ac
  * the protein sequence follows, starting on the next line

The resulting file looks like this:

  >ac
  seq...........................
  >ac
  seq...........................

--------------------
Creation of the file

The path where the file will be created must be specified. The following private attribute is a String holding that path. Check that this directory exists before running the utility.

  private final String PATH_INTACT_FORMAT_FILE = "/ebi/sp/misc1/tmp/shuet/intactblast" + "/intact-data/";

-------------------
Format of the file

Finally, the process formats the file. formatdb must be run on a protein source database such as IntAct before that database can be searched by blastp or fasta. The following private attribute is the command line of the formatdb program:

  private final String FORMAT_COMMAND_LINE = "bsub -I " + "/ebi/extserv/data1/appbin/linux-x86/ncbi-blast/formatdb -i " + PATH_INTACT_FORMAT_FILE;

The formatdb documentation is available at: http://ccgb.umn.edu/support/software/NCBI/README.formatdb

Version:
: $Id: ProteinFastaDownload.java,v 1.2 2003/07/16 13:58:37 skerrien Exp $
Author:
shuet (shuet@ebi.ac.uk)

Constructor Summary
ProteinFastaDownload()
           
 
Method Summary
protected  boolean filledProteinFastaFile(java.lang.String filecontent)
          Deletes all files in the directory, creates the proteinFastaDownload file and fills it with the search result, then calls the database formatting method so that the file can be processed by a sequence analysis algorithm such as Blast or Fasta.
protected  boolean formatProteinFastaFile(java.io.File fileToFormat)
          Runs the command line that formats the protein FASTA file as a database, which is needed before processing it with biological software such as Blast or Fasta.
protected  java.lang.String getAllProteinIntAct(IntactHelper helper)
          Gets all protein sequences in FASTA format (as a String object).
protected  java.lang.String getLineSeparator()
          Gets the line separator string.
static void main(java.lang.String[] args)
          This utility downloads all protein sequences stored in the IntAct database. Running this class should download all IntAct sequences into a file in FASTA format: the FASTA header contains the IntAct ac, and the protein sequence follows it.
protected  void outputProcessManagement(java.lang.Object stream, boolean in)
          Manages the output streams of a process. Because the stream buffers have a limited size, the process needs a BufferedReader to read its screen output stream and its error output stream.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ProteinFastaDownload

public ProteinFastaDownload()
Method Detail

getLineSeparator

protected java.lang.String getLineSeparator()
Gets the line separator string. Using the same separator in the service and in the client keeps the code platform-independent.

Returns:
the line separator
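
A minimal sketch of how such a method could obtain the separator, assuming it simply reads the standard "line.separator" system property. This page does not show the real implementation; the class name, method name, accession and sequence below are hypothetical.

```java
public class LineSeparatorSketch {

    // Hypothetical stand-in for getLineSeparator(): returns the
    // platform-dependent separator known to the running JVM.
    static String lineSeparator() {
        return System.getProperty("line.separator");
    }

    public static void main(String[] args) {
        // "AC-1" and "MKT" are made-up values used only for illustration.
        String record = ">AC-1" + lineSeparator() + "MKT" + lineSeparator();
        System.out.print(record);
    }
}
```

Building every record with the same separator means the writer and any reader of the file agree on line boundaries regardless of the operating system.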

outputProcessManagement

protected void outputProcessManagement(java.lang.Object stream,
                                       boolean in)
Manages the output streams of a process. Because the stream buffers have a limited size, the process needs a BufferedReader to read its screen output stream and its error output stream.

Parameters:
stream - the process stream to manage, passed as the appropriate stream object.
in - boolean specifying whether the preceding Object parameter is an InputStream or an OutputStream.
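
The buffering problem described above can be sketched as follows, assuming the usual remedy of reading both the standard output and the error stream of the child process on separate threads. This is not the class's actual code; StreamDrainSketch, readAll and runAndCapture are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

public class StreamDrainSketch {

    // Reads a stream line by line until it ends and returns the text,
    // with each line terminated by '\n'.
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Starts a command and drains both of its output streams concurrently,
    // so the child never blocks on a full pipe buffer.
    // Returns { standard output, error output }.
    static String[] runAndCapture(String... command)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).start();
        final InputStream out = process.getInputStream();
        final InputStream err = process.getErrorStream();
        final String[] captured = new String[2];
        Thread outReader = new Thread(() -> captured[0] = readAll(out));
        Thread errReader = new Thread(() -> captured[1] = readAll(err));
        outReader.start();
        errReader.start();
        process.waitFor();
        outReader.join();
        errReader.join();
        return captured;
    }
}
```

Reading the two streams on separate threads matters because a process that fills one pipe while the parent is blocked reading the other would otherwise hang.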

formatProteinFastaFile

protected boolean formatProteinFastaFile(java.io.File fileToFormat)
Runs the command line that formats the protein FASTA file as a database, which is needed before processing it with biological software such as Blast or Fasta.

Parameters:
fileToFormat - the file which needs to be formatted
Returns:
boolean indicating whether the process completed successfully
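
As an illustration only, the sketch below shows one way a command prefix and the file to format could be combined and executed, with the exit code mapped to a boolean. It assumes a prefix such as "formatdb -i" (the -i option appears in FORMAT_COMMAND_LINE) and that a zero exit code means success; FormatCommandSketch, buildCommand and run are hypothetical names, not members of this class.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FormatCommandSketch {

    // Splits the command prefix on whitespace and appends the file path
    // as the final argument.
    static List<String> buildCommand(String commandPrefix, File fileToFormat) {
        List<String> command =
                new ArrayList<>(Arrays.asList(commandPrefix.trim().split("\\s+")));
        command.add(fileToFormat.getPath());
        return command;
    }

    // Runs the command, forwarding its output to this JVM's console.
    // Returns true only if the process starts and exits with code 0
    // (assumed here to signal success).
    static boolean run(List<String> command) {
        try {
            Process process = new ProcessBuilder(command).inheritIO().start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A call such as run(buildCommand("formatdb -i", new File("proteinFastaDownload"))) would then report whether the formatting step succeeded, provided formatdb is on the path.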

getAllProteinIntAct

protected java.lang.String getAllProteinIntAct(IntactHelper helper)
Gets all protein sequences in FASTA format (as a String object).

Parameters:
helper - the IntactHelper object used to retrieve data through its search method
Returns:
String containing all protein sequences stored in IntAct
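
The layout of the returned String can be sketched as below. The IntactHelper search API is not documented on this page, so the sketch takes an already retrieved map from ac to sequence instead; FastaSketch and toFasta are hypothetical names, and skipping entries without a sequence is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaSketch {

    // Builds FASTA text: a ">ac" header line followed by the sequence
    // on the next line, for every entry that has a sequence.
    static String toFasta(Map<String, String> sequencesByAc, String lineSeparator) {
        StringBuilder fasta = new StringBuilder();
        for (Map.Entry<String, String> entry : sequencesByAc.entrySet()) {
            String sequence = entry.getValue();
            if (sequence == null || sequence.isEmpty()) {
                continue; // assumption: entries without a sequence are skipped
            }
            fasta.append('>').append(entry.getKey()).append(lineSeparator);
            fasta.append(sequence).append(lineSeparator);
        }
        return fasta.toString();
    }

    public static void main(String[] args) {
        // Made-up accessions and sequences, for illustration only.
        Map<String, String> proteins = new LinkedHashMap<>();
        proteins.put("AC-1", "MKT");
        proteins.put("AC-2", "GGA");
        System.out.print(toFasta(proteins, System.lineSeparator()));
    }
}
```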

filledProteinFastaFile

protected boolean filledProteinFastaFile(java.lang.String filecontent)
Deletes all files in the directory, creates the proteinFastaDownload file and fills it with the search result, then calls the database formatting method so that the file can be processed by a sequence analysis algorithm such as Blast or Fasta.

Parameters:
filecontent - the String to write into the file
Returns:
boolean indicating whether the formatdb process completed successfully
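
The first two steps (emptying the directory, then writing the new file) might look roughly like the sketch below. It uses java.nio.file, which is newer than this class, and it only deletes regular files directly inside the directory; both choices are assumptions, and FastaFileSketch and replaceContents are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FastaFileSketch {

    // Deletes every regular file directly inside the directory, then
    // writes fileContent to a new file with the given name.
    // Returns the path of the file that was written.
    static Path replaceContents(Path directory, String fileName, String fileContent) {
        try {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
                for (Path entry : entries) {
                    if (Files.isRegularFile(entry)) {
                        Files.delete(entry);
                    }
                }
            }
            Path target = directory.resolve(fileName);
            Files.write(target, fileContent.getBytes(StandardCharsets.UTF_8));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the directory is emptied first, it should be dedicated to this utility; pointing it at a shared location would remove unrelated files.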

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
This utility downloads all protein sequences stored in the IntAct database. Running this class should download all IntAct sequences into a file in FASTA format: the FASTA header contains the IntAct ac, and the protein sequence follows it. At the end, the file has been formatted and is ready to be used with a sequence analysis program.

Parameters:
args -
Throws:
java.lang.Exception


IntAct Project - EMBL-EBI 2004 - intact-help@ebi.ac.uk