uk.ac.ebi.intact.util
Class UpdateProteins

java.lang.Object
  |
  +--uk.ac.ebi.intact.util.UpdateProteinsI
        |
        +--uk.ac.ebi.intact.util.UpdateProteins

public class UpdateProteins
extends UpdateProteinsI

Parses a URL and updates the IntAct database.

The implemented algorithm, in detail:

 

(1) From the URL given by the user, get an EntryIterator to process the entries one by one.

(2) For each SPTREntry:

(2.1) a) From the accession number, retrieve from IntAct all Proteins that have that AC as an SPTR Xref. Several Protein instances can be found if they are linked to different BioSources. Let's call that set of Proteins PROTEINS. Note: an SPTREntry can contain several ACs, so we check IntAct for all of them.

b) From PROTEINS, we retrieve from IntAct all Splice Variants (i.e. Proteins) that have an Xref to the AC of a protein in PROTEINS with a CvXrefQualifier equal to isoform-parent. Several Splice Variant instances can be found per master Protein if there are multiple BioSources. Let's call that set of Proteins SPLICE-VARIANTS.

(2.2) The user can give a taxid 'filter' (let's call it t) in order to retrieve only the proteins related to that taxid (beware that, behind the scenes, all proteins are still updated or created). An SPTREntry specifies 1..n organisms (i.e. taxids). If the taxid parameter t is null, we give back to the user all proteins created or updated; if a valid taxid is given, we filter the set of proteins accordingly. If the taxid is not found, the procedure fails.

(2.3) For each taxid of the SPTREntry (let's call it TAXID):

(2.3.1) Get up-to-date information about the organism from Newt. If that organism already exists in IntAct as a BioSource, we check whether an update is needed. We also take into account that a taxid can be obsolete, in which case we update the IntAct data accordingly.

(2.3.2) a) If a Protein from PROTEINS (cf. 2.1) has TAXID as BioSource, we update its data from the SPTREntry.

b) If no Protein from PROTEINS has TAXID as BioSource, we create a new Protein.

c) If a Protein from PROTEINS has a taxid not found in the SPTREntry, we display a warning message.

d) If a Protein from SPLICE-VARIANTS (cf. 2.1) has TAXID as BioSource, we update its data from the SPTREntry.

e) If no Protein from SPLICE-VARIANTS has TAXID as BioSource, we create a new Protein.

f) If a Protein from SPLICE-VARIANTS has a taxid not found in the SPTREntry, we display a warning message.
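
The decision taken in step 2.3.2 can be sketched as a small planning function. This is only an illustration under stated assumptions: the class TaxidReconciliation, its Action enum and the use of plain taxid strings are hypothetical and are not part of the IntAct code. The same logic would apply to PROTEINS and to SPLICE-VARIANTS.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of step 2.3.2; not the IntAct implementation. */
final class TaxidReconciliation {

    enum Action { UPDATE, CREATE, WARN }

    /**
     * @param existingTaxids taxids of the BioSources of the Proteins already in IntAct
     * @param entryTaxids    taxids listed in the SPTREntry
     * @return one action per taxid, in encounter order
     */
    static Map<String, Action> plan(List<String> existingTaxids, List<String> entryTaxids) {
        Map<String, Action> actions = new LinkedHashMap<>();
        for (String taxid : entryTaxids) {
            // a) / d): a Protein with this taxid exists, so it is updated;
            // b) / e): none exists, so a new Protein is created.
            actions.put(taxid, existingTaxids.contains(taxid) ? Action.UPDATE : Action.CREATE);
        }
        for (String taxid : existingTaxids) {
            // c) / f): the Protein's taxid is absent from the entry, so only a warning is issued.
            if (!entryTaxids.contains(taxid)) {
                actions.put(taxid, Action.WARN);
            }
        }
        return actions;
    }
}
```

For example, with existing taxids 9606 and 4932 and an entry listing 9606 and 10090, the plan is UPDATE for 9606, CREATE for 10090 and WARN for 4932.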

Cross references created in step 2.3.2:

For Proteins :

(1) a link to uniprot

Xref( CvDatabase(uniprot) primaryId(uniprotAc-spliceVarNumber) secondaryId(uniprotId) CvXrefQualifier(identity) );

(2) Links to GO, SGD, INTERPRO and FLYBASE. Those Xrefs comply with the following schema (TODO: when updating Xrefs, remove those that no longer exist):

Xref( CvDatabase(DB), primaryId(AC), secondaryId(ID), CvXrefQualifier(-) );

For Splice Variants:

(1) a link to the master protein

Xref( CvDatabase(intact) primaryId(intactAc) secondaryId(intactShortlabel) CvXrefQualifier(isoform-parent) );

(2) a link to uniprot

Xref( CvDatabase(uniprot) primaryId(uniprotAc-spliceVarNumber) secondaryId(uniprotId) CvXrefQualifier(identity) );
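
As an illustration of the two splice variant Xrefs in the notation above, here is a minimal value class. XrefSketch and its factory methods are hypothetical and stand in for the real IntAct Xref, CvDatabase and CvXrefQualifier classes, whose constructors are not shown in this document. The accession, identifier and IntAct AC used in the usage note are sample values only.

```java
/** Hypothetical value class mirroring the Xref( ... ) notation; not the IntAct Xref class. */
final class XrefSketch {
    final String database;    // CvDatabase short label, e.g. "uniprot" or "intact"
    final String primaryId;
    final String secondaryId;
    final String qualifier;   // CvXrefQualifier short label, or null for "-"

    XrefSketch(String database, String primaryId, String secondaryId, String qualifier) {
        this.database = database;
        this.primaryId = primaryId;
        this.secondaryId = secondaryId;
        this.qualifier = qualifier;
    }

    /** Identity Xref of a splice variant: primaryId is uniprotAc-spliceVarNumber. */
    static XrefSketch uniprotIdentity(String uniprotAc, int spliceVarNumber, String uniprotId) {
        return new XrefSketch("uniprot", uniprotAc + "-" + spliceVarNumber, uniprotId, "identity");
    }

    /** Link from a splice variant to its master protein. */
    static XrefSketch isoformParent(String intactAc, String intactShortlabel) {
        return new XrefSketch("intact", intactAc, intactShortlabel, "isoform-parent");
    }

    @Override
    public String toString() {
        return "Xref( CvDatabase(" + database + ") primaryId(" + primaryId
                + ") secondaryId(" + secondaryId
                + ") CvXrefQualifier(" + (qualifier == null ? "-" : qualifier) + ") )";
    }
}
```

Calling XrefSketch.uniprotIdentity("P12345", 2, "AATM_RABIT") yields a primaryId of P12345-2, and XrefSketch.isoformParent("EBI-12345", "aatm_rabit") yields an Xref qualified as isoform-parent.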

BEWARE that no checks have been done about the ownership of updated objects.

Version:
$Id: UpdateProteins.java,v 1.42 2004/03/26 11:33:23 skerrien Exp $
Author:
Samuel Kerrien (skerrien@ebi.ac.uk)

Nested Class Summary
 
Nested classes inherited from class uk.ac.ebi.intact.util.UpdateProteinsI
UpdateProteinsI.UpdateException
 
Field Summary
 
Fields inherited from class uk.ac.ebi.intact.util.UpdateProteinsI
bioSourceFactory, flybaseDatabase, geneNameAliasType, geneNameSynonymAliasType, goDatabase, helper, identityXrefQualifier, intactDatabase, interproDatabase, isoformComment, isoFormParentXrefQualifier, isoformSynonym, localTransactionControl, logger, myInstitution, parsingExceptions, secondaryXrefQualifier, sgdDatabase, srsUrl, uniprotDatabase
 
Constructor Summary
UpdateProteins(IntactHelper helper)
           
UpdateProteins(IntactHelper helper, boolean setOutput)
           
 
Method Summary
 void addNewAlias(AnnotatedObject current, Alias alias)
          Adds (does not update) a new Alias to the given AnnotatedObject and writes it to the database.
 void addNewAnnotation(AnnotatedObject current, Annotation annotation)
          Adds an annotation to an annotated object.
 boolean addNewXref(AnnotatedObject current, Xref xref)
          Adds (does not update) a new Xref to the given AnnotatedObject and writes it to the database.
 java.lang.String getAnEntry(java.lang.String anUrl)
          From a given URL, returns an SPTR entry as a string.
 int getEntryCount()
          Gives the number of entries found in the given URL.
 int getEntryProcessededCount()
          Gives the number of entries successfully processed.
 int getEntrySkippedCount()
          Gives the number of entries skipped during processing.
 java.lang.String getErrorFileName()
          Returns the name of the file in which all entries that caused processing errors have been saved.
 int getProteinCount()
          Gives the count of all potential proteins.
 int getProteinCreatedCount()
          Gives the count of created proteins.
 int getProteinSkippedCount()
          Gives the count of proteins that caused errors during processing.
 int getProteinUpdatedCount()
          Gives the count of updated proteins.
 int getProteinUpToDateCount()
          Gives the count of up-to-date proteins.
 int getSpliceVariantCount()
          Gives the count of all potential splice variants.
 int getSpliceVariantCreatedCount()
          Gives the count of created splice variants.
 int getSpliceVariantProteinCount()
           
 int getSpliceVariantSkippedCount()
          Gives the count of splice variants that caused errors during processing.
 int getSpliceVariantUpdatedCount()
          Gives the count of updated splice variants.
 int getSpliceVariantUpToDateCount()
          Gives the count of up-to-date splice variants.
 java.lang.String getUrl(java.lang.String uniprotAC)
          From a given SPTR AC, returns the full URL from which a flat-file SPTR entry will be fetched.
 Protein insertSimpleProtein(java.lang.String anAc, CvDatabase aDatabase, java.lang.String aTaxId)
          Creates a simple Protein object for entries which are not in SPTR.
 java.util.Collection insertSPTrProteins(java.io.InputStream inputStream, java.lang.String taxid, boolean update)
          Inserts zero or more proteins created from SPTR entries which are retrieved from a Stream.
 java.util.Collection insertSPTrProteins(java.lang.String proteinAc)
          Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number.
 java.util.Collection insertSPTrProteins(java.lang.String proteinAc, java.lang.String taxId, boolean update)
          Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number.
 int insertSPTrProteinsFromURL(java.lang.String sourceUrl, java.lang.String taxid, boolean update)
          Inserts zero or more proteins created from SPTR entries which are retrieved from a URL.
 boolean isLocalTransactionControl()
          If true, each protein is updated in a distinct transaction.
static void main(java.lang.String[] args)
          Demo.

Can be used for loading from a .txl file: ./scripts/javaRun.sh UpdateProteins file:///homes/user/mySPTRfile.txl

 gnu.regexp.REMatch[] match(java.lang.String textin, java.lang.String pattern)
          Finds all matches of a given pattern (a string) in a given string.
 void setDebugOnScreen(boolean debug)
          Displays on screen what is going on during the update process.
 void setLocalTransactionControl(boolean aLocalTransactionControl)
          If true, each protein is updated in a distinct transaction.
 
Methods inherited from class uk.ac.ebi.intact.util.UpdateProteinsI
getParsingExceptions, setLogger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

UpdateProteins

public UpdateProteins(IntactHelper helper)
               throws UpdateProteinsI.UpdateException

UpdateProteins

public UpdateProteins(IntactHelper helper,
                      boolean setOutput)
               throws UpdateProteinsI.UpdateException
Method Detail

getProteinCreatedCount

public int getProteinCreatedCount()
Description copied from class: UpdateProteinsI
Gives the count of created proteins.

Specified by:
getProteinCreatedCount in class UpdateProteinsI
Returns:
created protein count

getProteinUpdatedCount

public int getProteinUpdatedCount()
Description copied from class: UpdateProteinsI
Gives the count of updated proteins.

Specified by:
getProteinUpdatedCount in class UpdateProteinsI
Returns:
updated protein count

getProteinUpToDateCount

public int getProteinUpToDateCount()
Description copied from class: UpdateProteinsI
Gives the count of up-to-date proteins (i.e. existing in IntAct and not needing an update).

Specified by:
getProteinUpToDateCount in class UpdateProteinsI
Returns:
up-to-date protein count

getProteinCount

public int getProteinCount()
Description copied from class: UpdateProteinsI
Gives the count of all potential proteins (i.e. for an SPTREntry, we can create or update several IntAct proteins, one per taxid of the entry).

Specified by:
getProteinCount in class UpdateProteinsI
Returns:
potential protein count

getProteinSkippedCount

public int getProteinSkippedCount()
Description copied from class: UpdateProteinsI
Gives the count of proteins that caused errors during processing.

Specified by:
getProteinSkippedCount in class UpdateProteinsI
Returns:
skipped protein count

getSpliceVariantCreatedCount

public int getSpliceVariantCreatedCount()
Description copied from class: UpdateProteinsI
Gives the count of created splice variants.

Specified by:
getSpliceVariantCreatedCount in class UpdateProteinsI
Returns:
created splice variant count

getSpliceVariantUpdatedCount

public int getSpliceVariantUpdatedCount()
Description copied from class: UpdateProteinsI
Gives the count of updated splice variants.

Specified by:
getSpliceVariantUpdatedCount in class UpdateProteinsI
Returns:
updated splice variant count

getSpliceVariantUpToDateCount

public int getSpliceVariantUpToDateCount()
Description copied from class: UpdateProteinsI
Gives the count of up-to-date splice variants (i.e. existing in IntAct and not needing an update).

Specified by:
getSpliceVariantUpToDateCount in class UpdateProteinsI
Returns:
up-to-date splice variant count

getSpliceVariantCount

public int getSpliceVariantCount()
Description copied from class: UpdateProteinsI
Gives the count of all potential splice variants (i.e. for an SPTREntry, we can create or update several IntAct proteins, one per taxid of the entry).

Specified by:
getSpliceVariantCount in class UpdateProteinsI
Returns:
potential splice variant count

getSpliceVariantProteinCount

public int getSpliceVariantProteinCount()

getSpliceVariantSkippedCount

public int getSpliceVariantSkippedCount()
Description copied from class: UpdateProteinsI
Gives the count of splice variants that caused errors during processing.

Specified by:
getSpliceVariantSkippedCount in class UpdateProteinsI
Returns:
skipped splice variant count

getEntryCount

public int getEntryCount()
Description copied from class: UpdateProteinsI
Gives the number of entries found in the given URL.

Specified by:
getEntryCount in class UpdateProteinsI
Returns:
entry count

getEntryProcessededCount

public int getEntryProcessededCount()
Description copied from class: UpdateProteinsI
Gives the number of entries successfully processed.

Specified by:
getEntryProcessededCount in class UpdateProteinsI
Returns:
count of successfully processed entries.

getEntrySkippedCount

public int getEntrySkippedCount()
Description copied from class: UpdateProteinsI
Gives the number of entries skipped during processing.

Specified by:
getEntrySkippedCount in class UpdateProteinsI
Returns:
skipped entry count.

setDebugOnScreen

public void setDebugOnScreen(boolean debug)
Description copied from class: UpdateProteinsI
Displays on screen what is going on during the update process.

Specified by:
setDebugOnScreen in class UpdateProteinsI
Parameters:
debug - true to enable, false to disable

getUrl

public final java.lang.String getUrl(java.lang.String uniprotAC)
Description copied from class: UpdateProteinsI
From a given SPTR AC, returns the full URL from which a flat-file SPTR entry will be fetched. Note: SRS has several data output formats; URLs that output an HTML-format SPTR entry CANNOT be used, since YASP doesn't have an HTML parsing function.

Specified by:
getUrl in class UpdateProteinsI
Parameters:
uniprotAC - a SPTR AC
Returns:
a full URL.
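
A minimal sketch of how such a URL could be built from an accession number. The template string, the ${ac} placeholder and the SrsUrlSketch class are assumptions made for illustration; the actual value of srsUrl and the way getUrl uses it are not given in this document.

```java
/** Hypothetical sketch; the real srsUrl value and substitution rule are not documented here. */
final class SrsUrlSketch {

    /** Made-up template with a ${ac} placeholder for the accession number. */
    static final String TEMPLATE = "http://srs.example.org/fetch?ac=${ac}&format=flatfile";

    static String urlFor(String template, String uniprotAc) {
        if (uniprotAc == null || uniprotAc.trim().isEmpty()) {
            throw new IllegalArgumentException("An accession number is required");
        }
        // String.replace substitutes the placeholder literally (no regular expression involved).
        return template.replace("${ac}", uniprotAc.trim());
    }
}
```

Whatever the real template looks like, it has to point at a flat-file output, since an HTML rendering of the entry cannot be parsed.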

getAnEntry

public java.lang.String getAnEntry(java.lang.String anUrl)
Description copied from class: UpdateProteinsI
From a given URL, returns an SPTR entry as a string.

Specified by:
getAnEntry in class UpdateProteinsI
Parameters:
anUrl - a URL which outputs flatfile of
Returns:
the SPTR entry as a string.

match

public gnu.regexp.REMatch[] match(java.lang.String textin,
                                  java.lang.String pattern)
Finds all matches of a given pattern (a string) in a given string. The matches are returned as a list. This method uses the gnu.regexp.* package, not org.apache.regexp.*.

Specified by:
match in class UpdateProteinsI
Parameters:
textin - A string from which some pattern will be matched.
pattern - A string as a pattern.
Returns:
A list of the matches found.
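
The "find all matches" behaviour can be illustrated with the standard java.util.regex package. This is a stand-in chosen so the example has no external dependency: the documented method uses gnu.regexp and returns REMatch[], which the sketch below does not reproduce. MatchSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Stand-in using java.util.regex; the documented method relies on gnu.regexp instead. */
final class MatchSketch {

    /** Returns every non-overlapping match of pattern in textin, in order of appearance. */
    static List<String> match(String textin, String pattern) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(textin);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }
}
```

For instance, MatchSketch.match("AC   P12345; Q99999;", "[A-Z][0-9]{5}") collects the two accession-like tokens P12345 and Q99999.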

addNewXref

public boolean addNewXref(AnnotatedObject current,
                          Xref xref)
Description copied from class: UpdateProteinsI
Adds (does not update) a new Xref to the given AnnotatedObject and writes it to the database.

Specified by:
addNewXref in class UpdateProteinsI
Parameters:
current - the object to which we add a new Xref
xref - the Xref to add to the AnnotatedObject
Returns:
true if the object has been added, else false.
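
The add-without-update contract and its boolean result can be sketched on a plain collection. AddIfAbsentSketch is hypothetical, and the duplicate test via equals() is an assumption; how the real method compares Xrefs and writes them to the database is not described here.

```java
import java.util.Collection;

/** Hypothetical sketch of an "add, never update" operation that reports whether it added. */
final class AddIfAbsentSketch {

    /** Adds item unless an equal one is already present; returns true only if it was added. */
    static <T> boolean addNew(Collection<T> existing, T item) {
        if (existing.contains(item)) {
            return false; // already there: nothing is added and nothing is updated
        }
        existing.add(item);
        // The real method would also persist the new object at this point.
        return true;
    }
}
```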

addNewAlias

public void addNewAlias(AnnotatedObject current,
                        Alias alias)
Description copied from class: UpdateProteinsI
Adds (does not update) a new Alias to the given AnnotatedObject and writes it to the database.

Specified by:
addNewAlias in class UpdateProteinsI
Parameters:
current - the object to which we add a new Alias
alias - the Alias to add to the AnnotatedObject

addNewAnnotation

public void addNewAnnotation(AnnotatedObject current,
                             Annotation annotation)
Adds an annotation to an annotated object.
If the annotation already exists, it is not recorded again.

Parameters:
current - the annotated object to which we want to add an Annotation.
annotation - the annotation to add to the annotated object

insertSPTrProteins

public java.util.Collection insertSPTrProteins(java.lang.String proteinAc)
Description copied from class: UpdateProteinsI
Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR accession number. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If an SPTR entry contains more than one organism, one IntAct entry will be created for each organism.

Specified by:
insertSPTrProteins in class UpdateProteinsI
Parameters:
proteinAc - SPTR Accession number of the protein to insert/update
Returns:
a set of created/updated proteins.

insertSPTrProteins

public java.util.Collection insertSPTrProteins(java.lang.String proteinAc,
                                               java.lang.String taxId,
                                               boolean update)
Description copied from class: UpdateProteinsI
Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR accession number. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If an SPTR entry contains more than one organism, one IntAct entry will be created for each organism.

Specified by:
insertSPTrProteins in class UpdateProteinsI
Parameters:
proteinAc - SPTR Accession number of the protein to insert/update
taxId - The tax id the protein should have
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
Returns:
a set of created/updated proteins.
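
The class description states that all proteins are created or updated regardless of the taxid and that the taxid only narrows what is given back to the caller. A minimal sketch of that filter follows. TaxidFilterSketch, the map from protein label to taxid, the treatment of an empty string as "no filter" and the choice of IllegalStateException for the failing case are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the taxid filter applied to the created/updated proteins. */
final class TaxidFilterSketch {

    /**
     * @param proteinToTaxid label of each created/updated protein mapped to its taxid
     * @param taxid          the filter; null or empty means "give back everything"
     */
    static List<String> filter(Map<String, String> proteinToTaxid, String taxid) {
        if (taxid == null || taxid.isEmpty()) {
            return new ArrayList<>(proteinToTaxid.keySet());
        }
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, String> entry : proteinToTaxid.entrySet()) {
            if (taxid.equals(entry.getValue())) {
                kept.add(entry.getKey());
            }
        }
        if (kept.isEmpty()) {
            // Models "if the taxid is not found, the procedure fails".
            throw new IllegalStateException("No protein found for taxid " + taxid);
        }
        return kept;
    }
}
```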

insertSimpleProtein

public Protein insertSimpleProtein(java.lang.String anAc,
                                   CvDatabase aDatabase,
                                   java.lang.String aTaxId)
                            throws IntactException
Creates a simple Protein object for entries which are not in SPTR. The Protein will contain little more than the cross-reference to the source database.

Specified by:
insertSimpleProtein in class UpdateProteinsI
Parameters:
anAc - The primary identifier of the protein in the external database.
aDatabase - The database in which the protein is listed.
aTaxId - The tax id the protein should have
Returns:
the protein created or retrieved from the IntAct database
Throws:
IntactException

insertSPTrProteinsFromURL

public int insertSPTrProteinsFromURL(java.lang.String sourceUrl,
                                     java.lang.String taxid,
                                     boolean update)
Description copied from class: UpdateProteinsI
Inserts zero or more proteins created from SPTR entries which are retrieved from a URL. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If a SPTr entry contains more than one organism, one IntAct entry will be created for each organism, unless the taxid parameter is not null.

Specified by:
insertSPTrProteinsFromURL in class UpdateProteinsI
Parameters:
sourceUrl - The URL which delivers zero or more SPTR flat file formatted entries.
taxid - Of all entries retrieved from sourceURL, insert only those which have this taxid. If taxid is empty, insert all protein objects.
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
Returns:
The number of protein objects created.

insertSPTrProteins

public java.util.Collection insertSPTrProteins(java.io.InputStream inputStream,
                                               java.lang.String taxid,
                                               boolean update)
Description copied from class: UpdateProteinsI
Inserts zero or more proteins created from SPTR entries which are retrieved from a Stream. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If a SPTr entry contains more than one organism, one IntAct entry will be created for each organism, unless the taxid parameter is not null.

Specified by:
insertSPTrProteins in class UpdateProteinsI
Parameters:
inputStream - The stream from which YASP will read the content of the entries.
taxid - Of all entries retrieved from inputStream, insert only those which have this taxid. If taxid is empty, insert all protein objects.
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
Returns:
Collection of protein objects created/updated.
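
To illustrate how a stream can deliver zero or more entries, the sketch below splits flat-file text on the "//" line that terminates each Swiss-Prot style entry. It is not YASP and does no parsing of the entry content; FlatFileSplitSketch is a hypothetical name, and UTF-8 decoding is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits a flat-file stream into raw entries; YASP does the real parsing. */
final class FlatFileSplitSketch {

    /** Returns the text of each entry, including its terminating "//" line. */
    static List<String> splitEntries(InputStream in) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                current.append(line).append('\n');
                if (line.equals("//")) {          // end-of-entry marker
                    entries.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Text after the last "//" (an unterminated entry) is ignored in this sketch.
        return entries;
    }
}
```

An empty stream yields an empty list, which corresponds to the "zero proteins" case.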

getErrorFileName

public java.lang.String getErrorFileName()
Description copied from class: UpdateProteinsI
Returns the name of the file in which all entries that caused processing errors have been saved.

Specified by:
getErrorFileName in class UpdateProteinsI
Returns:
the filename or null if not existing

isLocalTransactionControl

public boolean isLocalTransactionControl()
If true, each protein is updated in a distinct transaction. If localTransactionControl is false, no local transactions are initiated and control is left with the calling class. This can be used, e.g., to have transactions span the insertion of all proteins of an entire complex.

Specified by:
isLocalTransactionControl in class UpdateProteinsI
Returns:
current value of localTransactionControl

setLocalTransactionControl

public void setLocalTransactionControl(boolean aLocalTransactionControl)
If true, each protein is updated in a distinct transaction. If localTransactionControl is false, no local transactions are initiated and control is left with the calling class. This can be used, e.g., to have transactions span the insertion of all proteins of an entire complex.

Specified by:
setLocalTransactionControl in class UpdateProteinsI
Parameters:
aLocalTransactionControl - New value for localTransactionControl
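
The effect of the flag can be sketched as follows. The Tx interface and TransactionControlSketch class are hypothetical stand-ins for whatever transaction API IntactHelper exposes, and rethrowing after a rollback is an assumption; the real class may instead record the protein as skipped and continue.

```java
import java.util.List;

/** Hypothetical sketch of per-protein versus caller-controlled transactions. */
final class TransactionControlSketch {

    /** Made-up minimal transaction API used only by this sketch. */
    interface Tx {
        void begin();
        void commit();
        void rollback();
    }

    static void updateAll(List<Runnable> proteinUpdates, Tx tx, boolean localTransactionControl) {
        for (Runnable update : proteinUpdates) {
            if (!localTransactionControl) {
                // The caller owns the transaction, e.g. one spanning a whole complex.
                update.run();
                continue;
            }
            tx.begin();                 // one distinct transaction per protein
            try {
                update.run();
                tx.commit();
            } catch (RuntimeException e) {
                tx.rollback();
                throw e;
            }
        }
    }
}
```

With the flag set to false the loop never touches the transaction, so a caller can begin one transaction before the call and commit it after all proteins of a complex have been inserted.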

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Demo.

Can be used for loading from a .txl file: ./scripts/javaRun.sh UpdateProteins file:///homes/user/mySPTRfile.txl

Throws:
java.lang.Exception
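
The demo expects its argument as a file: URL rather than a plain path. A minimal way to derive such a URL from a local path with the standard library is sketched below; FileUrlSketch is a hypothetical helper and is not part of the IntAct scripts.

```java
import java.nio.file.Paths;

/** Hypothetical helper turning a local path into a file: URL string. */
final class FileUrlSketch {

    static String toFileUrl(String path) {
        // toAbsolutePath() resolves relative paths against the current working directory.
        return Paths.get(path).toAbsolutePath().toUri().toString();
    }
}
```

On a Unix-like system an absolute path such as /homes/user/mySPTRfile.txl should come out in the file:///... form used in the demo command line; the exact result is platform dependent.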


IntAct Project - EMBL-EBI 2004 - intact-help@ebi.ac.uk