uk.ac.ebi.intact.util
Class UpdateProteinsI

java.lang.Object
  |
  +--uk.ac.ebi.intact.util.UpdateProteinsI
Direct Known Subclasses:
UpdateProteins

public abstract class UpdateProteinsI
extends java.lang.Object

Defines the functionality of protein import utilities.


Nested Class Summary
static class UpdateProteinsI.UpdateException
           
 
Field Summary
protected  BioSourceFactory bioSourceFactory
           
protected static CvDatabase flybaseDatabase
           
protected static CvAliasType geneNameAliasType
           
protected static CvAliasType geneNameSynonymAliasType
           
protected static CvDatabase goDatabase
           
protected  IntactHelper helper
           
protected static CvXrefQualifier identityXrefQualifier
          Describes whether an Xref is related to the primary SPTR AC (identityXrefQualifier) or not (secondaryXrefQualifier).
protected static CvDatabase intactDatabase
           
protected static CvDatabase interproDatabase
           
protected  CvTopic isoformComment
           
protected  CvXrefQualifier isoFormParentXrefQualifier
           
protected  CvAliasType isoformSynonym
           
protected static boolean localTransactionControl
          If true, each protein is updated in a distinct transaction.
protected static org.apache.log4j.Logger logger
           
protected static Institution myInstitution
          The owner of the created object
protected  java.util.Map parsingExceptions
           
protected static CvXrefQualifier secondaryXrefQualifier
           
protected static CvDatabase sgdDatabase
           
protected static java.lang.String srsUrl
           
protected static CvDatabase uniprotDatabase
          Xref databases
 
Constructor Summary
UpdateProteinsI(boolean setOutputOn)
           
UpdateProteinsI(IntactHelper helper)
          Constructor which initializes the bioSource cache with default settings.
UpdateProteinsI(IntactHelper helper, boolean setOutputOn)
          Constructor which initializes the bioSource cache with default settings.
UpdateProteinsI(IntactHelper helper, int cacheSize)
           
 
Method Summary
abstract  void addNewAlias(AnnotatedObject current, Alias alias)
          Adds (does not update) a new Alias to the given AnnotatedObject and writes it to the database.
abstract  boolean addNewXref(AnnotatedObject current, Xref xref)
          Adds (does not update) a new Xref to the given AnnotatedObject and writes it to the database.
abstract  java.lang.String getAnEntry(java.lang.String url)
          Returns, as a string, the SPTR entry fetched from the given URL.
abstract  int getEntryCount()
          Gives the number of entries found at the given URL.
abstract  int getEntryProcessededCount()
          Gives the number of entries successfully processed.
abstract  int getEntrySkippedCount()
          Gives the number of entries skipped during processing.
abstract  java.lang.String getErrorFileName()
          Returns the name of the file in which all entries that caused processing errors have been saved.
 java.util.Map getParsingExceptions()
          Gives all Exceptions that have been raised during the last processing.
abstract  int getProteinCount()
          Gives the count of all potential proteins (i.e. one per taxid of each SPTR entry).
abstract  int getProteinCreatedCount()
          Gives the count of created proteins.
abstract  int getProteinSkippedCount()
          Gives the count of proteins that caused errors during processing.
abstract  int getProteinUpdatedCount()
          Gives the count of updated proteins.
abstract  int getProteinUpToDateCount()
          Gives the count of up-to-date proteins (i.e. already in IntAct and not needing an update).
abstract  int getSpliceVariantCount()
          Gives the count of all potential splice variants (i.e. one per taxid of each SPTR entry).
abstract  int getSpliceVariantCreatedCount()
          Gives the count of created splice variants.
abstract  int getSpliceVariantSkippedCount()
          Gives the count of splice variants that caused errors during processing.
abstract  int getSpliceVariantUpdatedCount()
          Gives the count of updated splice variants.
abstract  int getSpliceVariantUpToDateCount()
          Gives the count of up-to-date splice variants (i.e. already in IntAct and not needing an update).
abstract  java.lang.String getUrl(java.lang.String sptrAC)
          From a given SPTR AC, returns the full URL from which a flat-file SPTR entry will be fetched.
abstract  Protein insertSimpleProtein(java.lang.String anAc, CvDatabase aDatabase, java.lang.String aTaxId)
          Creates a simple Protein object for entries which are not in SPTR.
abstract  java.util.Collection insertSPTrProteins(java.io.InputStream inputStream, java.lang.String taxid, boolean update)
          Inserts zero or more proteins created from SPTR entries which are retrieved from a Stream.
abstract  java.util.Collection insertSPTrProteins(java.lang.String proteinAc)
          Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number.
abstract  java.util.Collection insertSPTrProteins(java.lang.String proteinAc, java.lang.String taxId, boolean update)
          Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number.
abstract  int insertSPTrProteinsFromURL(java.lang.String sourceUrl, java.lang.String taxid, boolean update)
          Inserts zero or more proteins created from SPTR entries which are retrieved from a URL.
abstract  boolean isLocalTransactionControl()
          If true, each protein is updated in a distinct transaction.
abstract  gnu.regexp.REMatch[] match(java.lang.String textin, java.lang.String pattern)
          From a given string and a given pattern (string), finds all matches.
abstract  void setDebugOnScreen(boolean debug)
          Enables or disables on-screen display of what is happening during the update process.
abstract  void setLocalTransactionControl(boolean localTransactionControl)
          If true, each protein is updated in a distinct transaction.
 void setLogger(org.apache.log4j.Logger aLogger)
          Sets the UpdateProteins logger and those of third-party tools.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

logger

protected static org.apache.log4j.Logger logger

myInstitution

protected static Institution myInstitution
The owner of the created object


uniprotDatabase

protected static CvDatabase uniprotDatabase
Xref databases


srsUrl

protected static java.lang.String srsUrl

intactDatabase

protected static CvDatabase intactDatabase

sgdDatabase

protected static CvDatabase sgdDatabase

goDatabase

protected static CvDatabase goDatabase

interproDatabase

protected static CvDatabase interproDatabase

flybaseDatabase

protected static CvDatabase flybaseDatabase

identityXrefQualifier

protected static CvXrefQualifier identityXrefQualifier
Describes whether an Xref is related to the primary SPTR AC (identityXrefQualifier) or not (secondaryXrefQualifier).


secondaryXrefQualifier

protected static CvXrefQualifier secondaryXrefQualifier

isoFormParentXrefQualifier

protected CvXrefQualifier isoFormParentXrefQualifier

isoformComment

protected CvTopic isoformComment

isoformSynonym

protected CvAliasType isoformSynonym

geneNameAliasType

protected static CvAliasType geneNameAliasType

geneNameSynonymAliasType

protected static CvAliasType geneNameSynonymAliasType

helper

protected IntactHelper helper

bioSourceFactory

protected BioSourceFactory bioSourceFactory

localTransactionControl

protected static boolean localTransactionControl
If true, each protein is updated in a distinct transaction. If localTransactionControl is false, no local transactions are initiated, control is left with the calling class. This can be used e.g. to have transactions span the insertion of all proteins of an entire complex. Default is true.
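The effect of the flag can be sketched as below. TxManager, TransactionControlSketch and updateAll are hypothetical names; the real class works through IntactHelper, whose transaction methods are not listed on this page. With the flag on, every protein gets its own begin/commit; with it off, nothing is opened locally.

```java
import java.util.List;

// Hypothetical stand-in for a transaction API (not IntactHelper's real one).
interface TxManager {
    void begin();
    void commit();
    void rollback();
}

final class TransactionControlSketch {
    // Mirrors localTransactionControl; documented default is true.
    private boolean localTransactionControl = true;

    void setLocalTransactionControl(boolean value) {
        this.localTransactionControl = value;
    }

    // Processes each accession; opens one transaction per protein only
    // when local control is enabled, otherwise leaves it to the caller.
    void updateAll(List<String> accessions, TxManager tx, List<String> updated) {
        for (String ac : accessions) {
            if (localTransactionControl) {
                tx.begin();
            }
            try {
                updated.add(ac); // placeholder for the real per-protein update
                if (localTransactionControl) {
                    tx.commit();
                }
            } catch (RuntimeException e) {
                if (localTransactionControl) {
                    tx.rollback();
                }
                throw e;
            }
        }
    }
}
```

With the flag set to false, a caller would call begin() once, run updateAll for every protein of a complex, and then call commit() once, so the whole complex is stored atomically.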


parsingExceptions

protected java.util.Map parsingExceptions
Constructor Detail

UpdateProteinsI

public UpdateProteinsI(boolean setOutputOn)

UpdateProteinsI

public UpdateProteinsI(IntactHelper helper,
                       int cacheSize)
                throws UpdateProteinsI.UpdateException
Parameters:
helper - IntactHelper object to access (read/write) the database.
cacheSize - the number of valid BioSource objects to cache during the update process.
Throws:
UpdateProteinsI.UpdateException
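A cache bounded by cacheSize could look like the following sketch. BioSourceCache is a hypothetical class, and least-recently-used eviction is an assumption; this page does not say how BioSourceFactory decides which entries to drop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache keyed by taxid (LRU eviction is assumed).
final class BioSourceCache<V> extends LinkedHashMap<String, V> {
    private final int cacheSize;

    BioSourceCache(int cacheSize) {
        super(16, 0.75f, true); // access order, so iteration goes least to most recently used
        this.cacheSize = cacheSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Drop the least recently used entry once the limit is exceeded.
        return size() > cacheSize;
    }
}
```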

UpdateProteinsI

public UpdateProteinsI(IntactHelper helper,
                       boolean setOutputOn)
                throws UpdateProteinsI.UpdateException
Constructor which initializes the bioSource cache with default settings.

Parameters:
helper - IntactHelper object to access (read/write) the database.
Throws:
UpdateProteinsI.UpdateException

UpdateProteinsI

public UpdateProteinsI(IntactHelper helper)
                throws UpdateProteinsI.UpdateException
Constructor which initializes the bioSource cache with default settings.

Parameters:
helper - IntactHelper object to access (read/write) the database.
Throws:
UpdateProteinsI.UpdateException
Method Detail

getParsingExceptions

public java.util.Map getParsingExceptions()
Gives all Exceptions that have been raised during the last processing.

Returns:
a map from entry count to Exception; may be null.
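A caller might turn that map into a report as in this sketch. ParsingExceptionReport is hypothetical, and Integer keys are an assumption, since the method returns a raw java.util.Map. Note the null check, because the map may be null.

```java
import java.util.Map;
import java.util.TreeMap;

final class ParsingExceptionReport {
    // Formats an entry-count -> Exception map; null or empty means no errors.
    static String describe(Map<Integer, Exception> parsingExceptions) {
        if (parsingExceptions == null || parsingExceptions.isEmpty()) {
            return "no parsing errors";
        }
        // Sort by entry count so the report follows the input order.
        Map<Integer, Exception> sorted = new TreeMap<>(parsingExceptions);
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Exception> e : sorted.entrySet()) {
            sb.append("entry ").append(e.getKey()).append(": ")
              .append(e.getValue().getMessage()).append('\n');
        }
        return sb.toString();
    }
}
```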

setLogger

public void setLogger(org.apache.log4j.Logger aLogger)
Sets the UpdateProteins logger and those of third-party tools.

Parameters:
aLogger - the new logger.

insertSPTrProteins

public abstract java.util.Collection insertSPTrProteins(java.io.InputStream inputStream,
                                                        java.lang.String taxid,
                                                        boolean update)
Inserts zero or more proteins created from SPTR entries which are retrieved from a stream. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If an SPTR entry contains more than one organism, one IntAct entry will be created for each organism, unless the taxid parameter is non-null.

Parameters:
inputStream - The stream from which YASP will read the entries' content.
taxid - Of all entries read from inputStream, insert only those which have this taxid. If taxid is empty, insert all protein objects.
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
Returns:
Collection of protein objects created/updated.
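The one-protein-per-organism rule and the taxid filter can be sketched as follows. SptrEntry, ProteinKey and TaxidFilter are hypothetical types, not IntAct classes; each (accession, taxid) pair stands for one IntAct Protein to create or update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal SPTR entry: accession plus the taxids of its organisms.
record SptrEntry(String ac, List<String> taxids) {}

// One (accession, taxid) pair corresponds to one IntAct Protein.
record ProteinKey(String ac, String taxid) {}

final class TaxidFilter {
    // Null or empty taxid keeps every organism; otherwise only matching ones.
    static List<ProteinKey> proteinsToInsert(List<SptrEntry> entries, String taxid) {
        boolean keepAll = taxid == null || taxid.isEmpty();
        List<ProteinKey> result = new ArrayList<>();
        for (SptrEntry entry : entries) {
            for (String t : entry.taxids()) {
                if (keepAll || t.equals(taxid)) {
                    result.add(new ProteinKey(entry.ac(), t));
                }
            }
        }
        return result;
    }
}
```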

insertSPTrProteinsFromURL

public abstract int insertSPTrProteinsFromURL(java.lang.String sourceUrl,
                                              java.lang.String taxid,
                                              boolean update)
Inserts zero or more proteins created from SPTR entries which are retrieved from a URL. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If an SPTR entry contains more than one organism, one IntAct entry will be created for each organism, unless the taxid parameter is non-null.

Parameters:
sourceUrl - The URL which delivers zero or more SPTR flat file formatted entries.
taxid - Of all entries retrieved from sourceUrl, insert only those which have this taxid. If taxid is empty, insert all protein objects.
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
Returns:
The number of protein objects created.
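Because the URL may deliver several flat-file entries, they have to be separated before parsing. In the SWISS-PROT/TrEMBL flat-file format each entry ends with a line containing only "//". The sketch below splits a stream on that terminator; FlatFileSplitter is a hypothetical helper, not necessarily how YASP reads the stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FlatFileSplitter {
    // Returns one string per complete entry; trailing text after the last
    // "//" terminator is ignored.
    static List<String> splitEntries(InputStream in) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                current.append(line).append('\n');
                if (line.equals("//")) {
                    entries.add(current.toString());
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

The size of the returned list plays the role of getEntryCount().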

insertSPTrProteins

public abstract java.util.Collection insertSPTrProteins(java.lang.String proteinAc)
Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If an SPTR entry contains more than one organism, one IntAct entry will be created for each organism.

Parameters:
proteinAc - SPTR Accession number of the protein to insert/update
Returns:
a set of created/updated proteins.

insertSPTrProteins

public abstract java.util.Collection insertSPTrProteins(java.lang.String proteinAc,
                                                        java.lang.String taxId,
                                                        boolean update)
Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number. IntAct Protein objects represent a specific amino acid sequence in a specific organism. If an SPTR entry contains more than one organism, one IntAct entry will be created for each organism.

Parameters:
proteinAc - SPTR Accession number of the protein to insert/update
taxId - The tax id the protein should have
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
Returns:
a set of created/updated proteins.
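How the update flag might decide between creating, updating and leaving a protein alone is sketched below. UpdateDecision and its in-memory map are hypothetical stand-ins for the IntAct database, and comparing only the sequence is a simplification of whatever fields the real implementation checks.

```java
import java.util.HashMap;
import java.util.Map;

final class UpdateDecision {
    enum Outcome { CREATED, UPDATED, UP_TO_DATE, EXISTING_KEPT }

    // Hypothetical stand-in for the protein table: "ac:taxid" -> sequence.
    private final Map<String, String> store = new HashMap<>();

    Outcome insertOrUpdate(String ac, String taxId, String sequence, boolean update) {
        String key = ac + ":" + taxId;
        String existing = store.get(key);
        if (existing == null) {
            store.put(key, sequence);
            return Outcome.CREATED;
        }
        if (!update) {
            return Outcome.EXISTING_KEPT; // update == false: leave it untouched
        }
        if (existing.equals(sequence)) {
            return Outcome.UP_TO_DATE;
        }
        store.put(key, sequence);
        return Outcome.UPDATED;
    }
}
```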

insertSimpleProtein

public abstract Protein insertSimpleProtein(java.lang.String anAc,
                                            CvDatabase aDatabase,
                                            java.lang.String aTaxId)
                                     throws IntactException
Creates a simple Protein object for entries which are not in SPTR. The Protein will contain little more than the cross-reference to the source database.

Parameters:
anAc - The primary identifier of the protein in the external database.
aDatabase - The database in which the protein is listed.
aTaxId - The tax id the protein should have
Returns:
the protein created or retrieved from the IntAct database
Throws:
IntactException
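The "created or retrieved" contract can be sketched as a find-or-create lookup keyed by database, identifier and taxid. SimpleProtein and SimpleProteinRegistry are hypothetical; a map replaces the IntAct database here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal protein: little more than its source cross-reference.
record SimpleProtein(String database, String primaryId, String taxId) {}

final class SimpleProteinRegistry {
    private final Map<SimpleProtein, SimpleProtein> known = new HashMap<>();

    // Returns the already-registered protein for (database, id, taxid),
    // or registers and returns a new one.
    SimpleProtein insertSimpleProtein(String anAc, String aDatabase, String aTaxId) {
        SimpleProtein candidate = new SimpleProtein(aDatabase, anAc, aTaxId);
        return known.computeIfAbsent(candidate, k -> k);
    }

    int size() {
        return known.size();
    }
}
```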

getUrl

public abstract java.lang.String getUrl(java.lang.String sptrAC)
From a given SPTR AC, returns the full URL from which a flat-file SPTR entry will be fetched. Note that SRS offers several output formats; URLs that output an HTML-formatted SPTR entry CANNOT be used, since YASP has no HTML parsing capability.

Parameters:
sptrAC - an SPTR AC
Returns:
a full URL.
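A URL builder of this kind could substitute the accession into a template, as in the sketch below. SrsUrlBuilder and the "${ac}" placeholder are hypothetical, and the real srsUrl value is not given on this page; whatever template is used must request flat-file output, not HTML.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SrsUrlBuilder {
    // Substitutes the URL-encoded accession for "${ac}" in the template.
    static String getUrl(String template, String sptrAC) {
        String encoded = URLEncoder.encode(sptrAC, StandardCharsets.UTF_8);
        return template.replace("${ac}", encoded);
    }
}
```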

getAnEntry

public abstract java.lang.String getAnEntry(java.lang.String url)
From a given URL, returns a string of a SPTR entry.

Parameters:
url - a URL which outputs a flat-file SPTR entry
Returns:
the SPTR entry as a string.
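Fetching the entry text amounts to reading the whole response body. The sketch below uses only the JDK; EntryFetcher is a hypothetical name, and it does no retrying or HTML detection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class EntryFetcher {
    // Reads everything served at the URL into a string.
    static String getAnEntry(String url) {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```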

match

public abstract gnu.regexp.REMatch[] match(java.lang.String textin,
                                           java.lang.String pattern)
From a given string and a given pattern (string), finds all matches. The matches are returned as an array. This method uses the gnu.regexp.* package, not org.apache.regexp.*.

Parameters:
textin - A string from which some pattern will be matched.
pattern - A string as a pattern.
Returns:
An array of the matches.
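For comparison, the same "find all matches" behaviour is available in the JDK's java.util.regex, which is a different library from the gnu.regexp package this method uses. The sketch returns matched substrings rather than REMatch objects; RegexMatcher is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexMatcher {
    // Returns every non-overlapping match of pattern in textin, in order.
    static List<String> match(String textin, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(textin);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }
}
```

For example, a simplified accession-like pattern such as "[OPQ][0-9][A-Z0-9]{3}[0-9]" applied to "AC   P12345; Q67890;" yields P12345 and Q67890.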

addNewXref

public abstract boolean addNewXref(AnnotatedObject current,
                                   Xref xref)
Adds (does not update) a new Xref to the given AnnotatedObject and writes it to the database.

Parameters:
current - the object to which we add a new Xref
xref - the Xref to add to the AnnotatedObject
Returns:
true if the Xref has been added, otherwise false.
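The add-but-never-update semantics and the boolean result map naturally onto Set.add, as in this sketch. XrefSketch and AnnotatedObjectSketch are hypothetical stand-ins for the IntAct Xref and AnnotatedObject classes, and the database write is omitted.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical Xref: database, primary identifier and qualifier.
record XrefSketch(String database, String primaryId, String qualifier) {}

final class AnnotatedObjectSketch {
    private final Set<XrefSketch> xrefs = new LinkedHashSet<>();

    // Adds the Xref only if an equal one is not already attached; an
    // existing Xref is never modified. Returns true if it was added.
    boolean addNewXref(XrefSketch xref) {
        return xrefs.add(xref);
    }

    int xrefCount() {
        return xrefs.size();
    }
}
```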

addNewAlias

public abstract void addNewAlias(AnnotatedObject current,
                                 Alias alias)
Adds (does not update) a new Alias to the given AnnotatedObject and writes it to the database.

Parameters:
current - the object to which we add a new Alias
alias - the Alias to add to the AnnotatedObject

getProteinCreatedCount

public abstract int getProteinCreatedCount()
Gives the count of created proteins.

Returns:
created protein count

getProteinUpdatedCount

public abstract int getProteinUpdatedCount()
Gives the count of updated proteins.

Returns:
updated protein count

getProteinUpToDateCount

public abstract int getProteinUpToDateCount()
Gives the count of up-to-date proteins (i.e. already in IntAct and not needing an update).

Returns:
up-to-date protein count

getProteinCount

public abstract int getProteinCount()
Gives the count of all potential proteins (i.e. for one SPTREntry, several IntAct proteins can be created or updated, one per taxid of the entry).

Returns:
potential protein count

getProteinSkippedCount

public abstract int getProteinSkippedCount()
Gives the count of proteins that caused errors during processing.

Returns:
skipped protein count
getSpliceVariantCreatedCount

public abstract int getSpliceVariantCreatedCount()
Gives the count of created splice variants.

Returns:
created splice variant count

getSpliceVariantUpdatedCount

public abstract int getSpliceVariantUpdatedCount()
Gives the count of updated splice variants.

Returns:
updated splice variant count

getSpliceVariantUpToDateCount

public abstract int getSpliceVariantUpToDateCount()
Gives the count of up-to-date splice variants (i.e. already in IntAct and not needing an update).

Returns:
up-to-date splice variant count

getSpliceVariantCount

public abstract int getSpliceVariantCount()
Gives the count of all potential splice variants (i.e. for one SPTREntry, several IntAct proteins can be created or updated, one per taxid of the entry).

Returns:
potential splice variant count

getSpliceVariantSkippedCount

public abstract int getSpliceVariantSkippedCount()
Gives the count of splice variants that caused errors during processing.

Returns:
skipped splice variant count
getEntryCount

public abstract int getEntryCount()
Gives the number of entries found at the given URL.

Returns:
entry count

getEntryProcessededCount

public abstract int getEntryProcessededCount()
Gives the number of entries successfully processed.

Returns:
count of successfully processed entries.

getEntrySkippedCount

public abstract int getEntrySkippedCount()
Gives the number of entries skipped during processing.

Returns:
skipped entry count.

setDebugOnScreen

public abstract void setDebugOnScreen(boolean debug)
Enables or disables on-screen display of what is happening during the update process.

Parameters:
debug - true to enable, false to disable

getErrorFileName

public abstract java.lang.String getErrorFileName()
Returns the name of the file in which all entries that caused processing errors have been saved.

Returns:
the filename or null if not existing
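The "null if not existing" contract can be met by creating the file lazily, only once an entry has failed. ErrorFileWriter is hypothetical, and writing to a temporary file is an assumption; the real file name and location are not given on this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ErrorFileWriter {
    private final List<String> failedEntries = new ArrayList<>();
    private String errorFileName; // stays null until something is written

    void recordFailure(String entryText) {
        failedEntries.add(entryText);
    }

    // Writes all failed entries to a new temporary file; returns null when
    // there was nothing to save.
    String flush() {
        if (failedEntries.isEmpty()) {
            return null;
        }
        try {
            Path file = Files.createTempFile("sptr-errors", ".txt");
            Files.write(file, failedEntries, StandardCharsets.UTF_8);
            errorFileName = file.toString();
            return errorFileName;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    String getErrorFileName() {
        return errorFileName;
    }
}
```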

isLocalTransactionControl

public abstract boolean isLocalTransactionControl()
If true, each protein is updated in a distinct transaction. If localTransactionControl is false, no local transactions are initiated and control is left with the calling class. This can be used e.g. to have transactions span the insertion of all proteins of an entire complex.

Returns:
current value of localTransactionControl

setLocalTransactionControl

public abstract void setLocalTransactionControl(boolean localTransactionControl)
If true, each protein is updated in a distinct transaction. If localTransactionControl is false, no local transactions are initiated and control is left with the calling class. This can be used e.g. to have transactions span the insertion of all proteins of an entire complex.

Parameters:
localTransactionControl - New value for localTransactionControl


IntAct Project - EMBL-EBI 2004 - intact-help@ebi.ac.uk