java.lang.Object
|
+--uk.ac.ebi.intact.util.UpdateProteinsI
|
+--uk.ac.ebi.intact.util.UpdateProteins
Parses a URL and updates the IntAct database.
The algorithm is implemented as follows:
(1) From the URL given by the user, get an EntryIterator to process the entries one by one.
(2) For each SPTREntry:
(2.1)
a)
From the accession number, retrieve from IntAct all Proteins with that AC as
an SPTR Xref. Several Protein instances can be found if they are linked
to different BioSources. Let's call that set of Proteins PROTEINS.
Note: an SPTREntry can contain several ACs, so we check IntAct for all of them.
b)
From PROTEINS, retrieve from IntAct all Splice Variants (i.e. Proteins)
that have the AC of a retrieved protein (in PROTEINS) as an Xref with a CvXrefQualifier equal to
isoform-parent. Several Splice Variant instances can be found per master Protein
if there are multiple BioSources. Let's call that set of Proteins
SPLICE-VARIANTS.
(2.2) The user can give a taxid 'filter' (let's call it t) in order to retrieve only
the proteins related to that taxid (beware that, behind the scenes, all proteins are
updated or created). An SPTREntry specifies 1..n organisms (i.e. taxids).
If the taxid parameter t is null, we return all proteins created
or updated; if a valid taxid is given by the user, we filter the set of proteins.
If that taxid is not found, the procedure fails.
(2.3) For each taxid of the SPTREntry (let's call it TAXID):
(2.3.1) Get up-to-date information about the organism from Newt.
If that organism already exists in IntAct as a BioSource, we check whether an
update is needed. We also take into account that a taxid can be obsolete, and
in that case we update the IntAct data accordingly.
(2.3.2)
a) If a Protein from PROTEINS (cf. 2.1) has TAXID as BioSource,
we update its data from the SPTREntry.
b) If no Protein from PROTEINS has TAXID as BioSource, we create a new Protein.
c) If a Protein from PROTEINS has a taxid not found in the SPTREntry, we display
a warning message.
d) If a Protein from SPLICE-VARIANTS (cf. 2.1) has TAXID as BioSource,
we update its data from the SPTREntry.
e) If no Protein from SPLICE-VARIANTS has TAXID as BioSource, we create a new Protein.
f) If a Protein from SPLICE-VARIANTS has a taxid not found in the SPTREntry,
we display a warning message.
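The update/create/warn decisions of step 2.3.2 can be sketched as follows. This is a minimal illustration, not the IntAct implementation: the class ReconcileSketch, the method reconcile, the action strings and the sample identifiers are all hypothetical, and stored proteins are reduced to a map from an AC to the taxid of its BioSource. The same logic would apply to SPLICE-VARIANTS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of step 2.3.2; not the IntAct implementation. */
final class ReconcileSketch {

    /**
     * @param taxidByProteinAc taxid of each stored protein, keyed by a (made-up) AC
     * @param entryTaxids      taxids listed in the SPTR entry
     * @return one action per decision, in iteration order of the inputs
     */
    static List<String> reconcile(Map<String, String> taxidByProteinAc, Set<String> entryTaxids) {
        List<String> actions = new ArrayList<>();
        for (String taxid : entryTaxids) {
            boolean found = false;
            for (Map.Entry<String, String> stored : taxidByProteinAc.entrySet()) {
                if (stored.getValue().equals(taxid)) {
                    actions.add("UPDATE " + stored.getKey());   // cases a) and d)
                    found = true;
                }
            }
            if (!found) {
                actions.add("CREATE taxid=" + taxid);           // cases b) and e)
            }
        }
        for (Map.Entry<String, String> stored : taxidByProteinAc.entrySet()) {
            if (!entryTaxids.contains(stored.getValue())) {
                actions.add("WARN " + stored.getKey());         // cases c) and f)
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("EBI-1", "9606");   // made-up ACs and taxids
        stored.put("EBI-2", "4932");
        Set<String> entryTaxids = new LinkedHashSet<>(Arrays.asList("9606", "10090"));
        for (String action : reconcile(stored, entryTaxids)) {
            System.out.println(action);
        }
    }
}
```

With the sample data, the stored protein for taxid 9606 is updated, a protein is created for taxid 10090, and a warning is raised for the stored protein whose taxid 4932 is absent from the entry.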
Cross-references created in step 2.3.2:
For Proteins :
(1) a link to uniprot
Xref( CvDatabase(uniprot)
primaryId(uniprotAc-spliceVarNumber)
secondaryId(uniprotId)
CvXrefQualifier(identity)
);
(2) Links to GO, SGD, INTERPRO, FLYBASE.
These Xrefs comply with the following schema:
Xref( CvDatabase(DB),
primaryId(AC),
secondaryId(ID),
CvXrefQualifier(-)
);
TODO: when updating Xrefs, remove those that no longer exist.
For Splice Variants:
(1) a link to the master protein
Xref( CvDatabase(intact)
primaryId(intactAc)
secondaryId(intactShortlabel)
CvXrefQualifier(isoform-parent)
);
(2) a link to uniprot
Xref( CvDatabase(uniprot)
primaryId(uniprotAc-spliceVarNumber)
secondaryId(uniprotId)
CvXrefQualifier(identity)
);
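The two splice-variant Xrefs can be pictured as plain value objects. This is a hedged sketch: XrefSketch and its factory methods are hypothetical stand-ins for the real IntAct Xref, CvDatabase and CvXrefQualifier classes, the sample identifiers are invented, and the primaryId is assumed to be the UniProt AC and the splice-variant number joined by a hyphen, as the schema suggests.

```java
/** Hypothetical value type mirroring the four parts of the Xref schema. */
final class XrefSketch {
    final String database;     // stands in for CvDatabase
    final String primaryId;
    final String secondaryId;
    final String qualifier;    // stands in for CvXrefQualifier

    XrefSketch(String database, String primaryId, String secondaryId, String qualifier) {
        this.database = database;
        this.primaryId = primaryId;
        this.secondaryId = secondaryId;
        this.qualifier = qualifier;
    }

    /** Identity link to UniProt; primaryId assumed to be "AC-number". */
    static XrefSketch uniprotIdentity(String uniprotAc, int spliceVarNumber, String uniprotId) {
        return new XrefSketch("uniprot", uniprotAc + "-" + spliceVarNumber, uniprotId, "identity");
    }

    /** Link from a splice variant to its master protein in IntAct. */
    static XrefSketch isoformParent(String intactAc, String intactShortlabel) {
        return new XrefSketch("intact", intactAc, intactShortlabel, "isoform-parent");
    }

    @Override
    public String toString() {
        return "Xref(" + database + ", " + primaryId + ", " + secondaryId + ", " + qualifier + ")";
    }

    public static void main(String[] args) {
        // Invented sample identifiers, for illustration only.
        System.out.println(isoformParent("EBI-0000", "abc_human"));
        System.out.println(uniprotIdentity("P12345", 2, "ABC_HUMAN"));
    }
}
```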
BEWARE: the ownership of updated objects is not checked.
| Nested Class Summary |
| Nested classes inherited from class uk.ac.ebi.intact.util.UpdateProteinsI |
UpdateProteinsI.UpdateException |
| Field Summary |
| Fields inherited from class uk.ac.ebi.intact.util.UpdateProteinsI |
bioSourceFactory, flybaseDatabase, geneNameAliasType, geneNameSynonymAliasType, goDatabase, helper, identityXrefQualifier, intactDatabase, interproDatabase, isoformComment, isoFormParentXrefQualifier, isoformSynonym, localTransactionControl, logger, myInstitution, parsingExceptions, secondaryXrefQualifier, sgdDatabase, srsUrl, uniprotDatabase |
| Constructor Summary | |
UpdateProteins(IntactHelper helper)
|
|
UpdateProteins(IntactHelper helper,
boolean setOutput)
|
|
| Method Summary | |
void |
addNewAlias(AnnotatedObject current,
Alias alias)
Adds (does not update) a new Alias to the given annotated object and writes it to the database. |
void |
addNewAnnotation(AnnotatedObject current,
Annotation annotation)
Add an annotation to an annotated object. |
boolean |
addNewXref(AnnotatedObject current,
Xref xref)
Adds (does not update) a new Xref to the given annotated object and writes it to the database. |
java.lang.String |
getAnEntry(java.lang.String anUrl)
Fetches an SPTR entry from a given URL and returns it as a string. |
int |
getEntryCount()
Gives the number of entries found at the given URL. |
int |
getEntryProcessededCount()
Gives the number of entries successfully processed. |
int |
getEntrySkippedCount()
Gives the number of entries skipped during processing. |
java.lang.String |
getErrorFileName()
Returns the name of the file in which all entries that caused processing errors have been saved. |
int |
getProteinCount()
Gives the count of all potential proteins (i.e. |
int |
getProteinCreatedCount()
Gives the count of created proteins. |
int |
getProteinSkippedCount()
Gives the count of proteins that caused errors during processing. |
int |
getProteinUpdatedCount()
Gives the count of updated proteins. |
int |
getProteinUpToDateCount()
Gives the count of up-to-date proteins (i.e. |
int |
getSpliceVariantCount()
Gives the count of all potential splice variants (i.e. |
int |
getSpliceVariantCreatedCount()
Gives the count of created splice variants. |
int |
getSpliceVariantProteinCount()
|
int |
getSpliceVariantSkippedCount()
Gives the count of splice variants that caused errors during processing. |
int |
getSpliceVariantUpdatedCount()
Gives the count of updated splice variants. |
int |
getSpliceVariantUpToDateCount()
Gives the count of up-to-date splice variants (i.e. |
java.lang.String |
getUrl(java.lang.String uniprotAC)
From a given sptr AC, returns a full URL from where a flatfile format SPTR entry will be fetched. |
Protein |
insertSimpleProtein(java.lang.String anAc,
CvDatabase aDatabase,
java.lang.String aTaxId)
Creates a simple Protein object for entries which are not in SPTR. |
java.util.Collection |
insertSPTrProteins(java.io.InputStream inputStream,
java.lang.String taxid,
boolean update)
Inserts zero or more proteins created from SPTR entries which are retrieved from a Stream. |
java.util.Collection |
insertSPTrProteins(java.lang.String proteinAc)
Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number. |
java.util.Collection |
insertSPTrProteins(java.lang.String proteinAc,
java.lang.String taxId,
boolean update)
Inserts zero or more proteins created from SPTR entries which are retrieved from an SPTR Accession number. |
int |
insertSPTrProteinsFromURL(java.lang.String sourceUrl,
java.lang.String taxid,
boolean update)
Inserts zero or more proteins created from SPTR entries which are retrieved from a URL. |
boolean |
isLocalTransactionControl()
If true, each protein is updated in a distinct transaction. |
static void |
main(java.lang.String[] args)
Demo. Could be used for loading from a .txl file: ./scripts/javaRun.sh UpdateProteins file:///homes/user/mySPTRfile.txl |
gnu.regexp.REMatch[] |
match(java.lang.String textin,
java.lang.String pattern)
Finds all matches of a given pattern (a string) in a given string. |
void |
setDebugOnScreen(boolean debug)
Displays on screen what is happening during the update process. |
void |
setLocalTransactionControl(boolean aLocalTransactionControl)
If true, each protein is updated in a distinct transaction. |
| Methods inherited from class uk.ac.ebi.intact.util.UpdateProteinsI |
getParsingExceptions, setLogger |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public UpdateProteins(IntactHelper helper)
throws UpdateProteinsI.UpdateException
public UpdateProteins(IntactHelper helper,
boolean setOutput)
throws UpdateProteinsI.UpdateException
| Method Detail |
public int getProteinCreatedCount()
UpdateProteinsI
getProteinCreatedCount in class UpdateProteinsI
public int getProteinUpdatedCount()
UpdateProteinsI
getProteinUpdatedCount in class UpdateProteinsI
public int getProteinUpToDateCount()
UpdateProteinsI
getProteinUpToDateCount in class UpdateProteinsI
public int getProteinCount()
UpdateProteinsI
getProteinCount in class UpdateProteinsI
public int getProteinSkippedCount()
UpdateProteinsI
getProteinSkippedCount in class UpdateProteinsI
public int getSpliceVariantCreatedCount()
UpdateProteinsI
getSpliceVariantCreatedCount in class UpdateProteinsI
public int getSpliceVariantUpdatedCount()
UpdateProteinsI
getSpliceVariantUpdatedCount in class UpdateProteinsI
public int getSpliceVariantUpToDateCount()
UpdateProteinsI
getSpliceVariantUpToDateCount in class UpdateProteinsI
public int getSpliceVariantCount()
UpdateProteinsI
getSpliceVariantCount in class UpdateProteinsI
public int getSpliceVariantProteinCount()
public int getSpliceVariantSkippedCount()
UpdateProteinsI
getSpliceVariantSkippedCount in class UpdateProteinsI
public int getEntryCount()
UpdateProteinsI
getEntryCount in class UpdateProteinsI
public int getEntryProcessededCount()
UpdateProteinsI
getEntryProcessededCount in class UpdateProteinsI
public int getEntrySkippedCount()
UpdateProteinsI
getEntrySkippedCount in class UpdateProteinsI
public void setDebugOnScreen(boolean debug)
UpdateProteinsI
setDebugOnScreen in class UpdateProteinsI
debug - true to enable, false to disable
public final java.lang.String getUrl(java.lang.String uniprotAC)
UpdateProteinsI
getUrl in class UpdateProteinsI
uniprotAC - a SPTR AC
public java.lang.String getAnEntry(java.lang.String anUrl)
UpdateProteinsI
getAnEntry in class UpdateProteinsI
anUrl - a URL which outputs flatfile of
public gnu.regexp.REMatch[] match(java.lang.String textin,
java.lang.String pattern)
match in class UpdateProteinsI
textin - A string from which some pattern will be matched.
pattern - A string as a pattern.
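The idea behind match, collecting every occurrence of a pattern in a string, can be sketched with the standard java.util.regex package. Note that this swaps in java.util.regex for the gnu.regexp library that the documented method actually returns (gnu.regexp.REMatch[]); the class MatchSketch, the method matchAll and the sample pattern are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find all matches with java.util.regex (not gnu.regexp). */
final class MatchSketch {

    /** Returns every non-overlapping match of pattern in textin, left to right. */
    static List<String> matchAll(String textin, String pattern) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(pattern).matcher(textin);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative pattern only: one uppercase letter followed by five digits.
        System.out.println(matchAll("AC   P12345; Q67890;", "[A-Z][0-9]{5}"));
        // prints [P12345, Q67890]
    }
}
```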
public boolean addNewXref(AnnotatedObject current,
Xref xref)
UpdateProteinsI
addNewXref in class UpdateProteinsI
current - the object to which we add a new Xref
xref - the Xref to add to the AnnotatedObject
public void addNewAlias(AnnotatedObject current,
Alias alias)
UpdateProteinsI
addNewAlias in class UpdateProteinsI
current - the object to which we add a new Alias
alias - the Alias to add to the AnnotatedObject
public void addNewAnnotation(AnnotatedObject current,
Annotation annotation)
current - the annotated object to which we want to add an Annotation.
annotation - the annotation to add to the annotated object
public java.util.Collection insertSPTrProteins(java.lang.String proteinAc)
UpdateProteinsI
insertSPTrProteins in class UpdateProteinsI
proteinAc - SPTR Accession number of the protein to insert/update
public java.util.Collection insertSPTrProteins(java.lang.String proteinAc,
java.lang.String taxId,
boolean update)
UpdateProteinsI
insertSPTrProteins in class UpdateProteinsI
proteinAc - SPTR Accession number of the protein to insert/update
taxId - The tax id the protein should have
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
public Protein insertSimpleProtein(java.lang.String anAc,
CvDatabase aDatabase,
java.lang.String aTaxId)
throws IntactException
insertSimpleProtein in class UpdateProteinsI
anAc - The primary identifier of the protein in the external database.
aDatabase - The database in which the protein is listed.
aTaxId - The tax id the protein should have
IntactException
public int insertSPTrProteinsFromURL(java.lang.String sourceUrl,
java.lang.String taxid,
boolean update)
UpdateProteinsI
insertSPTrProteinsFromURL in class UpdateProteinsI
sourceUrl - The URL which delivers zero or more SPTR flat file formatted entries.
taxid - Of all entries retrieved from sourceURL, insert only those which have this taxid. If taxid is empty, insert all protein objects.
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
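How the taxid and update parameters combine can be sketched as a small decision function. This is a hypothetical illustration that follows the parameter descriptions literally; InsertPolicySketch, Action and decide do not exist in IntAct. The class-level notes suggest that the filter may only restrict which proteins are returned while all proteins are still created or updated behind the scenes, so the real behaviour may differ from this sketch.

```java
/** Hypothetical sketch of the taxid filter and update flag; not IntAct code. */
final class InsertPolicySketch {

    enum Action { CREATE, UPDATE, SKIP_EXISTING, FILTERED_OUT }

    /**
     * @param entryTaxid        taxid of the protein described by the entry
     * @param alreadyInDatabase whether a matching Protein already exists
     * @param taxidFilter       the taxid parameter; null or empty means "no filter"
     * @param update            the update parameter
     */
    static Action decide(String entryTaxid, boolean alreadyInDatabase,
                         String taxidFilter, boolean update) {
        boolean noFilter = taxidFilter == null || taxidFilter.isEmpty();
        if (!noFilter && !taxidFilter.equals(entryTaxid)) {
            return Action.FILTERED_OUT;      // entry does not carry the requested taxid
        }
        if (!alreadyInDatabase) {
            return Action.CREATE;            // nothing stored yet: insert
        }
        return update ? Action.UPDATE : Action.SKIP_EXISTING;
    }

    public static void main(String[] args) {
        System.out.println(decide("9606", true, "", true));       // UPDATE
        System.out.println(decide("9606", true, "", false));      // SKIP_EXISTING
        System.out.println(decide("9606", false, "9606", false)); // CREATE
        System.out.println(decide("9606", true, "4932", true));   // FILTERED_OUT
    }
}
```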
public java.util.Collection insertSPTrProteins(java.io.InputStream inputStream,
java.lang.String taxid,
boolean update)
UpdateProteinsI
insertSPTrProteins in class UpdateProteinsI
inputStream - The stream from which YASP will read the content of the entries.
taxid - Of all entries read from the stream, insert only those which have this taxid. If taxid is empty, insert all protein objects.
update - If true, update existing Protein objects according to the retrieved data; otherwise, skip existing Protein objects.
public java.lang.String getErrorFileName()
UpdateProteinsI
getErrorFileName in class UpdateProteinsI
public boolean isLocalTransactionControl()
isLocalTransactionControl in class UpdateProteinsI
public void setLocalTransactionControl(boolean aLocalTransactionControl)
setLocalTransactionControl in class UpdateProteinsI
aLocalTransactionControl - New value for localTransactionControl
public static void main(java.lang.String[] args)
throws java.lang.Exception
java.lang.Exception