chembl_webresource_client

Extract ChEMBL version associated per compound

Open czodrowskilab opened this issue 5 years ago • 7 comments

To whom it may concern,

is there a way to extract something like a publication/experimental date associated with a ChEMBL compound?

For benchmarking studies, we would like to use time-split validation. Therefore, it would be of tremendous help for us to get an association between a ChEMBL compound and its ChEMBL version (or publication/experimental date).

Best regards, Paul

czodrowskilab avatar Apr 09 '19 13:04 czodrowskilab

Hi!

Thanks for the chembl_webresource_client - we are using it at @volkamerlab a lot!

My question is related to @czodrowskilab's question here (I think), so I'll add it here.

Is there a field for the date when

  • a compound or
  • a bioactivity

was added to ChEMBL?

I know of the document_year field, which I can fetch for bioactivities:

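from chembl_webresource_client.new_client import new_client

# Activity endpoint of the web resource client
bioactivities_api = new_client.activity

# Fetch only the activity ID and the year of the source document
bioactivities = bioactivities_api.filter(
    target_chembl_id='CHEMBL941'
).only(
    'activity_id',
    'document_year'
)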

Given the ChEMBL database schema, I guess it works as follows: from doc_id (field) in activities (table), go to doc_id (field) in docs (table) and extract year (field). This probably works similarly for compounds.
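The lookup described above can be sketched against a toy in-memory SQLite database. The table and column names (activities, docs, doc_id, year) follow the description; the rows are invented for illustration, and this is neither real ChEMBL data nor a claim about how the web services resolve document_year internally.

```python
import sqlite3

# Toy stand-in for the two ChEMBL tables mentioned above; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, doc_id INTEGER);
    INSERT INTO docs VALUES (1, 2005), (2, 2012);
    INSERT INTO activities VALUES (10, 1), (11, 2), (12, 2);
""")

# Follow doc_id from activities to docs and read the document's year.
activity_years = conn.execute("""
    SELECT a.activity_id, d.year
    FROM activities AS a
    JOIN docs AS d ON d.doc_id = a.doc_id
    ORDER BY a.activity_id
""").fetchall()

print(activity_years)
```

Against a local copy of the ChEMBL database, the same join would apply as long as the table and column names match the downloaded schema.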

However, the document's year

  • does not necessarily equal the bioactivity or compound deposition date in ChEMBL and
  • does not need to be the same for compounds and bioactivities, right?

Thus, back to my question: What is the best way to filter compound or bioactivity entries in ChEMBL by their deposition date?

Thank you for your time.

dominiquesydow avatar Apr 29 '20 15:04 dominiquesydow

I am realizing: The deposition date probably equals the ChEMBL version in which the compound/bioactivity was added.

Is there a way to access the version? In the ChEMBL schema, the version table does not seem to be connected to any other table: https://www.ebi.ac.uk/chembl/db_schema

dominiquesydow avatar Apr 30 '20 13:04 dominiquesydow

Update on this matter: The ChEMBL support team (https://www.ebi.ac.uk/support/) let me know that it is currently not possible to extract data from the web services (or the interface) for a previous ChEMBL release.

dominiquesydow avatar Jun 02 '20 08:06 dominiquesydow

Hi, sorry for not having replied to this before. As Dominique says, this is currently not possible, but we are working towards supporting it in future ChEMBL versions.

eloyfelix avatar Aug 18 '20 12:08 eloyfelix

Dear ChEMBL team,

Thank you for chembl_webresource_client. Is there any update on whether the version could be directly/indirectly linked to an activity record?

Like @czodrowskilab, I am also interested in splitting the activity data for a target by chembl version in which it first appeared (to mimic time-split).

Thanks, Vishal

iwwwish avatar Mar 21 '23 23:03 iwwwish

We are working on implementing time stamps for deposited data sets. We are currently exploring how to deal with updates of documents. Tentatively, we'll release this with ChEMBL 34.

BZdrazil avatar May 17 '23 09:05 BZdrazil

However, time stamps for data sets are already available (and always have been) via the VERSION table, which includes CREATION_DATE for every ChEMBL release. By querying for the release in which a document was added, you'll get your time stamps. After we've solved the question about updated documents, we plan to make that information more easily accessible, likely via a new table.

BZdrazil avatar May 17 '23 09:05 BZdrazil
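
As a rough sketch of the release-based time split discussed in this thread: if each document can be mapped to the release in which it was first added, that release's CREATION_DATE can serve as the time stamp. The example below uses a toy in-memory SQLite database. The `first_release` column on `docs` is a hypothetical link invented for illustration (the thread does not say how a document is tied to a release), the `time_split` helper is likewise made up, and all rows and release names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- version(name, creation_date) mirrors the VERSION table named above.
    CREATE TABLE version (name TEXT PRIMARY KEY, creation_date TEXT);
    -- first_release is a HYPOTHETICAL column, not confirmed by the thread.
    CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, first_release TEXT);
    CREATE TABLE activities (activity_id INTEGER PRIMARY KEY, doc_id INTEGER);
    INSERT INTO version VALUES ('release_A', '2015-01-01'), ('release_B', '2020-01-01');
    INSERT INTO docs VALUES (1, 'release_A'), (2, 'release_B');
    INSERT INTO activities VALUES (10, 1), (11, 2), (12, 2);
""")


def time_split(conn, cutoff):
    """Split activity IDs by the creation date of the release that first held them.

    Activities from releases created before `cutoff` go to train,
    those from releases created on or after `cutoff` go to test.
    """
    rows = conn.execute("""
        SELECT a.activity_id, v.creation_date
        FROM activities AS a
        JOIN docs AS d ON d.doc_id = a.doc_id
        JOIN version AS v ON v.name = d.first_release
        ORDER BY a.activity_id
    """).fetchall()
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    train = [aid for aid, date in rows if date < cutoff]
    test = [aid for aid, date in rows if date >= cutoff]
    return train, test


train_ids, test_ids = time_split(conn, "2018-01-01")
print(train_ids, test_ids)
```

Splitting on the release date rather than on document_year reflects when data became available in ChEMBL, which may differ from the publication year of the underlying document.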