Convert CAS to SMILES

Open pstahlhofen opened this issue 3 years ago • 8 comments

Dear webchem developers, I am aware that converting CAS to SMILES is usually not that complicated. However, under the current circumstances I've spent quite some time on this problem without success so far, so I thought you might be able to help me out.

What I have tried

  1. Converting CAS to SMILES using the cts_convert function. This didn't return a result for any CAS I tried, so I visited the CTS Proxy, which showed me "Error 500 When calling /rest/to values".
  2. Converting CAS to SMILES using ChemSpider i.e. the cs_convert function. At this ChemSpider Web Page I was able to turn a single CAS number into a molecule description including a SMILES code. I signed in to ChemSpider and created an API key to automate the process for multiple CAS numbers. However, the method cs_convert refused to accept the argument from="CAS", yielding Error in match.arg(from, choices=valid) : 'arg' should be one of "csid", "inchikey", "inchi", "smiles", "mol". If ChemSpider generally supports conversion of CAS registry numbers, it would be a nice feature to extend the cs_convert method to perform this conversion as well.
  3. I was able to convert CAS to SMILES using CACTUS, but I couldn't find support for this API in webchem (or any other R package) and I would rather not rely on shell scripts, as they are brittle and highly platform dependent. Did I perhaps overlook some existing support for CACTUS?
  4. I tried the ci_query function to retrieve the SMILES code, which ran into "Service not available. Returning NA".

Any help is much appreciated.

pstahlhofen avatar Nov 10 '21 09:11 pstahlhofen

Hi @pstahlhofen, thanks for raising this issue.

  1. cts_convert() should be able to convert cas to smiles, yet it doesn't, and other examples seem to be failing as well. I will look into it, thanks for flagging.
  2. While the ChemSpider website supports many things, the APIs are more limited, and last time I checked they did not offer conversions from/to CAS.
  3. CACTUS is supported through cir_query() and it seems to work! Example with ethanol: cir_query("64-17-5", from = "cas", to = "smiles")
  4. Example with ethanol works on my end: ci_query("64-17-5", from = "rn") returns a list, and using sapply(<list>, function(x) x$smiles) returns the smiles for the compound.
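The sapply() step in point 4 can be sketched without network access. This is only an illustration: mock_result is a hypothetical stand-in for what ci_query() returns (the real list may carry more fields), and extract_smiles is a helper name made up here, not a webchem function. The sketch also assumes that a failed query shows up as a bare NA entry, which a plain x$smiles would choke on.

```r
# Hypothetical stand-in for a ci_query() result: one list per query,
# and a bare NA for a query that failed (assumption, not real output).
mock_result <- list(
  "64-17-5" = list(name = "Ethanol", smiles = "CCO"),
  "0-0-0"   = NA
)

# Hypothetical helper: pull the SMILES string out of each entry and
# return NA for entries that are not lists or have no smiles field.
extract_smiles <- function(results) {
  vapply(results, function(x) {
    if (is.list(x) && !is.null(x$smiles)) {
      as.character(x$smiles)[1]
    } else {
      NA_character_
    }
  }, character(1))
}

smiles <- extract_smiles(mock_result)
print(smiles)
```

vapply() is used instead of sapply() so that the result is always a character vector with one element per query, even when some queries fail.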

You can also use PubChem to get smiles from cas. Again for ethanol, get_cid("64-17-5", from = "xref/rn") returns the CID of the compound, and then pc_sect(702, "canonical smiles") returns the section "canonical smiles" from ethanol's PubChem page: https://pubchem.ncbi.nlm.nih.gov/compound/702#section=Canonical-SMILES

Let me know if these answer your question.

Also I'll keep this issue open until cts_convert() is resolved.

stitam avatar Nov 10 '21 10:11 stitam

Hi @stitam, thanks for the quick answer! cir_query solved my problem :) See below for details

  1. Hmm, the HTTP-Status is OK but the strings in the result always seem to be empty.
  2. Alright
  3. cir_query works great!
  4. Aha, ci_query works with from="rn" but not with from="cas". Thanks for the example. If this is permanent, you might want to update the documentation on ci_query, where it says that cas is also supported.

get_cid("64-17-5", from = "xref/rn") ran into Service not available, and so did get_cid("64-17-5", from = "xref/RN"), which is provided as an example in the docs.
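Since several services in this thread were unavailable at one point or another, one way to cope is to try them in order and keep the first usable answer. The following is a rough sketch of that idea only: first_success and the three mock resolvers are hypothetical names invented for illustration, and none of them calls webchem or any real service.

```r
# Try each resolver in turn and return the first non-missing result.
# `resolvers` is a named list of functions that take one CAS string and
# return a single SMILES string or NA (hypothetical interface).
first_success <- function(cas, resolvers) {
  for (name in names(resolvers)) {
    # Treat an error (e.g. a service that is down) like a missing result.
    result <- tryCatch(resolvers[[name]](cas),
                       error = function(e) NA_character_)
    if (length(result) == 1 && !is.na(result)) {
      return(c(source = name, smiles = result))
    }
  }
  c(source = NA_character_, smiles = NA_character_)
}

# Mock resolvers standing in for real services (assumptions only).
mock_resolvers <- list(
  down_service    = function(cas) stop("Service not available"),
  empty_service   = function(cas) NA_character_,
  working_service = function(cas) if (cas == "64-17-5") "CCO" else NA_character_
)

res <- first_success("64-17-5", mock_resolvers)
print(res)
```

In practice each list entry would wrap one real lookup and reduce its return value to a single string; how to do that depends on the return type of the function being wrapped.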

pstahlhofen avatar Nov 10 '21 11:11 pstahlhofen

It looks like CTS is down completely right now, and someone has already opened an issue: https://bitbucket.org/fiehnlab/ctsproxy/issues/38/error-500

They don't seem to have closed any issues in quite some time.

Aariq avatar Nov 10 '21 20:11 Aariq

Related? https://github.com/ropensci/webchem/issues/257

Aariq avatar Nov 10 '21 20:11 Aariq

Yes, I think so

pstahlhofen avatar Nov 11 '21 08:11 pstahlhofen

To clarify, if I remember correctly cts_convert() doesn't currently use CTS's REST API, because it was broken for some time. cts_convert() uses a more web-scraping type approach, but #257 was a reminder to switch to using the REST API if it ever started working again. (Edit: I just checked and it's still broken over a year later because of an expired SSL certificate)

CTS has had a lot of issues in the past, probably because of all the API dependencies it has, and it might be worthwhile contacting someone at the Fiehn Lab to get an idea of their long-term goals for the project before putting any effort into changing/fixing cts_convert(). If the Fiehn Lab isn't planning on maintaining CTS long term (e.g. because they don't have funding or staff), then it may be time to consider cts_convert() soft deprecated / superseded.

Aariq avatar Nov 11 '21 14:11 Aariq

Thanks @Aariq, that is correct, the CTS REST API is not yet implemented in webchem. I contacted them last time the service was down. I'll contact them again and ask about their long-term goals, and then we can decide.

stitam avatar Nov 12 '21 15:11 stitam

Hi All,

Update on this issue: the service is back online, but queries are still not working as they used to.

This one works:

webchem::cts_convert("3380-34-5", "cas", "inchikey")
#> $`3380-34-5`
#> [1] "XEFQLINVKFYRCS-UHFFFAOYSA-N" "ZRWRPGGXCSSBAO-UHFFFAOYSA-N"

Created on 2021-11-25 by the reprex package (v2.0.1)

This one doesn't:

webchem::cts_convert("triclosan", "chemical name", "inchikey")
#> $triclosan
#> [1] NA

Created on 2021-11-25 by the reprex package (v2.0.1)

stitam avatar Nov 25 '21 09:11 stitam