
pc_sect() fails when requesting cas for cid 132971

Open stitam opened this issue 2 years ago • 2 comments

I found an example where pc_sect() fails unexpectedly.

webchem::pc_sect(132971, "cas")
#> Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed

Created on 2022-03-30 by the reprex package (v2.0.1)

I checked the compound's PubChem page, the CAS registry number is there: https://pubchem.ncbi.nlm.nih.gov/compound/132971

The same query works for other compounds:

webchem::pc_sect(176, "cas")
#> # A tibble: 21 × 5
#>    CID   Name        Result      SourceName           SourceID  
#>    <chr> <chr>       <chr>       <chr>                <chr>     
#>  1 176   Acetic acid 64-19-7     CAMEO Chemicals      2272      
#>  2 176   Acetic acid 64-19-7     CAMEO Chemicals      9215      
#>  3 176   Acetic acid 64-19-7     CAMEO Chemicals      19328     
#>  4 176   Acetic acid 64-19-7     CAS Common Chemistry 64-19-7   
#>  5 176   Acetic acid 64-19-7     ChemIDplus           0000064197
#>  6 176   Acetic acid 8030-97-5   ChemIDplus           0008030975
#>  7 176   Acetic acid 68475-71-8  ChemIDplus           0068475718
#>  8 176   Acetic acid 99149-56-1  ChemIDplus           0099149561
#>  9 176   Acetic acid 119510-26-8 ChemIDplus           0119510268
#> 10 176   Acetic acid 64-19-7     DrugBank             DB03166   
#> # … with 11 more rows

Created on 2022-03-30 by the reprex package (v2.0.1)

Any ideas?

stitam avatar Mar 30 '22 13:03 stitam

I'm getting the same error with other CIDs (so far 8172 and 132971). The error appears to come from the fromJSON function in the jsonlite package:

https://github.com/jeroen/jsonlite/blob/80854359976250f30a86a6992c0d8c3b3d95473d/R/fromJSON.R#L87

I'm not sure why only some specific CIDs trigger an error at that line of jsonlite::fromJSON.

BrianFFish avatar Jun 30 '22 05:06 BrianFFish

Many thanks @FishParade for tracking down this bug. It seems the error occurs when the PubChem page does contain the requested section (the HTTP status code is 200), but the content of the response is NA. In this case jsonlite::fromJSON() cannot evaluate its conditional statement, because the condition itself becomes NA. Since pc_sect() does work with other sections of the same compound, I feel this might be a PubChem issue.
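A caller-side guard along these lines might avoid the hard error. This is only a minimal sketch, not what webchem actually does: `safe_from_json()` is a hypothetical helper name, and returning NULL for an unusable payload is an assumption about what the caller would want.

```r
library(jsonlite)

# Hypothetical helper (not part of webchem or jsonlite): only hand the
# payload to fromJSON() when it is a single, non-missing, non-empty string.
safe_from_json <- function(txt) {
  # Check length first so that is.na() is only evaluated on a scalar.
  if (length(txt) != 1 || is.na(txt) || !is.character(txt) || !nzchar(txt)) {
    return(NULL)  # assumption: NULL signals "nothing parseable"
  }
  jsonlite::fromJSON(txt)
}
```

With this, `safe_from_json(NA_character_)` returns NULL instead of reaching the conditional inside fromJSON(), while a valid JSON string is parsed as usual.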

stitam avatar Jun 30 '22 07:06 stitam

So I've been examining the issue since I had a similar occurrence on my end! When I come across this type of error, I usually report it to PubChem and they're able to fix the record within a few days. Can confirm that it's usually some odd Unicode that's not being parsed gracefully by fromJSON. Easy to confirm by requesting XML vs JSON and seeing where the error is.
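For anyone who wants to try the XML vs JSON comparison, here is a rough sketch. The helper names are hypothetical, and the PUG View URL layout and the "CAS" heading are my assumptions about what the request looks like; the fetch step needs the httr package and network access.

```r
# Assumption: PubChem PUG View serves a single section at
# .../rest/pug_view/data/compound/<cid>/<format>?heading=<heading>
pug_view_url <- function(cid, heading, format = c("JSON", "XML")) {
  format <- match.arg(format)
  paste0(
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/",
    cid, "/", format, "?heading=", utils::URLencode(heading, reserved = TRUE)
  )
}

# Fetch the response body as text; the text may come back as NA if the
# bytes cannot be converted to UTF-8.
fetch_text <- function(url) {
  res <- httr::GET(url)
  httr::stop_for_status(res)
  httr::content(res, as = "text", encoding = "UTF-8")
}

# Hypothetical diagnostic: fetch both formats and report which one is unusable.
compare_formats <- function(cid, heading) {
  json_txt <- fetch_text(pug_view_url(cid, heading, "JSON"))
  xml_txt <- fetch_text(pug_view_url(cid, heading, "XML"))
  list(
    json_is_na = is.na(json_txt),
    json_valid = !is.na(json_txt) && isTRUE(as.logical(jsonlite::validate(json_txt))),
    xml_is_na = is.na(xml_txt)
  )
}
```

Calling `compare_formats(132971, "CAS")` should show whether the JSON body is NA or invalid while the XML body still arrives, which narrows down where the bad characters are.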

A proposed long-term solution would be to modify fromJSON so that it returns something when parsing an NA or NULL response, as suggested by https://github.com/jeroen/jsonlite/issues/211 and https://github.com/jeroen/jsonlite/pull/311

I can understand why an error message would be preferred, but I'd also like to not have to hunt down a bad string of Unicode.

sxthimons avatar Apr 24 '23 17:04 sxthimons