webchem
pc_sect() fails when requesting cas for cid 132971
I found an example where pc_sect() fails unexpectedly.
webchem::pc_sect(132971, "cas")
#> Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed
Created on 2022-03-30 by the reprex package (v2.0.1)
I checked the compound's PubChem page, the CAS registry number is there: https://pubchem.ncbi.nlm.nih.gov/compound/132971
The same query works for other compounds:
webchem::pc_sect(176, "cas")
#> # A tibble: 21 × 5
#>    CID   Name        Result      SourceName           SourceID
#>    <chr> <chr>       <chr>       <chr>                <chr>
#>  1 176   Acetic acid 64-19-7     CAMEO Chemicals      2272
#>  2 176   Acetic acid 64-19-7     CAMEO Chemicals      9215
#>  3 176   Acetic acid 64-19-7     CAMEO Chemicals      19328
#>  4 176   Acetic acid 64-19-7     CAS Common Chemistry 64-19-7
#>  5 176   Acetic acid 64-19-7     ChemIDplus           0000064197
#>  6 176   Acetic acid 8030-97-5   ChemIDplus           0008030975
#>  7 176   Acetic acid 68475-71-8  ChemIDplus           0068475718
#>  8 176   Acetic acid 99149-56-1  ChemIDplus           0099149561
#>  9 176   Acetic acid 119510-26-8 ChemIDplus           0119510268
#> 10 176   Acetic acid 64-19-7     DrugBank             DB03166
#> # … with 11 more rows
Created on 2022-03-30 by the reprex package (v2.0.1)
Any ideas?
I'm getting the same issue with other CIDs (so far, 8172, 132971). It seems like the error is from the fromJSON function from the jsonlite package:
https://github.com/jeroen/jsonlite/blob/80854359976250f30a86a6992c0d8c3b3d95473d/R/fromJSON.R#L87
As to why only some specific CIDs are causing an error in that line of jsonlite::fromJSON, I'm unsure.
Many thanks @FishParade for tracking down this bug. It seems the error occurs when the PubChem page does contain the requested section (the HTTP status code is 200) but the content of the response is NA. In this case jsonlite::fromJSON() cannot resolve the conditional statement. Since pc_sect() does work with other sections of the same compound, I feel this might be a PubChem issue.
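To illustrate why the condition cannot be resolved, here is a minimal sketch in base R. It does not call jsonlite; `check_input()` and `last_check` are hypothetical stand-ins for the condition in fromJSON() and for whichever term ends up as NA for the bad response (which term that is has not been established in this thread).

```r
# Hypothetical stand-in for the condition quoted in the error message.
# `last_check` represents the term that evaluates to NA for the bad response.
check_input <- function(txt, last_check) {
  if (is.character(txt) && length(txt) == 1 && last_check) {
    "branch taken"
  } else {
    "branch skipped"
  }
}

# TRUE && TRUE && NA evaluates to NA, and if (NA) raises an error.
msg <- tryCatch(
  check_input("some response text", NA),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In an English locale the captured message should be the same "missing value where TRUE/FALSE needed" text reported above.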
I've been looking into this because I ran into a similar error on my end. When I come across this type of error, I usually report it to PubChem and they're able to fix the record within a few days. I can confirm that it's usually some odd Unicode that's not being parsed gracefully by fromJSON. It's easy to check by requesting XML instead of JSON and seeing where the error is.
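A rough sketch of the XML vs JSON comparison, for anyone who wants to try it. `pug_view_url()` is a hypothetical helper, and the PUG View URL pattern and the "CAS" heading are my assumptions about what pc_sect() requests, so adjust them if they differ. The fetch itself needs network access and is left commented out.

```r
# Hypothetical helper: build a PUG View URL for one section of one compound.
# The URL pattern and the "CAS" heading are assumptions, not taken from webchem.
pug_view_url <- function(cid, format = c("JSON", "XML"), heading = "CAS") {
  format <- match.arg(format)
  sprintf(
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug_view/data/compound/%s/%s?heading=%s",
    cid, format, utils::URLencode(heading, reserved = TRUE)
  )
}

json_url <- pug_view_url(132971, "JSON")
xml_url <- pug_view_url(132971, "XML")

# Download both as raw text and compare them by eye to find the odd characters:
# json_raw <- paste(readLines(json_url, warn = FALSE), collapse = "\n")
# xml_raw <- paste(readLines(xml_url, warn = FALSE), collapse = "\n")
```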
A long-term solution would be to modify fromJSON to return something when parsing an NA or NULL response, as suggested in https://github.com/jeroen/jsonlite/issues/211 and https://github.com/jeroen/jsonlite/pull/311
I can understand why an error message would be preferred but I'd also like to not have to hunt down a bad string of Unicode.
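Until something like that lands, one possible stopgap when looping over many CIDs is to catch the error per compound so that one bad record does not abort the whole run. This is only a sketch: `safe_sect()` and `stub_fetch()` are hypothetical names, and the stub stands in for `function(cid) webchem::pc_sect(cid, "cas")` so the example runs without network access.

```r
# Hypothetical wrapper: call `fetch` for each CID; on error, warn and keep NULL.
safe_sect <- function(cids, fetch) {
  results <- lapply(cids, function(cid) {
    tryCatch(
      fetch(cid),
      error = function(e) {
        warning(sprintf("CID %s failed: %s", cid, conditionMessage(e)), call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- as.character(cids)
  results
}

# Stub that mimics the failure for CID 132971 reported in this issue.
stub_fetch <- function(cid) {
  if (cid == 132971) stop("missing value where TRUE/FALSE needed")
  data.frame(CID = as.character(cid), Result = "64-19-7")
}

out <- suppressWarnings(safe_sect(c(176, 132971), stub_fetch))
```

The failed CIDs come back as NULL entries in the list, which makes it easy to collect them and report the records to PubChem.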