webchem
nist
http://webbook.nist.gov/chemistry/cas-ser.html
Related: https://github.com/ropensci/webchem/pull/154
- Feasibility. There is no API. Scraping is not explicitly disallowed, although some features (e.g. mass spectra) are, I believe, available in paid software, so we might not want to scrape everything. Most data is presented in tables, which makes scraping fairly easy.
- Scope. There is a ton of information, but most is experimental chemical properties with citations. Not all datasets exist for all compounds. Examples of properties:
  - thermochemistry data
  - phase change data
  - reaction thermochemistry (reaction search allows search for reactants and products together)
  - fluid properties
  - identifiers
  - synonyms
  - mol file for structure
  - IR spectra
- Overlap. NIST certainly provides at least some properties not found in the other databases in webchem. My suggestion would be to treat each property type as an individual database and not try to integrate more than one feature at a time. nist_ri() is already implemented. The reaction search is probably the most distinctive feature, so that might be something to work on next?
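To illustrate the point about tabular data being fairly easy to scrape, here is a minimal sketch of turning an HTML table into records. It is written in Python with only the standard library so that it is self-contained; an actual webchem function would be written in R. The class name `TableParser`, the helper `table_to_records`, and the sample HTML are all hypothetical and are not taken from the WebBook, so real pages may need extra handling (nested tables, footnotes, merged cells).

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the rows of every <table> in a page as lists of cell strings.

    Hypothetical helper for illustration; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, each a list of rows
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def table_to_records(rows):
    """Turn [header, row, row, ...] into a list of dicts keyed by header."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up HTML for illustration only; not copied from the WebBook.
SAMPLE_HTML = """
<table>
  <tr><th>Quantity</th><th>Value</th><th>Units</th><th>Reference</th></tr>
  <tr><td>property A</td><td>1.0</td><td>unit A</td><td>Author, 2000</td></tr>
  <tr><td>property B</td><td>2.5</td><td>unit B</td><td>Author, 2001</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
records = table_to_records(parser.tables[0])
for record in records:
    print(record)
```

Keeping the reference column next to each value means every number stays tied to its cited source.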
From the traceability perspective it's great that they have references to original publications. Ideally every single number in every database should be traceable to the original publication.
I agree with treating each property separately; we can always create an integrator function later to reduce the number of exported functions. Reactions (like the QSAR models discussed earlier) open up the scope of the package quite a bit. I am OK with that. @andschar, what do you think?
Given that there is no API, I think it would be best to ask for explicit approval to be safe. I will contact NIST about this.