webchem
ECHA
e.g. http://apps.echa.europa.eu/registered/data/dossiers/DISS-9daa7594-c409-0ed0-e044-00144f67d249/DISS-9daa7594-c409-0ed0-e044-00144f67d249_DISS-9daa7594-c409-0ed0-e044-00144f67d249.html
possible conflict with terms of use....
A colleague informed me about a broken link in a script yesterday: it looks like ECHA is currently implementing a Web API (this sparks joy!!!). They told me last year that this is a high priority for them, so I would expect more to come soon... Hopefully the terms of use regarding data retrieval will also be clearer then.
In any case, I'm very interested in any developments to implement access to ECHA data and might try to commit something in the future. But it's probably best to wait some more time right now.
European Chemicals Agency: https://echa.europa.eu/
Note: users must accept the terms and conditions before using the search. If we implement this database, we should probably require users to read them too.
Feasibility
There is an advanced search, but I don't see evidence of an API for search.
Scope
Focus is on regulatory information, but includes chemical properties as well.
Overlap
Quite a bit of unique stuff like biodegradation and bioaccumulation
The legal notice which one has to accept before query states:
"Systematic automated data collection activities (including scraping, data mining, and extraction and re-utilisation) of the whole or a substantial part of the ECHA website and the ECHA databases are prohibited."
Maybe the part we need is in fact in IUCLID? IUCLID has a REST API: https://iuclid6.echa.europa.eu/public-api https://en.wikipedia.org/wiki/IUCLID
I will try to contact them (quite difficult to find an e-mail address) and ask for an update on their Web API.
I haven't heard anything regarding implementation of an API since. I guess that was a false alarm back then... but I'm still looking up data from the ECHA database for single substances nearly daily for work (no scraping) and believe I can give some helpful ideas here. I'm sure the ECHA database would be an important source for many people. Like @Aariq said, there is a lot of unique stuff. Further, for those substances which are in the database (essentially all chemicals produced or imported in the EU at > 1 ton per year), the data is fairly complete because companies are required by law to provide all available information (at least that's the theory ;).
The data in the ECHA database can also be accessed by other means without scraping, although all of these sources lack some of the metadata:
- The data is available as downloadable IUCLID files, which are nothing but zipped XML files that follow (largely or exactly?) the XML schema of the OECD harmonized study templates (https://www.oecd.org/ehs/templates/harmonised-templates-health-effects.htm). In the past this data was updated only every few years, but the last update is very recent, so this is a good source right now, though not suitable for webchem. Using the IUCLID REST API would require a publicly available server running IUCLID (with the data loaded), which most likely doesn't exist.
- The data is available in eChemPortal, at least to a large extent. There is an open issue for eChemPortal (#6), but I believe that implementing eChemPortal in webchem will be very tedious, if it is even possible. They revamped the website lately, but building queries is still a PITA.
- EPA CompTox dashboard. It also has an open issue (#131). The CompTox dashboard aggregates data from several sources, and the ECHA database is among them. I believe there is no API (?), but it looks scraping-friendly.
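For the first option above (the downloadable IUCLID files), a quick way to see what such an archive contains is to open it as a zip file and look at the root element of each XML member. This is only a minimal sketch, written in Python for brevity: the function name `xml_root_tags` is made up for illustration, and nothing is assumed about member names or the actual IUCLID/OECD schema. Members that do not parse as XML are simply skipped.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Union


def xml_root_tags(archive: Union[str, BinaryIO]) -> Dict[str, str]:
    """Map each XML member of a zip archive to the tag of its root element.

    `archive` is a path or a binary file-like object. No particular
    IUCLID layout is assumed; every member is tried as XML.
    """
    roots: Dict[str, str] = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to parse
            try:
                root = ET.fromstring(zf.read(name))
            except ET.ParseError:
                continue  # not XML (e.g. an attachment), skip it
            roots[name] = root.tag
    return roots
```

Calling `xml_root_tags("some_download.zip")` on a locally downloaded file (the file name here is a placeholder) would give a first overview of which document types are present before any schema-specific parsing is attempted.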
Thanks @marcodilger for the detailed update! ECHA is a valuable data source and it would be great to access it through webchem! I have asked the Agency for info about the API, let's see what they respond. I am a little uncomfortable with implementing ECHA functions in webchem because from there it really only depends on the user to respect the legal notice. As a workaround, we could construct the function in such a way that it requires user interaction before querying each compound. This way the data collection would no longer be systematic and automated. I feel this would comply with both the wording and the intent behind the legal notice, but I am still not sure.
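The per-compound confirmation idea could look roughly like the sketch below. It is written in Python only to show the control flow; the same shape would carry over to an R function. All names are hypothetical: `query_with_confirmation` is not a webchem function, and `fetch` stands in for whatever would actually retrieve a single substance. Whether this pattern satisfies the legal notice is still the open question raised above.

```python
from typing import Callable, Dict, Iterable, Optional


def query_with_confirmation(
    identifiers: Iterable[str],
    fetch: Callable[[str], dict],
    confirm: Callable[[str], str] = input,
) -> Dict[str, Optional[dict]]:
    """Query one compound at a time, asking the user before each request.

    `fetch` is a placeholder for the real single-substance lookup.
    `confirm` shows a prompt and returns the user's answer; it defaults
    to the built-in `input`, so every query needs a keypress.
    """
    results: Dict[str, Optional[dict]] = {}
    for identifier in identifiers:
        answer = confirm(f"Query ECHA for '{identifier}'? [y/N] ")
        if answer.strip().lower() == "y":
            results[identifier] = fetch(identifier)
        else:
            # Declined compounds are recorded as None; no request is made.
            results[identifier] = None
    return results
```

Defaulting to "no" means that pressing Enter repeatedly does not trigger any requests, so a batch cannot run through unattended by accident.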
Hello. Were you finally able to find a way to access ECHA data via an API or any other means?
Hi @pintaf, I don't know of any public APIs and it seems their legal notice still prohibits automated data collection activities.
I wrote directly to ECHA C&L for the possibility of using their data. They replied that yes a request could be made to obtain the files but they made this clear to me: Classification and labeling information for notified and registered substances is received from manufacturers and importers. Using the data without obtaining prior permission from the rights holders (i.e. manufacturers and importers) may violate their rights.
But does this apply if we get the GHS data from other sources such as PubChem?
Hi @pamonian, webchem only provides programmatic access to publicly available chemical data sources; I am definitely not qualified to give you legal advice on how you can use the data.
ECHA wrote "may" violate, which suggests that whether or not you violate manufacturer/importer rights depends on how you use their data. ECHA gave you the warning, so they covered themselves before sharing the data; the rest is your problem.
Regarding PubChem restrictions, we found a few relevant pages and added them to the help pages for most PubChem functions, e.g. ?get_cid(). These pages for example might be relevant for you:
- https://www.nlm.nih.gov/databases/download.html
- https://www.ncbi.nlm.nih.gov/home/about/policies/
I personally use PubChem for research purposes, and I'd say that unless you plan to sell the data you're probably more or less free to do whatever you want with it. Please note I won't take any responsibility for sharing my view on this issue, and if you're in doubt, please consult a legal expert. Hope this was helpful :)