Data-Rescue-PDX
CDC1623F-7C1A-438B-80F3-F147659BB31C
{
"title": "Aggregated Computational Toxicology Online Resource",
"notes": "This resource is a link to the ACToR (Aggregated Computational Toxicology Online Resource) website by EPA, where data is aggregated from thousands of public sources on over 500,000 chemicals.The data is available for download via the Download link: https://actor.epa.gov/actor/download.xhtml. Details on the licensing information is available at https://edg.epa.gov/EPA_Data_License.html",
"license_id": "public domain",
"landingPage": "https://actor.epa.gov/actor/home.xhtml",
"id": "CDC1623F-7C1A-438B-80F3-F147659BB31C",
"isPartOf": "Environmental Protection Agency",
"tags": "EPA",
"organization": {
"description": "EPA’s Aggregated Computational Toxicology Online Resource (ACToR) aggregates data from thousands of public sources on over 500,000 chemicals. It is searchable by chemical name and other identifiers. ACToR is also the data and web applications warehouse for EPA’s computational toxicology information which includes high-throughput screening, chemical exposure, sustainable chemistry (chemical structures and physicochemical properties) and virtual tissues data.",
"title": "Environmental Protection Agency",
"name": "EPA",
"is_organization": true,
"image_url": "",
"type": "organization",
"id": "epa-gov"
}
}
I followed the link https://actor.epa.gov/actor/download.xhtml, which leads to a page that links to the actual data described on the landing page:
https://actor.epa.gov/actor/archive/v8/actor_2015q3.sql.gz
So I would add a "resources" key to the end of the JSON:
"resources": [{"url": "https://actor.epa.gov/actor/archive/v8/actor_2015q3.sql.gz"}]
No scraping is needed for this dataset, because that one file is the only data file.
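The edit to the JSON could be scripted roughly as follows. This is only a sketch: the record is a trimmed copy of the metadata above, and the helper name `add_resource` is something I made up for illustration, not part of any existing tool.

```python
import json

# Trimmed copy of the metadata record above; only a few fields are kept for brevity.
metadata_text = """
{
  "title": "Aggregated Computational Toxicology Online Resource",
  "landingPage": "https://actor.epa.gov/actor/home.xhtml",
  "id": "CDC1623F-7C1A-438B-80F3-F147659BB31C"
}
"""

# The single data file linked from the download page.
DATA_URL = "https://actor.epa.gov/actor/archive/v8/actor_2015q3.sql.gz"


def add_resource(record, url):
    """Hypothetical helper: return a copy of record with url listed under "resources"."""
    updated = dict(record)
    resources = list(updated.get("resources", []))
    # Skip the append if the URL is already listed, so re-running is harmless.
    if not any(r.get("url") == url for r in resources):
        resources.append({"url": url})
    updated["resources"] = resources
    return updated


record = json.loads(metadata_text)
updated = add_resource(record, DATA_URL)
print(json.dumps(updated, indent=2))
```

Because Python dicts keep insertion order, the new "resources" key ends up at the end of the printed JSON, which matches the manual edit described above.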