trefle-api

Downloadable archive of Trefle database

Open SebastianKG opened this issue 4 years ago • 1 comment

Is your feature request related to a problem? Please describe.

This may be a stretch as a "feature", but I'm still looking for a way to get at the underlying dataset as a whole, without being rate-limited. I have a long-running crawler for the API and have slowly collected a lot of the data, but it remains incomplete and probably always will (when crawling lawfully with only one API token, the limit is quite strict). Back in August, we discussed a data dump (here: https://github.com/treflehq/trefle-api/issues/44), and the following was said:

We will soon provide an archive of our database for you to download, and thus avoid iterating on all the plants.

Describe the solution you'd like

I'm sure the project is strapped for developer time and this may not be a priority, but I would love to build and publicize some cool Apache-Spark-aggregated high-level uses for this data. To enable projects like this, a data dump (or a much more lenient page size limit, which I expect would be more expensive for the project) seems necessary.

SebastianKG, Dec 22 '20 21:12

Not the most up to date, but this may be of help: https://github.com/treflehq/dump

itsezc, Jan 10 '21 15:01
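
For anyone still crawling under the rate limit described in the issue, here is a minimal sketch of a throttled, paginated crawl in Python. It is not taken from the Trefle codebase or documentation: `crawl_pages`, `fetch_page`, and `max_requests_per_minute` are hypothetical names, and the actual request (URL, token, response shape) is left to whatever `fetch_page` you supply. The only idea it illustrates is spacing requests evenly so a single token stays under a per-minute cap.

```python
import time
from typing import Callable, Iterator, List, Optional


def crawl_pages(
    fetch_page: Callable[[int], List[dict]],
    max_requests_per_minute: int,
    start_page: int = 1,
    max_pages: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield records page by page, spacing requests to respect a rate limit.

    fetch_page is a hypothetical callable (an assumption, not a real API):
    it returns the records on one page, and an empty list once the pages
    run out.
    """
    # Minimum number of seconds between the starts of two requests.
    min_interval = 60.0 / max_requests_per_minute
    last_request = None
    page = start_page
    fetched = 0
    while max_pages is None or fetched < max_pages:
        if last_request is not None:
            # Sleep only for the part of the interval not already elapsed.
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        records = fetch_page(page)
        if not records:
            break  # an empty page marks the end of the data
        yield from records
        page += 1
        fetched += 1


if __name__ == "__main__":
    # In-memory stand-in for the remote API; sleeping is disabled for the demo.
    fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    rows = list(
        crawl_pages(
            lambda p: fake_pages.get(p, []),
            max_requests_per_minute=120,
            sleep=lambda seconds: None,
        )
    )
    print(len(rows))
```

Passing `start_page` lets a long-running crawl resume from the last page it saved, which matters when a full pass takes days. Even so, a sketch like this only makes the crawl polite, not fast, so a published dump remains the better option for bulk analysis.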