Add occurrence count from latest harvest to endpoints in /dataset

Open dshorthouse opened this issue 5 years ago • 2 comments

As in subject. This would be immensely useful to me: it would help prevent unnecessary generation of DwC-As of all GBIF specimen-based occurrences if I could first discover any datasets that have dropped a significant number of records since I last downloaded them. For example, it appears https://www.gbif.org/dataset/4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4 has mistakenly dropped half of its 4.5M occurrences from its DwC-A at some point in the last two weeks. I have been in touch with Niels Klazenga to see if he can get them restored, after which I'd generate yet another 65GB DwC-A file. Incidentally, I am faced with 12hr+ download times for such files. If, however, I create a Droplet on DigitalOcean in Amsterdam to download the file before hopping the pond, I can get it to my machine in North America in approx. 1.5hrs.

dshorthouse avatar Aug 26 '20 05:08 dshorthouse

https://api.gbif.org/v1/occurrence/count?datasetKey=4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4 is the count API, is that sufficient?
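For illustration, a minimal Python sketch of calling that count endpoint for a single dataset. The helper names (`count_url`, `parse_count`, `fetch_count`) are hypothetical, and the sketch assumes the response body is a bare integer.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URL quoted in the comment above.
COUNT_ENDPOINT = "https://api.gbif.org/v1/occurrence/count"


def count_url(dataset_key: str) -> str:
    """Build the count URL for one dataset key."""
    return f"{COUNT_ENDPOINT}?{urlencode({'datasetKey': dataset_key})}"


def parse_count(body: str) -> int:
    """Parse the response body (assumed to be a bare integer)."""
    return int(body.strip())


def fetch_count(dataset_key: str, timeout: float = 30.0) -> int:
    """Issue one GET request and return the occurrence count."""
    with urlopen(count_url(dataset_key), timeout=timeout) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `fetch_count("4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4")` would issue one request for the dataset mentioned in the issue; checking many datasets this way means one request per dataset.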


The download speed issue I've transferred to https://github.com/gbif/portal-feedback/issues/2963

MattBlissett avatar Aug 26 '20 07:08 MattBlissett

@MattBlissett That's functional, but not particularly efficient. In my case, that would result in 5000+ GET requests. And, thanks for transferring the speed issue.
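However the counts end up being collected, the comparison step itself is small. A rough Python sketch, assuming two mappings of dataset key to occurrence count (one saved at the last download, one current); the function name `significant_drops` and the 25% threshold are hypothetical choices for illustration, not anything GBIF provides.

```python
from typing import Dict, Tuple


def significant_drops(
    previous: Dict[str, int],
    current: Dict[str, int],
    threshold: float = 0.25,  # hypothetical cut-off: flag losses of 25% or more
) -> Dict[str, Tuple[int, int]]:
    """Return {dataset_key: (old_count, new_count)} for datasets whose
    count fell by at least `threshold` as a fraction of the old count."""
    drops = {}
    for key, old in previous.items():
        # A dataset missing from `current` is treated as having zero records.
        new = current.get(key, 0)
        if old > 0 and (old - new) / old >= threshold:
            drops[key] = (old, new)
    return drops


if __name__ == "__main__":
    # Illustrative numbers only, loosely modelled on the case described above.
    before = {"4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4": 4_500_000, "other": 1_000}
    after = {"4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4": 2_250_000, "other": 990}
    print(significant_drops(before, after))
```

This does nothing to reduce the number of requests needed to build `current`; it only covers the check once the counts are in hand.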

dshorthouse avatar Aug 27 '20 11:08 dshorthouse