
Implements pagination

Open · niconoe opened this issue 5 years ago · 4 comments

Hi @sckott!

For a current project I'm working on with @damianooldoni, we'll need to access a long list of results from the name_usage() API call.

Instead of implementing the pagination/looping in our client code (like in this quick-and-dirty example: https://gist.github.com/niconoe/b9dcb6c468b996b6f77e18f51516e840), we were wondering whether you'd be interested in receiving a PR that implements it in pygbif itself. That would be similar to what Damiano did for rgbif in https://github.com/ropensci/rgbif/pull/291 and https://github.com/ropensci/rgbif/pull/295.

The plan would be to:

  • Make the mechanism generic so it can be used not only for name_usage() but also other functions that deal with paginated results from the GBIF API.
  • Either add new functions that wrap the existing ones and add pagination (all_name_usages() for example), or change the existing functions. To avoid breaking the API, we could add an optional parameter (defaulting to False) that tells pygbif to handle the pagination. For example, name_usage() and name_usage(handle_pagination=False) would keep the existing behaviour, while name_usage(handle_pagination=True) would take care of the pagination and return all results. I have a slight preference for the first option (new functions) because I find the API clearer, but it's up to you!
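The generic mechanism from the first point could be sketched as a single helper that works with any paged function. This is a minimal sketch, not pygbif code: `fetch_all_pages` is a hypothetical name, and it assumes the wrapped function accepts `limit` and `offset` keywords and returns a dict with `results` and `endOfRecords`, the response shape used by GBIF's paged endpoints such as occurrence search.

```python
from typing import Any, Callable, Dict, List, Optional


def fetch_all_pages(
    func: Callable[..., Dict[str, Any]],
    *args: Any,
    page_size: int = 100,
    max_records: Optional[int] = None,
    **kwargs: Any,
) -> List[Dict[str, Any]]:
    """Call a paged search function repeatedly and concatenate its 'results'.

    Hypothetical helper: assumes `func` takes `limit`/`offset` keywords and
    returns a dict with 'results' (list) and 'endOfRecords' (bool).
    """
    results: List[Dict[str, Any]] = []
    offset = 0
    while max_records is None or len(results) < max_records:
        limit = page_size
        if max_records is not None:
            # Never ask for more records than are still needed
            limit = min(page_size, max_records - len(results))
        resp = func(*args, limit=limit, offset=offset, **kwargs)
        page = resp.get("results", [])
        results.extend(page)
        # Stop when the API says there is nothing left (or returns an empty page)
        if resp.get("endOfRecords", True) or not page:
            break
        offset += len(page)
    return results
```

Because the helper only relies on the `limit`/`offset`/`endOfRecords` contract, the same function could in principle serve `name_usage()`, `occurrences.search()` and other paged calls, rather than needing one wrapper per endpoint.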

Just tell us what you think; if you're interested, we hope to start working on a PR soon!

niconoe, May 19 '20 13:05

👋 Thanks for this. Adding pagination handling sounds good.

I lean towards adding the functionality to the existing methods, as I don't love the idea of adding a bunch of new methods. If we went with new methods, I imagine we'd have to add a new method for every current method?
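One way to add the opt-in `handle_pagination` flag to existing methods without touching each body is a decorator. The sketch below is hypothetical (`with_pagination` is an illustrative name, not part of pygbif) and assumes the decorated function takes `limit`/`offset` keywords and returns a dict with `results` and `endOfRecords`. Without the flag, the call is passed through unchanged.

```python
import functools
from typing import Any, Callable, Dict, List


def with_pagination(page_size: int = 100) -> Callable:
    """Add an opt-in `handle_pagination` keyword to a paged search function."""

    def decorator(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
        @functools.wraps(func)
        def wrapper(*args: Any, handle_pagination: bool = False, **kwargs: Any) -> Dict[str, Any]:
            if not handle_pagination:
                # Default: exactly the existing single-request behaviour
                return func(*args, **kwargs)

            # We page ourselves, so drop any caller-supplied page size
            kwargs.pop("limit", None)
            offset = kwargs.pop("offset", None) or 0
            results: List[Dict[str, Any]] = []
            while True:
                resp = func(*args, limit=page_size, offset=offset, **kwargs)
                page = resp.get("results", [])
                results.extend(page)
                if resp.get("endOfRecords", True) or not page:
                    break
                offset += len(page)
            # Keep the usual dict shape; other keys (e.g. 'count') come from the last page
            return {**resp, "results": results, "endOfRecords": True}

        return wrapper

    return decorator
```

With this approach, each paged function would only need the decorator applied where it is defined, so `name_usage()` keeps its current behaviour and `name_usage(handle_pagination=True)` would return every page.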


P.S. pygbif now uses the black formatter (https://github.com/psf/black), so make sure to run it before pushing changes; there are plenty of text editor integrations, a command-line tool, etc.

sckott, May 19 '20 15:05

Any progress on the 300-record limit when using occ.search?

Becheler, Jan 27 '22 23:01

No. Note that this library is now maintained by the GBIF team; hopefully they'll chime in here to say whether that's something they're interested in.

sckott, Jan 28 '22 00:01

Ok. I came up with a bit of code that solved my problem: not very clean, but functional enough!

from pygbif import occurrences
from tqdm import tqdm


def paginated_search(max_limit, *args, **kwargs):
    """Fetch up to `max_limit` occurrences by paging through occurrences.search().

    In its current version, pygbif cannot return more than 300 occurrences
    per call, so this requests successive pages and concatenates them.
    """
    per_page = 100

    if max_limit <= per_page:
        resp = occurrences.search(*args, limit=max_limit, **kwargs)
        return resp["results"]  # list of dicts

    results = []
    # Progress is counted in records, not bytes
    progress_bar = tqdm(total=max_limit, unit="records")
    offset = 0
    while offset < max_limit:
        # Don't request more than the records still needed
        page_size = min(per_page, max_limit - offset)
        resp = occurrences.search(*args, limit=page_size, offset=offset, **kwargs)
        results.extend(resp["results"])
        progress_bar.update(len(resp["results"]))
        if resp["endOfRecords"]:
            break  # GBIF reports there are no more pages
        offset += page_size
    progress_bar.close()
    return results  # list of dicts
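If keeping every record in memory is a concern, the same paging loop can also be written as a generator that only requests a new page when the caller needs more records. This is a minimal sketch under the same assumptions as above (a search callable taking `limit`/`offset` and returning `results` plus `endOfRecords`); `iter_records` is a hypothetical name, not a pygbif function.

```python
from typing import Any, Callable, Dict, Iterator


def iter_records(
    search: Callable[..., Dict[str, Any]],
    *args: Any,
    page_size: int = 100,
    **kwargs: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield records one at a time, fetching the next page lazily."""
    offset = 0
    while True:
        resp = search(*args, limit=page_size, offset=offset, **kwargs)
        page = resp.get("results", [])
        yield from page
        # Stop when the API reports the last page (or returns nothing)
        if resp.get("endOfRecords", True) or not page:
            return
        offset += len(page)
```

Capping the total is then left to the caller, e.g. `itertools.islice(iter_records(occurrences.search, page_size=100), 1000)`, which stops pulling pages once 1000 records have been consumed.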

Becheler, Jan 28 '22 15:01