esgf-pyclient
Unexpected number of results for large query
I am exploring the use of esgf-pyclient to get a list of all retracted CMIP6 datasets (for our automated maintenance of the Pangeo CMIP6 cloud data).
I am trying the following:
from pyesgf.search import SearchConnection
conn = SearchConnection(
    'https://esgf-node.llnl.gov/esg-search',
    distrib=True,
)
ctx = conn.new_context(mip_era='CMIP6', retracted=True, replica=False, fields='id', facets=['doi'])
ctx.hit_count
And I get back a hit count of 691984.
But when I try to extract a list of instance_ids:
results = ctx.search(batch_size=10000)
retracted = [ds.dataset_id for ds in results]
len(retracted)
The list only has 240000 elements. That suspiciously round number makes me think I am hitting some internal limit here.
Or did I miss something in the above code?
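One way to narrow this down might be to bypass the client's batching and page through the search endpoint directly, then compare the number of IDs actually returned with the reported total. The following is only a rough sketch under several assumptions: that the ESGF search REST endpoint accepts `offset` and `limit` parameters and returns JSON with `response.numFound` and `response.docs` when `format=application/solr+json` is set, and that `requests` is installed. The names `iter_ids` and `esgf_fetch_page` are hypothetical helpers, not part of esgf-pyclient.

```python
from typing import Callable, Dict, Iterator


def iter_ids(fetch_page: Callable[[int, int], Dict], limit: int = 10000) -> Iterator[str]:
    """Yield the 'id' of every document by paging with offset/limit.

    `fetch_page(offset, limit)` is assumed to return a dict shaped like
    {"response": {"numFound": int, "docs": [{"id": ...}, ...]}}.
    """
    offset = 0
    while True:
        body = fetch_page(offset, limit)["response"]
        docs = body["docs"]
        if not docs:
            # The server stopped returning documents, possibly before numFound.
            break
        for doc in docs:
            yield doc["id"]
        offset += len(docs)
        if offset >= body["numFound"]:
            break


def esgf_fetch_page(offset: int, limit: int) -> Dict:
    """Hypothetical helper: one raw request against the search endpoint."""
    import requests  # third-party; imported here so the paging logic runs offline

    params = {
        "mip_era": "CMIP6",
        "retracted": "true",
        "replica": "false",
        "type": "Dataset",
        "fields": "id",
        "distrib": "true",
        "format": "application/solr+json",  # assumed JSON output format
        "offset": offset,                   # assumed paging parameters
        "limit": limit,
    }
    url = "https://esgf-node.llnl.gov/esg-search/search"
    resp = requests.get(url, params=params, timeout=120)
    resp.raise_for_status()
    return resp.json()
```

Calling `ids = list(iter_ids(esgf_fetch_page))` and comparing `len(ids)` and `len(set(ids))` against `ctx.hit_count` should show whether the shortfall comes from the server (paging stops early) or from the client.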
Any help on this would be greatly appreciated.