
Sources that support returnIdsOnly but not returnCountOnly

Open: albarrentine opened this issue 8 years ago • 1 comment

For the Wuhan, China data source (seem to be getting all the legacy servers lately), pyesridump was only pulling 1001 records when there were 3429 in the full data set.

From this line: https://github.com/openaddresses/pyesridump/blob/5823178539b6b1571c74663eaf83890e0c156eab/esridump/dumper.py#L244, it looks like if the source doesn't support returnCountOnly, the bounding box is recursively subdivided into four quadrants (Quadtree-style) with a stopping condition when there are < maxRecords in a given quadrant.
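For readers who don't want to open the link, here is a minimal sketch of the quadrant subdivision as I understand it. This is not the actual dumper.py code: `make_fake_server`, `scrape_envelope`, and the `overshoot` parameter are hypothetical names made up for illustration, and the fake server just filters an in-memory list of points instead of talking to an Esri endpoint.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Envelope = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Query = Callable[[Envelope], List[Point]]


def make_fake_server(points: List[Point], max_records: int, overshoot: int = 0) -> Query:
    """Hypothetical stand-in for a layer query that truncates large results.

    `overshoot` lets the fake server hand back slightly more than max_records.
    """
    def query(env: Envelope) -> List[Point]:
        xmin, ymin, xmax, ymax = env
        # Half-open intervals, so a point on a shared edge lands in one quadrant only.
        hits = [(x, y) for (x, y) in points if xmin <= x < xmax and ymin <= y < ymax]
        return hits[: max_records + overshoot]
    return query


def scrape_envelope(query: Query, env: Envelope, max_records: int) -> List[Point]:
    """Recursively split the envelope until a quadrant returns < max_records."""
    features = query(env)
    if len(features) < max_records:
        # Base case: the response was not truncated, so this quadrant is complete.
        return features

    # Otherwise the response may be truncated: split into four quadrants.
    # Assumes the points are distinct; real code would also want a depth limit.
    xmin, ymin, xmax, ymax = env
    xmid = (xmin + xmax) / 2.0
    ymid = (ymin + ymax) / 2.0
    quadrants = [
        (xmin, ymin, xmid, ymid),
        (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax),
        (xmid, ymid, xmax, ymax),
    ]
    results: List[Point] = []
    for quad in quadrants:
        results.extend(scrape_envelope(query, quad, max_records))
    return results
```

Calling `scrape_envelope(make_fake_server(points, 1000), envelope, 1000)` walks the tree and concatenates the leaf results. The half-open bounds in the fake server are only there so the sketch doesn't return duplicates along quadrant edges.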

  1. This should generally retrieve everything, except for the following test in _scrape_an_envelope:
    if len(features) == max_records:
    
    It appears the Wuhan source returns 1001 records when max_records is 1000. Since 1001 != 1000, the equality test fails and the code takes the same path as if 999 records had come back, i.e. it assumes the base case has been met and returns early. This could be fixed by changing the conditional to:
    if len(features) >= max_records:
    
  2. With the new OID enumeration from #33, it might make sense to use the quadrant-based method as a fallback only if the source supports neither returnCountOnly nor returnIdsOnly. Otherwise OID enumeration should be fewer queries. Does that make sense or are there some other edge cases to consider?
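To make the proposal in (2) concrete, here is a rough sketch of the selection order I have in mind. The function names and strategy labels (`choose_strategy`, `chunk_oids`, `"count_based"`, `"oid_enumeration"`, `"quadtree"`) are hypothetical and not taken from pyesridump; `"count_based"` just stands for whatever the dumper already does when returnCountOnly works.

```python
from typing import Iterable, List


def choose_strategy(supports_count_only: bool, supports_ids_only: bool) -> str:
    """Pick a download strategy; quadtree is the last resort."""
    if supports_count_only:
        return "count_based"      # existing behaviour, unchanged
    if supports_ids_only:
        return "oid_enumeration"  # fetch all OIDs once, then request in batches
    return "quadtree"             # neither capability: subdivide the envelope


def chunk_oids(oids: Iterable[int], size: int) -> List[List[int]]:
    """Split the OID list into batches of at most `size` for follow-up queries."""
    ordered = sorted(oids)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

For a source shaped like Wuhan (3429 records, batches of 1000) this would be one returnIdsOnly request plus `len(chunk_oids(range(1, 3430), 1000))` follow-up requests, assuming the server accepts batches of that size.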

albarrentine, Feb 23 '17 20:02

@thatdatabaseguy can you make a PR for the change you mention in (1) above? That seems like a useful thing.

iandees, Feb 28 '17 02:02