arcgis-python-api
Feature layer query introduces duplicates when querying above 2000 records
Since the query refactor in 2.3.0, the query method no longer requests all features correctly when the number of features exceeds the maxRecordCount limit of the feature service.
A query that should return 18k features returns 20k features, of which ~3k are duplicates. The same query on 2.2.x and earlier has no issues.
The issue was found when querying a feature service with maxRecordCount set to 5000, using a query that returns more than 15k features.
I found some discrepancies between the old and new code.
- Default record count is used instead of the service property
When the query exceeds the transfer limit, the code sets a `resultRecordCount` of 2000 and a `resultOffset` of the same amount.

    if "resultRecordCount" not in params:
        # assign initial value after first query
        params["resultRecordCount"] = 2000
    if "resultOffset" in params:
        # add the number we found to the offset so we don't have doubles
        params["resultOffset"] = params["resultOffset"] + len(
            result["features"]
        )
    else:
        # initial offset after first query (result record count set by user or up above)
        params["resultOffset"] = params["resultRecordCount"]

The value 2000 may be a common default, but when the feature service returns more records per request by default, the second query is wrong: it requests 2000 records at an offset of 2000, even though the first query already returned 5000 features.
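The arithmetic behind this first problem can be replayed offline. The following is a minimal sketch, not the library's code: `fake_service` is a hypothetical stand-in for the feature service endpoint, integers stand in for features, and the totals are illustrative rather than taken from the affected service. It only reproduces the offset logic quoted above.

```python
def fake_service(total, max_record_count, params):
    """Hypothetical endpoint: returns ordered ids, capped at max_record_count."""
    offset = params.get("resultOffset", 0)
    count = min(params.get("resultRecordCount", max_record_count), max_record_count)
    ids = list(range(offset, min(offset + count, total)))
    return {
        "features": ids,
        "exceededTransferLimit": offset + len(ids) < total,
    }


def paginate_like_2_3_0(total, max_record_count):
    """Replays the hard-coded 2000 offset logic against the fake service."""
    params = {}
    result = fake_service(total, max_record_count, params)
    features = result["features"]
    while result.get("exceededTransferLimit"):
        if "resultRecordCount" not in params:
            params["resultRecordCount"] = 2000
        if "resultOffset" in params:
            params["resultOffset"] += len(result["features"])
        else:
            # first page held 5000 records, but the offset starts at 2000
            params["resultOffset"] = params["resultRecordCount"]
        result = fake_service(total, max_record_count, params)
        features = features + result["features"]
    return features


features = paginate_like_2_3_0(total=18000, max_record_count=5000)
duplicates = len(features) - len(set(features))
print(len(features), duplicates)  # 21000 3000
```

In this simulation ids 2000 to 4999 are fetched twice, which is the same order of magnitude as the ~3k duplicates reported above. With `max_record_count=2000` the same loop returns exactly 18000 features, which would explain why services with the default limit are not affected.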
- First query may or may not be ordered
The second problem is that the features returned by the first query (at the top of the query method) are not ordered, whereas the follow-up queries that use `resultRecordCount` and `resultOffset` are ordered. As a result, the follow-up pages may contain features that were already returned by the very first query. Before the refactor this wasn't an issue, because the code checked whether pagination was needed before performing the first query.

    def _query(layer, url, params, raw=False):
        """returns results of query"""
        result = {}
        try:
            # Layer query call
            result = layer._con.post(url, params, token=layer._token)  # This one is not ordered?
            # Figure out what to return
            if "error" in result:
                raise ValueError(result)
            elif "returnCountOnly" in params and _is_true(params["returnCountOnly"]):
                # returns an int
                return result["count"]
            elif "returnIdsOnly" in params and _is_true(params["returnIdsOnly"]):
                # returns a dict with keys: 'objectIdFieldName' and 'objectIds'
                return result
            elif "returnExtentOnly" in params and _is_true(params["returnExtentOnly"]):
                # returns extent dictionary with key: 'extent'
                return result
            elif _is_true(raw):
                return result
            elif "resultRecordCount" in params and params["resultRecordCount"] == len(
                result["features"]
            ):
                return arcgis_features.FeatureSet.from_dict(result)
            else:
                # we have features to return
                features = result["features"]
            # If none of the ifs above worked then keep going to find more features
            # Make sure we have all features
            if "exceededTransferLimit" in result:
                while (
                    "exceededTransferLimit" in result
                    and result["exceededTransferLimit"] == True
                ):
                    if "resultRecordCount" not in params:
                        # assign initial value after first query
                        params["resultRecordCount"] = 2000
                    if "resultOffset" in params:
                        # add the number we found to the offset so we don't have doubles
                        params["resultOffset"] = params["resultOffset"] + len(
                            result["features"]
                        )
                    else:
                        # initial offset after first query (result record count set by user or up above)
                        params["resultOffset"] = params["resultRecordCount"]
                    result = layer._con.post(path=url, postdata=params, token=layer._token)  # These queries are ordered?
                    # add new features to the list
                    features = features + result["features"]
                # assign complete list
                result["features"] = features

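The second problem, an unordered first page followed by ordered paged requests, can also be illustrated offline. This sketch assumes what the comments above only suspect: that a request without paging parameters comes back in arbitrary storage order, while requests with `resultOffset` are sorted. `fake_service` and the reversed storage order are hypothetical, chosen to make the effect visible even when the offset itself is correct.

```python
def fake_service(storage_order, max_record_count, params):
    """Hypothetical endpoint: paged requests are sorted by id, unpaged ones are not."""
    if "resultOffset" in params:
        rows = sorted(storage_order)  # assumption: paging implies a stable order
        offset = params["resultOffset"]
        count = min(params["resultRecordCount"], max_record_count)
    else:
        rows = storage_order  # assumption: arbitrary storage order
        offset, count = 0, max_record_count
    page = rows[offset:offset + count]
    return {
        "features": page,
        "exceededTransferLimit": offset + len(page) < len(rows),
    }


storage_order = list(range(17999, -1, -1))  # 18,000 ids, stored in reverse order

# First request: no paging parameters, so the page follows storage order.
result = fake_service(storage_order, 5000, {})
features = list(result["features"])

# Follow-up requests: correct page size and offset, but a different ordering.
params = {"resultRecordCount": 5000, "resultOffset": 5000}
while result["exceededTransferLimit"]:
    result = fake_service(storage_order, 5000, params)
    features += result["features"]
    params["resultOffset"] += len(result["features"])

duplicates = len(features) - len(set(features))
missing = len(storage_order) - len(set(features))
print(len(features), duplicates, missing)  # 18000 5000 5000
```

Here the total count looks right, yet 5000 features are returned twice and 5000 are never returned, so fixing the offset alone would not be enough under this assumption.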
I work around these issues as follows:
- Forcing an ordering on every query with `order_by_fields="OBJECTID ASC"`, so that the first query is ordered as well. Changing the code to check whether pagination is needed before performing the feature queries would also fix this (as it did before 2.3.0).
- To request the correct number of features per page, replacing `params["resultRecordCount"] = 2000` in the query method with `params["resultRecordCount"] = len(features)`, so that the number of features returned by the first query is used as the page size, since that is the maximum record count the first query ran into. This could just as well be a value read from the service properties, as before.
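For anyone who cannot patch the library, both ideas can be combined in client code. The sketch below is an alternative to the patch described above, not the author's change: `query_all_ordered` is a hypothetical helper that pages manually with a fixed order and with the page size read from `layer.properties.maxRecordCount`. The keyword arguments are meant to mirror `FeatureLayer.query`; whether a real layer on 2.3.0 behaves exactly like this is an assumption. `FakeLayer` is only a stand-in so the example runs without a portal connection.

```python
from types import SimpleNamespace


def query_all_ordered(layer, where="1=1", oid_field="OBJECTID"):
    """Page through a layer with a fixed order and the service's own page size."""
    page_size = layer.properties.maxRecordCount
    offset = 0
    features = []
    while True:
        page = layer.query(
            where=where,
            order_by_fields=f"{oid_field} ASC",  # same order on every page
            result_offset=offset,
            result_record_count=page_size,
            return_all_records=False,  # page manually instead of relying on query()
        )
        features.extend(page.features)
        if len(page.features) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += len(page.features)
    return features


class FakeLayer:
    """Hypothetical stand-in for a feature layer, holding integer 'features'."""

    def __init__(self, total, max_record_count):
        self._ids = list(range(total))
        self.properties = SimpleNamespace(maxRecordCount=max_record_count)

    def query(self, where, order_by_fields, result_offset,
              result_record_count, return_all_records):
        page = self._ids[result_offset:result_offset + result_record_count]
        return SimpleNamespace(features=page)


features = query_all_ordered(FakeLayer(total=18000, max_record_count=5000))
print(len(features), len(set(features)))  # 18000 18000
```

Against a real service, the layer would be a `FeatureLayer` and `oid_field` should match the layer's object ID field, which is not always named `OBJECTID`.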