Mismatch between @iot.count and @iot.nextLink when using $expand

Open justb4 opened this issue 6 years ago • 7 comments

Found on v0.5 Docker version. See also https://github.com/Geonovum/smartemission/issues/79#issuecomment-364366066 and further comments there.

We have 182 Things in our GOST STA server. When requesting these directly, or when expanding only Locations, we get 100 Things with the first request and 82 using the nextLink from the first response, e.g.

  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Locations and then
  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Locations&$top=100&$skip=100

But when we also (or only) expand Datastreams, only 13 Things are returned (each Thing has about 8 Datastreams, so 13*8 amounts to about 100). The nextLink again returns 13 Things, but without a further nextLink, so it looks as if we have fetched all Things.

  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Locations,Datastreams and then
  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Locations,Datastreams&$top=100&$skip=100 (only 13 Things, no nextLink)

Explicitly increasing $skip in steps of 100 will eventually return all Things, with some overlap:

  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Datastreams,Locations&$top=100
  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Datastreams,Locations&$top=100&$skip=100
  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Datastreams,Locations&$top=100&$skip=200
  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Datastreams,Locations&$top=100&$skip=300 ...
  • http://data.smartemission.nl/gost/v1.0/Things?$expand=Datastreams,Locations&$top=100&$skip=1700 (empty)
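
The numbers above (13 Things per page, overlap between pages) would be consistent with $top/$skip being applied to the joined Thing/Datastream rows instead of to the Things themselves. That is only a guess at the cause, not something confirmed from the GOST code. A minimal Python sketch under that assumption, with 182 Things of exactly 8 Datastreams each and hypothetical helper names:

```python
def joined_rows(n_things, streams_per_thing):
    """One (thing, datastream) row per pair, as a join would produce."""
    return [(t, d) for t in range(n_things) for d in range(streams_per_thing)]


def page_limit_on_rows(rows, top, skip):
    """Assumed buggy variant: slice the joined rows, then collapse to Things."""
    window = rows[skip:skip + top]
    return sorted({thing for thing, _ in window})


def page_limit_on_things(rows, top, skip):
    """Expected variant: slice the distinct Things, independent of the expand."""
    things = sorted({thing for thing, _ in rows})
    return things[skip:skip + top]


rows = joined_rows(182, 8)

buggy_first = page_limit_on_rows(rows, top=100, skip=0)
buggy_second = page_limit_on_rows(rows, top=100, skip=100)
overlap = set(buggy_first) & set(buggy_second)

good_first = page_limit_on_things(rows, top=100, skip=0)
good_second = page_limit_on_things(rows, top=100, skip=100)

print(len(buggy_first), len(buggy_second), sorted(overlap))
print(len(good_first), len(good_second))
```

In this toy model the row-based variant yields 13 Things on each of the first two pages, with one Thing split across both, while the Thing-based variant yields the expected 100 and 82.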

According to the standard, "The count annotation represents the number of entities in the collection" (http://docs.opengeospatial.org/is/15-078r6/15-078r6.html#37). So maybe @iot.count should be the total number of entities, including Locations and Datastreams? But just the count of Things (182) seems to make more sense.

justb4 avatar Feb 09 '18 11:02 justb4

I can reproduce this bug on 0.5 and latest. I probably introduced it somewhere when changing the queries. Working on it.

tebben avatar Feb 09 '18 14:02 tebben

Returning too few results per page when using $expand should be fixed with commit https://github.com/gost/server/commit/9fc9764c6a6229b7603a969afe50de8bc22860b6

The same commit also fixes a sometimes incorrect result for requests with $count=true. The @iot.count field contains the total count of (main requested) entities that match the request/query, according to the standard: "The $count system query option with a value of true specifies that the total count of items within a collection matching the request SHALL be returned along with the result".

tebben avatar Feb 28 '18 11:02 tebben

Thanks! Rolled out the latest Docker image, including commit 9fc9764, on the test server. Confirmed that simple $expand queries work and return a valid @iot.count and @iot.nextLink, as in the case above:

http://test.smartemission.nl/gost/v1.0/Things?$expand=Locations,Datastreams&$count=true and nextLink: http://test.smartemission.nl/gost/v1.0/Things?$count=true&$expand=Locations,Datastreams&$top=100&$skip=100

This does indeed return all of the roughly 190 Things currently stored in the GOST DB, with their Datastreams.
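
A check like this can be scripted by following @iot.nextLink until it is absent and comparing the number of collected entities with @iot.count. The sketch below is only an illustration: `collect_all` and the in-memory `PAGES` stand-in are hypothetical, and the page keys `p1`/`p2` replace real URLs so that no server is needed.

```python
def collect_all(fetch, url):
    """Follow @iot.nextLink starting at `url`.

    `fetch` is any callable that maps a URL to a decoded JSON response.
    Returns (entities, count), where count is the @iot.count of the first page
    (None if the server did not report one).
    """
    entities, count = [], None
    while url:
        page = fetch(url)
        if count is None:
            count = page.get("@iot.count")
        entities.extend(page["value"])
        url = page.get("@iot.nextLink")  # missing on the last page
    return entities, count


# Hypothetical two-page response set standing in for a real server.
PAGES = {
    "p1": {
        "@iot.count": 182,
        "value": [{"@iot.id": i} for i in range(100)],
        "@iot.nextLink": "p2",
    },
    "p2": {
        "@iot.count": 182,
        "value": [{"@iot.id": i} for i in range(100, 182)],
    },
}

entities, count = collect_all(PAGES.__getitem__, "p1")
ids = {e["@iot.id"] for e in entities}
print(count, len(entities), len(ids))
```

Against a real server, `fetch` would be a function that performs an HTTP GET and decodes the JSON body. A mismatch between `count` and `len(ids)` would point at the paging problem described in this issue.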

More complex queries, like the last Observations from all Things as indicated in #145, still return only a few Things (about 14, as not all Things have the same number of Datastreams), without nextLink or count, and sometimes a Bad Request (but that could have other, local reasons):

http://test.smartemission.nl/gost/v1.0/Things?$count=true&$expand=Locations($select=location),Datastreams($select=id,name),Datastreams/Observations($select=id,phenomenonTime,result;$top=1)&$select=id,name,properties

Can imagine such queries are complex to implement and that the STA standard is not always clear. But I think we have progress on the Smart Emission issues for SOS to STA transition. Keep up the good work!

For cross-issue-ref: https://github.com/Geonovum/smartemission/issues/79 https://github.com/Geonovum/smartemission/issues/90

justb4 avatar Feb 28 '18 19:02 justb4

About the 2nd request: some time ago, when we worked on 9.3.3.5 with example request 4 as a test, we changed the left joins to inner joins for expands to return the correct response. I think you don't get all expected Things because some Datastreams don't hold any Observations. Is this correct?

I checked and found that we interpreted some things wrongly. Some scenarios with their expected results:

  1. Things?$expand=Datastreams/Observations: return all Things and expand the related Datastreams, with Observations if they exist.

  2. Things?$expand=Datastreams/Observations($filter=result gt 10): return all Things and expand the related Datastreams and the Observations which have a result greater than 10.

  3. Things?$expand=Datastreams/Observations&$filter=Datastreams/Observations/result gt 10: only return Things (with expanded Datastreams and Observations) where a Datastream holds Observations with a result greater than 10.
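
The difference between the three scenarios can be illustrated on a small in-memory data set. This is a sketch of the intended semantics only, not of the GOST implementation. The sample data and the helper names `expand` and `filter_things` are made up for the example.

```python
# Thing -> Datastream -> list of Observation results (made-up sample data).
THINGS = {
    "A": {"ds1": [5, 12], "ds2": []},
    "B": {"ds3": [3]},
    "C": {},  # a Thing without Datastreams
}


def expand(things, obs_filter=None):
    """Scenarios 1 and 2: keep every Thing; a filter inside the expand
    only limits which Observations are shown."""
    keep = obs_filter or (lambda result: True)
    return {
        thing: {ds: [r for r in obs if keep(r)] for ds, obs in streams.items()}
        for thing, streams in things.items()
    }


def filter_things(things, predicate):
    """Scenario 3: a top-level filter keeps only Things that have at least
    one matching Observation; the expanded content itself is not filtered."""
    return {
        thing: streams
        for thing, streams in things.items()
        if any(predicate(r) for obs in streams.values() for r in obs)
    }


def gt10(result):
    return result > 10


scenario1 = expand(THINGS)
scenario2 = expand(THINGS, gt10)
scenario3 = expand(filter_things(THINGS, gt10))

print(sorted(scenario1), sorted(scenario2), sorted(scenario3))
```

Scenarios 1 and 2 behave like a left join (Things "B" and "C" stay in the result even with nothing to expand), while scenario 3 drops them because no Observation of theirs matches the top-level filter.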

Going to work on this now.

tebben avatar Mar 01 '18 09:03 tebben

The different scenarios now give the expected result: https://github.com/gost/server/commit/11ca2c79223dddc4b00ba0463bf8d4e8243998b3

tebben avatar Mar 05 '18 12:03 tebben

Ok, tried 11ca2c7 on the SE test server. The problem we are facing is that there are nearly 6 million Observations, and queries involving Things?$expand=Datastreams/Observations, or other detailed queries related to Observations, like getting the last one with $top=1, take forever and run into the proxy timeout. I will test on a smaller dataset for this issue.

justb4 avatar Mar 06 '18 15:03 justb4

Created a new issue for performance problems: https://github.com/gost/server/issues/149 Can you test on a smaller set and close issue #146 if fixed?

tebben avatar Mar 07 '18 12:03 tebben