Geospatial Search behavior
entrepot.recherche.data.gouv.fr team
What steps does it take to reproduce the issue?
Populate a dataset with geospatial data (Geographic Bounding Box)
- When does this issue occur?
When we search for this dataset with the search API and with the geo_point and geo_radius parameters
- Which page(s) does it occur on?
The search API result page
- What happens?
For example, take a Geographic Bounding Box with westLongitude "-2.258631", eastLongitude "-2.392748", northLongitude "47.518038", southLongitude "47.496346".
Link for details: https://linestrings.com/bbox/#-2.258631,47.496346,-2.392748,47.518038
Example in the Dataverse demo: https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/UYLYFK
Now search from Paris (about 400 km away): geo_point=48.872895,2.354527
Search results do not reflect reality:
https://demo.dataverse.org/api/search?q=&geo_point=48.872895,2.354527&geo_radius=400 => result found (OK)
https://demo.dataverse.org/api/search?q=&geo_point=48.872895,2.354527&geo_radius=150 => result found (KO)
https://demo.dataverse.org/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=50 => result not found (OK)
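The three queries above differ only in geo_radius, so they can be generated rather than typed by hand. The snippet below is a minimal sketch of that, assuming geo_point is given as "latitude,longitude" and geo_radius is in kilometers; the helper name search_url is hypothetical and only builds the URL, it does not call the server.

```python
from urllib.parse import urlencode

BASE_URL = "https://demo.dataverse.org/api/search"
PARIS = "48.872895,2.354527"  # assumed order: latitude,longitude


def search_url(radius_km, query="*"):
    """Build a Search API URL restricted to radius_km around Paris (hypothetical helper)."""
    params = {"q": query, "geo_point": PARIS, "geo_radius": radius_km}
    # safe="*," keeps the wildcard and the comma readable instead of percent-encoding them
    return f"{BASE_URL}?{urlencode(params, safe='*,')}"


urls = [search_url(radius) for radius in (400, 150, 50)]
for url in urls:
    print(url)
```

Fetching each URL and comparing the returned hits against the expected distance is then enough to reproduce the report.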
- To whom does it occur (all users, curators, superusers)?
all users
- What did you expect to happen?
A more precise search result, depending on the geo_point and the geo_radius given as query parameters.
Which version of Dataverse are you using?
5.13, 5.14
Any related open or closed issues to this bug report?
Google Group topic : https://groups.google.com/g/dataverse-community/c/0NynPGQAnE0
2024/03/14
- Sized at 10 for investigation, please resize based upon results
- Also, note that the API has changed
Hi @stevenferey, I am trying to test the example that you gave us but it seems the Geographic bounds on your post are not correct.
- What version of Dataverse are you using?
- Could you please let me know whether you were able to enter this data in the system without it being caught by the validation?
Best, Juan
Hi @stevenferey, I talked with the team a bit, and from what I see entrepot.recherche.data.gouv.fr is using Dataverse 5.14. I have a couple of points.
In Dataverse 6.1 we added validation to this field, as you can see in the picture that I posted. This validation was not in 5.14, so it is possible that the data you posted (which is invalid) made it into the database. I would suggest fixing this data and then re-indexing to check whether this solves the search problem.
We also renamed some of these fields recently: northLongitude and southLongitude no longer exist. They were renamed to northLatitude and southLatitude, as they should be.
Best, Juan
My guess would be that in earlier versions, this box was indexed as the strip around the whole Earth, excluding the small east-west region intended (as in the image). That could explain why there was a hit at 150 km: the indexed box extended directly south of Paris (and wasn't ~400 km west). Flipping the east/west coordinates should give the expected results. If not, this is still an issue. @stevenferey - can you check and close/update this issue as appropriate?
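The guess above can be sanity-checked with a little arithmetic. The sketch below estimates the distance from Paris to the box under both readings: as entered (west greater than east, treated as wrapping around the globe) and with east/west flipped. It is only an illustration under stated assumptions: the function names are hypothetical, the nearest point is approximated by clamping to the box, and this is not how the search index itself computes distances.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def lon_in_range(lon, west, east):
    """True if lon lies in the range running eastward from west to east."""
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east  # west > east: the range wraps around the globe


def approx_distance_to_box_km(lat, lon, west, east, south, north):
    """Approximate distance to the box by clamping the point onto it."""
    near_lat = min(max(lat, south), north)
    if lon_in_range(lon, west, east):
        near_lon = lon  # the box covers this meridian
    else:
        # otherwise use whichever east/west edge is closer
        near_lon = min((west, east), key=lambda edge: haversine_km(lat, lon, near_lat, edge))
    return haversine_km(lat, lon, near_lat, near_lon)


PARIS = (48.872895, 2.354527)  # latitude, longitude

# Box as entered in the report: west is east of "east", so it wraps.
as_entered_km = approx_distance_to_box_km(
    *PARIS, west=-2.258631, east=-2.392748, south=47.496346, north=47.518038)
# Same box with the east/west values flipped.
flipped_km = approx_distance_to_box_km(
    *PARIS, west=-2.392748, east=-2.258631, south=47.496346, north=47.518038)

print(f"as entered: {as_entered_km:.0f} km, flipped: {flipped_km:.0f} km")
```

Under these assumptions the wrapped box comes out at roughly 150 km from Paris (straight south), while the flipped box is roughly 370 km away, which is consistent with a hit at geo_radius=150 only for the box as entered.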
Hello, thank you for your feedback. I tested, and the results are OK with the inverted values:
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=400 => result found (OK)
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=150 => result not found (OK)
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=50 => result not found (OK)
Data validation in the form is a good thing for data quality, thank you very much. I am closing the ticket. Steven.