Geospatial Search behavior

Open stevenferey opened this issue 2 years ago • 1 comments

entrepot.recherche.data.gouv.fr team

What steps does it take to reproduce the issue?

Populate a dataset with geospatial data (Geographic Bounding Box)

  • When does this issue occur?

When we search for this dataset with the search API and with the geo_point and geo_radius parameters

  • Which page(s) does it occur on?

The search API result page

  • What happens?

For example, for a Geographic Bounding Box with westLongitude "-2.258631", eastLongitude "-2.392748", northLongitude "47.518038", southLongitude "47.496346"

Link for details: https://linestrings.com/bbox/#-2.258631,47.496346,-2.392748,47.518038

Example in Dataverse demo: https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/UYLYFK

and a search from Paris (about 400km): geo_point=48.872895,2.354527

Search results do not reflect reality:

https://demo.dataverse.org/api/search?q=&geo_point=48.872895,2.354527&geo_radius=400 => result found (OK)
https://demo.dataverse.org/api/search?q=&geo_point=48.872895,2.354527&geo_radius=150 => result found (KO)
https://demo.dataverse.org/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=50 => result not found (OK)
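For anyone wanting to repeat these three queries, here is a minimal Python sketch that builds the same search URLs. The helper name `build_search_url` is hypothetical, only the `q`, `geo_point` and `geo_radius` parameters from this report are used, and the radius is assumed to be in kilometers, as the distances quoted in this thread suggest.

```python
from urllib.parse import urlencode


def build_search_url(base_url, lat, lon, radius_km, query="*"):
    """Build a search API URL with a geospatial filter (hypothetical helper)."""
    params = {
        "q": query,
        "geo_point": f"{lat},{lon}",  # latitude first, then longitude
        "geo_radius": radius_km,      # assumed to be kilometers
    }
    # Keep '*' and ',' unescaped so the URL matches the ones in the report.
    return f"{base_url}/api/search?{urlencode(params, safe='*,')}"


for radius in (400, 150, 50):
    print(build_search_url("https://demo.dataverse.org", 48.872895, 2.354527, radius))
```

Fetching each URL and checking whether the dataset appears in the response reproduces the OK/KO comparison above.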

  • To whom does it occur (all users, curators, superusers)?

all users

  • What did you expect to happen?

A more precise search result, depending on the geo_point and the geo_radius given as query parameters.

Which version of Dataverse are you using?

5.13, 5.14

Any related open or closed issues to this bug report?

Google Group topic : https://groups.google.com/g/dataverse-community/c/0NynPGQAnE0

stevenferey avatar Aug 21 '23 09:08 stevenferey

2024/03/14

  • Sized at 10 for investigation, please resize based upon results
  • Also, note that the API has changed

cmbz avatar Mar 14 '24 19:03 cmbz

Hi @stevenferey, I am trying to test the example that you gave us but it seems the Geographic bounds on your post are not correct.

  • What version of Dataverse are you using?
  • Could you please let me know whether you were able to enter this data into the system without it being caught by the validation?
image

Best, Juan

jp-tosca avatar May 03 '24 16:05 jp-tosca

Hi @stevenferey, I was talking a bit with the team, and from what I see entrepot.recherche.data.gouv.fr is using Dataverse 5.14. I have a couple of things to add.

In Dataverse 6.1 we added validation to this field, as you can see in the picture that I posted. This validation was not present in 5.14, so it is possible that the data you posted (which is invalid) made it into the database. I would suggest fixing this data and then re-indexing to check whether this solves the search problem.

We also renamed some of these fields recently: northLongitude and southLongitude don't exist anymore. They were renamed to northLatitude and southLatitude, as they should be.
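To check existing records before re-indexing, a rough Python sketch of this kind of bounding-box check could look like the following. This is not Dataverse's actual validator; the function name `bounding_box_errors` and the exact rules (coordinate ranges, west not greater than east, south not greater than north) are assumptions made for illustration.

```python
def bounding_box_errors(west, east, north, south):
    """Return a list of problems found in a geographic bounding box.

    Illustrative rules only, not the checks Dataverse itself applies.
    """
    errors = []
    # Longitudes must lie in [-180, 180], latitudes in [-90, 90].
    for name, value, limit in (
        ("westLongitude", west, 180.0),
        ("eastLongitude", east, 180.0),
        ("northLatitude", north, 90.0),
        ("southLatitude", south, 90.0),
    ):
        if not -limit <= value <= limit:
            errors.append(f"{name} {value} is outside [-{limit}, {limit}]")
    if west > east:
        errors.append("westLongitude is greater than eastLongitude")
    if south > north:
        errors.append("southLatitude is greater than northLatitude")
    return errors


# Values as reported in this issue: west and east are swapped.
print(bounding_box_errors(west=-2.258631, east=-2.392748,
                          north=47.518038, south=47.496346))
# Same box with the east/west values flipped.
print(bounding_box_errors(west=-2.392748, east=-2.258631,
                          north=47.518038, south=47.496346))
```

With the values from the report, the only problem flagged is the west/east ordering; the flipped box passes.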

Best, Juan

jp-tosca avatar May 03 '24 17:05 jp-tosca

My guess would be that in earlier versions, this box was indexed as the strip around the whole Earth, excluding the small east-west region intended (as in the image). That could explain why there was a hit at 150 km: the box extended directly south of Paris (and wasn't ~400 km west). Flipping the east/west coords should give the expected results. If not, this is still an issue. @stevenferey - can you check and close/update this issue as appropriate?

image
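A quick back-of-the-envelope check of this guess, as a Python sketch. It assumes a spherical Earth with a mean radius of 6371 km, treats a box whose west value exceeds its east value as wrapping the long way around the globe, and approximates the nearest point of the box by clamping latitude and longitude. The helper names are hypothetical, and the sketch says nothing about how the search index actually stores or matches the shape.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_box_km(lat, lon, west, east, south, north):
    """Approximate distance from a point to a lat/lon box by clamping."""
    nearest_lat = min(max(lat, south), north)
    if west <= east:
        nearest_lon = min(max(lon, west), east)
    elif lon >= west or lon <= east:
        # west > east: the box wraps around the globe and covers this longitude.
        nearest_lon = lon
    else:
        # Point sits in the excluded gap; use the closer edge.
        nearest_lon = west if abs(lon - west) < abs(lon - east) else east
    return haversine_km(lat, lon, nearest_lat, nearest_lon)


paris = (48.872895, 2.354527)
as_reported = distance_to_box_km(*paris, west=-2.258631, east=-2.392748,
                                 south=47.496346, north=47.518038)
flipped = distance_to_box_km(*paris, west=-2.392748, east=-2.258631,
                             south=47.496346, north=47.518038)
print(f"as reported: {as_reported:.1f} km, flipped: {flipped:.1f} km")
```

Under these assumptions the box as reported comes out at about 150 km from Paris (due south), while the flipped box is roughly 370 km away, which is consistent with a hit at geo_radius=400 but not at 150 once the values are corrected.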

qqmyers avatar May 03 '24 17:05 qqmyers

Hello, thank you for your feedback. I tested, and the results are OK with the inverted values:

/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=400 => result found (OK)
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=150 => result not found (OK)
/api/search?q=*&geo_point=48.872895,2.354527&geo_radius=50 => result not found (OK)

Data validation in the form is a good thing for data quality, thank you very much. I am closing the ticket. Steven.

stevenferey avatar May 15 '24 12:05 stevenferey