AutoSPARQL

Duplicated results with different descriptions

Open · diadem opened this issue 12 years ago · 4 comments

The query "houses in Summertown" returns each of the following two properties several times:

Water Eaton Road, Summertown OX2 £399,950.00
Street: Water Eaton Road, Summertown OX2
bedrooms: 2
bathrooms: 1

Divinity Road, Cowley OX4 £399,950.00
Street: Water Eaton Road, Summertown OX2
bedrooms: 2
bathrooms: 1

This could be a problem in the extracted data itself or in the visualization.

diadem, Jul 10 '12 14:07

This is a problem in the extracted data: the same URI is used for different entries. As a result, the SPARQL query returns several solutions that appear as duplicates in the UI but have, for instance, different descriptions or images.
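To see why one reused URI shows up as several near-identical results, here is a minimal Python sketch that mimics how a SPARQL basic graph pattern binds variables. It does not use a real triple store; the URI `ex:house1`, the properties `ex:description` and `ex:image`, and the image file names are hypothetical. Because the single subject carries two descriptions and two images, every combination becomes its own solution row.

```python
from itertools import product

# Hypothetical triples: one URI (ex:house1) reused for two different listings,
# so it ends up with two descriptions and two images.
TRIPLES = [
    ("ex:house1", "rdf:type", "ex:House"),
    ("ex:house1", "ex:description", "Water Eaton Road, Summertown OX2"),
    ("ex:house1", "ex:description", "Divinity Road, Cowley OX4"),
    ("ex:house1", "ex:image", "img_a.jpg"),
    ("ex:house1", "ex:image", "img_b.jpg"),
]

def solutions(triples, subject_type):
    """Mimic the pattern  ?s a <type> ; ex:description ?desc ; ex:image ?img .
    Each combination of matching values yields a separate solution,
    as a basic graph pattern join does."""
    results = []
    subjects = [s for s, p, o in triples if p == "rdf:type" and o == subject_type]
    for s in subjects:
        descs = [o for s2, p, o in triples if s2 == s and p == "ex:description"]
        imgs = [o for s2, p, o in triples if s2 == s and p == "ex:image"]
        for d, i in product(descs, imgs):
            results.append((s, d, i))
    return results

rows = solutions(TRIPLES, "ex:House")
for row in rows:
    print(row)
# 4 rows, all with the same ?s, which the UI renders as "duplicates"
```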

LorenzBuehmann, Jul 10 '12 17:07

OK, great. So if we fix the issue with duplicate URIs, this should go away?

I am not 100% convinced, though. For example, right now queries such as "houses in headington" report "using fallback" and then return

Horton Hill, Horton Cum Studley, OX33 The proposed development comprises the construction of a 3-storey extension to the rear of the hotel to accommodate an additional 20 bedrooms and ancillary accommodation, 4 detached houses and garages and a shop to the front of the hotel. Planning Statement: Although the houses/hotel extension can now be built in phases, a condition attached to the Planning Permission for the houses requires that the hotel extension shall be built concurrently with the houses and that the houses may not be occupied until the hotel extension is complete and rea... £1,600,000.00

x 7

then

Land For Sale Portland Road, Milcombe, Banbury, OX15 Situated in Portland Road, Milcombe is this residential Building Land with permission for 5 houses situated in quiet village location adjoining open...

x 6

And so on. That seems like more than the possible URI overlap could explain.

timfu, Jul 11 '12 11:07

Yes, there are also duplicates in the Lucene index, which is used as the fallback. I have to check why this happens.

LorenzBuehmann, Jul 11 '12 20:07

OK, the duplicates in the fallback Lucene index are caused by the duplicates in the extracted data. I now avoid them by indexing only one document per distinct URI, but this does lower recall.
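The trade-off described here can be illustrated with a small Python sketch. It is not the Lucene API: the in-memory list standing in for the index, the helper names `build_index` and `search`, and the sample entries are all hypothetical. Keeping only the first document per URI removes the duplicate hits, but a listing that shared its URI with an earlier one is no longer searchable, which is the recall loss mentioned above.

```python
# Hypothetical extracted entries as (uri, text) pairs. ex:house1 is reused
# for two different listings, mirroring the data problem in this thread.
ENTRIES = [
    ("ex:house1", "Water Eaton Road, Summertown OX2"),
    ("ex:house1", "Divinity Road, Cowley OX4"),
    ("ex:house2", "Portland Road, Milcombe, Banbury, OX15"),
]

def build_index(entries, one_per_uri):
    """Return the (uri, text) documents to index.
    With one_per_uri=True, only the first entry seen for each URI is kept."""
    if not one_per_uri:
        return list(entries)
    seen = set()
    docs = []
    for uri, text in entries:
        if uri not in seen:
            seen.add(uri)
            docs.append((uri, text))
    return docs

def search(docs, keyword):
    """Naive case-insensitive substring match standing in for a Lucene query."""
    return [uri for uri, text in docs if keyword.lower() in text.lower()]

full = build_index(ENTRIES, one_per_uri=False)
dedup = build_index(ENTRIES, one_per_uri=True)

print(search(full, "road"))     # ['ex:house1', 'ex:house1', 'ex:house2']
print(search(dedup, "road"))    # ['ex:house1', 'ex:house2']
print(search(full, "cowley"))   # ['ex:house1']
print(search(dedup, "cowley"))  # [] because the second listing was dropped
```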

LorenzBuehmann, Jul 12 '12 07:07