AutoSPARQL
duplicated results with different descriptions
The query "houses in Summertown" returns the following two properties several times:
Water Eaton Road, Summertown OX2 £399,950.00 Street: Water Eaton Road, Summertown OX2
bedrooms: 2
bathrooms: 1
Divinity Road, Cowley OX4 £399,950.00 Street: Water Eaton Road, Summertown OX2
bedrooms: 2
bathrooms: 1
It could be a problem in the extracted data itself, or in the visualization.
This is a problem in the extracted data: the same URI is used for different entries. A SPARQL query therefore returns several solutions for that URI, which look like duplicates in the UI but differ in, for instance, their descriptions or images.
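To illustrate the point about reused URIs, here is a minimal sketch in plain Python (no RDF library). The `ex:` URIs, the predicate names `street` and `label`, and the second listing are hypothetical and only stand in for the real extracted data. It flags every URI that carries more than one value for a property expected to be single-valued, which is the situation that yields several solutions for one entry.

```python
from collections import defaultdict

# Hypothetical extracted triples (subject, predicate, object).
# ex:house1 was reused for two different listings, so it has two labels.
triples = [
    ("ex:house1", "street", "Water Eaton Road, Summertown OX2"),
    ("ex:house1", "label", "Water Eaton Road, Summertown OX2"),
    ("ex:house1", "label", "Divinity Road, Cowley OX4"),
    ("ex:house2", "street", "Example Street"),
    ("ex:house2", "label", "Example Street"),
]


def conflicting_uris(triples, single_valued=("label",)):
    """Return {(uri, predicate): sorted values} for every URI that has
    more than one value for a predicate assumed to be single-valued."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in single_valued:
            values[(subject, predicate)].add(obj)
    return {key: sorted(vals) for key, vals in values.items() if len(vals) > 1}


conflicts = conflicting_uris(triples)
for (uri, predicate), vals in conflicts.items():
    print(uri, predicate, vals)
```

Each conflicting URI found this way would show up as one result row per distinct value, which matches the apparent duplicates in the UI.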
OK, great. Does that mean the problem goes away once we fix the issue with the shared URIs?
I am not 100% convinced. For example, right now queries such as "houses in headington" say "using fallback" and then return
Horton Hill, Horton Cum Studley, OX33 The proposed development comprises the construction of a 3-storey extension to the rear of the hotel to accommodate an additional 20 bedrooms and ancillary accommodation, 4 detached houses and garages and a shop to the front of the hotel. Planning Statement: Although the houses/hotel extension can now be built in phases, a condition attached to the Planning Permission for the houses requires that the hotel extension shall be built concurrently with the houses and that the houses may not be occupied until the hotel extension is complete and rea... £1,600,000.00
x 7
then
Land For Sale Portland Road, Milcombe, Banbury, OX15 Situated in Portland Road, Milcombe is this residential Building Land with permission for 5 houses situated in quiet village location adjoining open...
x 6
And so on. That seems like more than the URI overlap alone can account for.
Yes, there are also duplicates in the Lucene index, which is used as the fallback. I have to check why this happens.
OK, the duplicates in the fallback Lucene index occur because of the duplicates in the extracted data. I now avoid this by indexing only one document per distinct URI, although this does lower recall.
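The trade-off can be sketched as follows. This is not the actual indexing code, only a plain-Python illustration under the assumption that each document is a dict with hypothetical `uri` and `text` fields. Keeping the first document per URI removes the duplicates, but any text that only appeared in a dropped document can no longer be matched.

```python
def one_document_per_uri(documents):
    """Keep only the first document seen for each distinct URI."""
    seen = set()
    kept = []
    for doc in documents:
        if doc["uri"] not in seen:
            seen.add(doc["uri"])
            kept.append(doc)
    return kept


# Hypothetical documents; ex:house1 was reused for two listings.
docs = [
    {"uri": "ex:house1", "text": "Water Eaton Road, Summertown OX2"},
    {"uri": "ex:house1", "text": "Divinity Road, Cowley OX4"},
    {"uri": "ex:house2", "text": "Example Street"},
]

kept = one_document_per_uri(docs)

# The Cowley listing shared a URI with the Summertown one and was dropped,
# so a search for "Cowley" finds nothing in the deduplicated set.
cowley_hits = [doc for doc in kept if "Cowley" in doc["text"]]
print(len(kept), len(cowley_hits))
```

Recall is only fully restored once the extraction assigns a distinct URI to each entry, at which point the deduplication step no longer drops anything.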