gtfs-bench

Remove xsd:duration datatype from the mappings

Open Lars-H opened this issue 1 year ago • 7 comments

Describe the bug

Thanks for providing these insightful resources. I have been using them lately and have run into some minor issues. I tried to follow the description in your journal paper to materialize the KG as RDF, and I noticed a few problems.

  • Materializing the virtual KG as RDF using rdfizer leads to non-absolute IRIs in the RDF.
  • The remaining data seems to be valid RDF. However, the datatype of the values for the properties arrivalTime and departureTime is specified as xsd:duration while the values are not valid durations (under D-entailment).
  • The constructed data seems to be quite redundant. At scale 100, there are more than 5 million different ShapePoints with the exact same latitude and longitude. (Also, there are only 960 distinct values for latitude and 1000 distinct values for longitude.)

To Reproduce

  1. Generate the datasets using the provided docker tool and scale = 100
  2. Import the data in the sql directory into a MySQL DB using the provided script
  3. Materialize the data using rdfizer and the mapping file provided in the kgc-eval repo. (See rdfizer config below)
  4. Convert the N-Triples output to Turtle using rapper

Expected behavior

The materialized RDF should be valid.

Screenshots or Video

Example of a non-absolute IRI:

<http://transport.linkeddata.es/madrid/metro/feed/0000000000000000002s> <http://xmlns.com/foaf/0.1/page> <0000000000000000002s>.
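For anyone who wants to scan the materialized output for this problem, here is a minimal Python sketch that flags IRI terms without a scheme in a single N-Triples line. The helper name `relative_iris` is made up for illustration, and the regex is a rough approximation of the N-Triples grammar (it would also pick up angle-bracketed text inside literals), so treat it as a quick check rather than a validator.

```python
import re
from urllib.parse import urlsplit

# Rough pattern for <...> terms; not a full N-Triples parser.
IRI_TERM = re.compile(r"<([^<>]*)>")


def relative_iris(ntriples_line: str) -> list[str]:
    """Return the IRI terms in one line that have no scheme (i.e. are not absolute)."""
    return [iri for iri in IRI_TERM.findall(ntriples_line) if not urlsplit(iri).scheme]


line = (
    "<http://transport.linkeddata.es/madrid/metro/feed/0000000000000000002s> "
    "<http://xmlns.com/foaf/0.1/page> <0000000000000000002s>."
)
print(relative_iris(line))  # ['0000000000000000002s']
```

Running this over every line of the N-Triples file should list the object values that were emitted as IRIs without a base.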

Example of an invalid duration value:

<http://vocab.gtfs.org/terms#departureTime> "000000000000000000qe"^^<http://www.w3.org/2001/XMLSchema#duration>
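To illustrate why this literal is rejected, here is a small Python sketch that tests a string against the lexical form of xsd:duration (optional sign, `P`, then date and/or time components). The regex is my own approximation of the XML Schema lexical rules and `is_valid_duration` is a hypothetical helper name; the `08:30:00` and `PT8H30M0S` values are illustrative and do not come from the dataset.

```python
import re

# Lexical form of xsd:duration: optional sign, "P", then date and/or time parts.
# The lookaheads reject a bare "P" and a trailing "T" with no time component.
XSD_DURATION = re.compile(
    r"-?P(?=.)([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
    r"(T(?=[0-9])([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?"
)


def is_valid_duration(lexical: str) -> bool:
    """Return True if the string is a syntactically valid xsd:duration."""
    return XSD_DURATION.fullmatch(lexical) is not None


print(is_valid_duration("000000000000000000qe"))  # False (value from the issue)
print(is_valid_duration("08:30:00"))              # False (illustrative clock time)
print(is_valid_duration("PT8H30M0S"))             # True  (illustrative duration)
```

A clock-style value fails the same check, which is why typing these properties as xsd:duration produces ill-typed literals unless the values are rewritten.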

Repeated ShapePoint geo-location. The following query yields ?cnt = 5852988.

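PREFIX gtfs: <http://vocab.gtfs.org/terms#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (COUNT(DISTINCT *) AS ?cnt)
WHERE {
  ?x a gtfs:ShapePoint ;
     geo:lat "999.999999999999999"^^xsd:double ;
     geo:long "999.999999999999999"^^xsd:double .
}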
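The same redundancy can be checked without a SPARQL endpoint. Below is a rough Python sketch that counts how many subjects share each (lat, long) pair directly from N-Triples lines. It assumes `geo:` is the W3C WGS84 namespace, does not filter on `gtfs:ShapePoint`, and uses a simplified regex rather than a real parser; the sample triples and the function name `coordinate_counts` are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: geo: refers to the W3C WGS84 vocabulary.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
# Simplified pattern: <subject> <predicate> "literal"...
TRIPLE = re.compile(r'<([^>]*)>\s+<([^>]*)>\s+"([^"]*)"')


def coordinate_counts(lines):
    """Count subjects per (lat, long) pair found in N-Triples lines."""
    coords = defaultdict(dict)
    for line in lines:
        m = TRIPLE.match(line)
        if m and m.group(2) in (GEO + "lat", GEO + "long"):
            subject, predicate, value = m.groups()
            coords[subject][predicate] = value
    return Counter(
        (c[GEO + "lat"], c[GEO + "long"])
        for c in coords.values()
        if GEO + "lat" in c and GEO + "long" in c
    )


# Hypothetical sample data: two points share the same coordinates.
sample = [
    f'<http://example.org/p1> <{GEO}lat> "1.0" .',
    f'<http://example.org/p1> <{GEO}long> "2.0" .',
    f'<http://example.org/p2> <{GEO}lat> "1.0" .',
    f'<http://example.org/p2> <{GEO}long> "2.0" .',
    f'<http://example.org/p3> <{GEO}lat> "3.0" .',
    f'<http://example.org/p3> <{GEO}long> "4.0" .',
]
print(coordinate_counts(sample).most_common(1))  # [(('1.0', '2.0'), 2)]
```

On the full file this keeps every subject in memory, so at scale 100 a streaming or database-side count is probably the better choice.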

Resources (please complete the following information):

  • OS: MacOS Sonoma
  • Docker: Yes
  • Mapping: https://github.com/oeg-upm/kgc-eval/blob/master/mappings/sdm-rdfizer/gtfs-rdb-rml-noselfjoin.ttl
  • Data Format(s): Ntriples
  • Data Size: 100

Additional material/context

rdfizer config:

[default]
main_directory: /data/gtfs/datasets

[datasets]
number_of_datasets: 1
output_folder: ${default:main_directory}/graph
all_in_one_file: yes
remove_duplicate: yes
enrichment: yes
name: gtfs-rdf-100
ordered: yes
dbType: mysql

[dataset1]
name: MySQLDataset
mapping: ${default:main_directory}/sql/gtfs-rdb-rml-noselfjoin.ttl
host: localhost
port: 3306
db: gtfssql
user: root
password: XXX
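In case it helps with reproducing the paths: the `${default:main_directory}` references follow the `${section:option}` syntax of Python's `configparser` with `ExtendedInterpolation`. The sketch below assumes the file is read that way (I have not checked how rdfizer parses it internally) and shows how an abridged copy of the config above resolves.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Abridged copy of the config above; only the keys needed for the illustration.
CONFIG_TEXT = """
[default]
main_directory: /data/gtfs/datasets

[datasets]
number_of_datasets: 1
output_folder: ${default:main_directory}/graph
all_in_one_file: yes

[dataset1]
mapping: ${default:main_directory}/sql/gtfs-rdb-rml-noselfjoin.ttl
"""

parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(CONFIG_TEXT)

output_folder = parser["datasets"]["output_folder"]
mapping = parser["dataset1"]["mapping"]
all_in_one = parser["datasets"].getboolean("all_in_one_file")

print(output_folder)  # /data/gtfs/datasets/graph
print(mapping)        # /data/gtfs/datasets/sql/gtfs-rdb-rml-noselfjoin.ttl
print(all_in_one)     # True
```

So with this config the mapping file is expected under `/data/gtfs/datasets/sql/` and the output lands in `/data/gtfs/datasets/graph`.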

Thanks for your support.

Lars-H, Oct 11 '23 06:10