gtfs-bench
Remove xsd:duration datatype from the mappings
Describe the bug
Thanks for providing these insightful resources. I have been using them lately and have encountered some minor issues. I tried to follow the description in your journal paper to materialize the KG as RDF, and I have seen a couple of problems.
- Materializing the virtual KG as RDF using rdfizer leads to non-absolute IRIs in the RDF.
- The remaining data seems to be valid RDF. However, the datatype of the values for the properties arrivalTime and departureTime is specified as xsd:duration, while the values are not valid durations (under D-entailment).
- The constructed data seems to be quite redundant. At scale 100, there are more than 5 million different ShapePoints with the exact same latitude and longitude. (Also, there are only 960 distinct values for latitude and 1000 distinct values for longitude.)
To Reproduce
- Generate the datasets using the provided docker tool and scale = 100
- Import the data in the sql directory into a MySQL DB using the provided script
- Materialize the data using rdfizer and the mapping file provided in the kgc-eval repo. (See rdfizer config below)
- Convert ntriples to turtle using rapper
Expected behavior
The materialized RDF should be valid.
Screenshots or Video
Example of a non-absolute IRI:
<http://transport.linkeddata.es/madrid/metro/feed/0000000000000000002s> <http://xmlns.com/foaf/0.1/page> <0000000000000000002s>.
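A quick way to spot such objects in the output is to test whether each IRI carries a scheme. The following is a minimal sketch, not a full RFC 3987 validator: it only checks for a scheme component, and the helper name `is_absolute_iri` is hypothetical.

```python
from urllib.parse import urlsplit


def is_absolute_iri(iri: str) -> bool:
    """Approximate check: an absolute IRI must start with a scheme such as 'http:'."""
    return bool(urlsplit(iri).scheme)


# Subject and object taken from the triple above.
subject = "http://transport.linkeddata.es/madrid/metro/feed/0000000000000000002s"
page_object = "0000000000000000002s"

print(is_absolute_iri(subject))      # True
print(is_absolute_iri(page_object))  # False
```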
Example of an invalid duration value:
<http://vocab.gtfs.org/terms#departureTime> "000000000000000000qe"^^<http://www.w3.org/2001/XMLSchema#duration>
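To illustrate why this literal is ill-typed, here is a minimal sketch that checks a string against the lexical form of xsd:duration (an optional sign, `P`, then date and time components). The regular expression is my own approximation of the XML Schema pattern, and `PT8H30M` is an illustrative valid value that does not come from the dataset.

```python
import re

# Approximation of the xsd:duration lexical space:
# at least one component is required, and 'T' must be followed by a time component.
DURATION_RE = re.compile(
    r"-?P(?=\d|T\d)(\d+Y)?(\d+M)?(\d+D)?"
    r"(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?"
)


def is_valid_duration(lexical: str) -> bool:
    """Return True if the string matches the xsd:duration lexical form."""
    return DURATION_RE.fullmatch(lexical) is not None


print(is_valid_duration("000000000000000000qe"))  # False (value from the triple above)
print(is_valid_duration("PT8H30M"))               # True (illustrative value)
```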
Repeated ShapePoint geo-location. The following query yields ?cnt = 5852988.
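# Prefix declarations added for completeness; the geo: namespace is assumed to be the W3C WGS84 vocabulary.
PREFIX gtfs: <http://vocab.gtfs.org/terms#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (COUNT(DISTINCT *) AS ?cnt)
WHERE {
  ?x a gtfs:ShapePoint ;
     geo:lat "999.999999999999999"^^xsd:double ;
     geo:long "999.999999999999999"^^xsd:double .
}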
Resources (please complete the following information):
- OS: MacOS Sonoma
- Docker: Yes
- Mapping: https://github.com/oeg-upm/kgc-eval/blob/master/mappings/sdm-rdfizer/gtfs-rdb-rml-noselfjoin.ttl
- Data Format(s): Ntriples
- Data Size: 100
Additional material/context
rdfizer config:
[default]
main_directory: /data/gtfs/datasets
[datasets]
number_of_datasets: 1
output_folder: ${default:main_directory}/graph
all_in_one_file: yes
remove_duplicate: yes
enrichment: yes
name: gtfs-rdf-100
ordered: yes
dbType: mysql
[dataset1]
name: MySQLDataset
mapping: ${default:main_directory}/sql/gtfs-rdb-rml-noselfjoin.ttl
host: localhost
port: 3306
db: gtfssql
user: root
password: XXX
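For readers checking the paths, the `${default:main_directory}` references in this config can be resolved with Python's `configparser`. This is a minimal sketch that assumes rdfizer reads the file with extended interpolation; only a subset of the options above is reproduced.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Subset of the config above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[default]
main_directory: /data/gtfs/datasets

[datasets]
output_folder: ${default:main_directory}/graph

[dataset1]
mapping: ${default:main_directory}/sql/gtfs-rdb-rml-noselfjoin.ttl
"""

# ExtendedInterpolation resolves ${section:option} references across sections.
config = ConfigParser(interpolation=ExtendedInterpolation())
config.read_string(CONFIG_TEXT)

print(config["datasets"]["output_folder"])  # /data/gtfs/datasets/graph
print(config["dataset1"]["mapping"])        # /data/gtfs/datasets/sql/gtfs-rdb-rml-noselfjoin.ttl
```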
Thanks for your support.