hdt-java
TTL files as input to rdf2hdt produce invalid blank node IDs
I took an input TTL file from the W3C SPARQL 1.0 Test Suite (i18n), ran it through rdf2hdt, and dumped the contents using hdtSearch:
./bin/rdf2hdt.sh sample.ttl sample.hdt
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 1 modules...
[INFO] Installing Nexus Staging features:
[INFO] ... total of 1 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO]
[INFO] ----------------------< org.rdfhdt:hdt-java-cli >-----------------------
[INFO] Building HDT Java Command line Tools 3.0.10
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ hdt-java-cli ---
[WARN] base uri not specified, using 'file:///path/to/sample.ttl'
[INFO] Converting path/to/sample.ttl to path/to/sample.hdt as TURTLE
File converted in ..... 524 ms 808 us
Total Triples ......... 9
Different subjects .... 4
Different predicates .. 5
Different objects ..... 9
Common Subject/Object . 0
HDT saved to file in .. 7 ms 942 us
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.314 s
[INFO] Finished at: 2024-05-06T16:36:52-04:00
[INFO] ------------------------------------------------------------------------
./bin/hdtSearch.sh sample.hdt
[INFO] Scanning for projects...
[INFO] Inspecting build with total of 1 modules...
[INFO] Installing Nexus Staging features:
[INFO] ... total of 1 executions of maven-deploy-plugin replaced with nexus-staging-maven-plugin
[INFO]
[INFO] ----------------------< org.rdfhdt:hdt-java-cli >-----------------------
[INFO] Building HDT Java Command line Tools 3.0.10
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ hdt-java-cli ---
>> ? ? ?
Query: |?| |?| |?|
_:@0 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Alice's normalized resumé"
_:@0 http://xmlns.com/foaf/0.1/name "Alice"
_:@1 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Bob's non-normalized resumé"
_:@1 http://xmlns.com/foaf/0.1/name "Bob"
_:@2 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Eve's non-normalized resumé"
_:@2 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Eve's normalized resumé"
_:@2 http://xmlns.com/foaf/0.1/name "Eve"
file:///path/to/sample.ttl http://www.w3.org/2000/01/rdf-schema#comment "Normalized and non-normalized IRIs"
file:///path/to/sample.ttl http://www.w3.org/2002/07/owl#versionInfo "$Id: normalization-01.ttl,v 1.1 2005/10/25 09:38:08 aseaborne Exp $"
Iterated 9 triples in 22 ms 504 us
I cannot find @ called out explicitly in the Turtle or N-Triples specs. However, when I use @ at the start of blank node labels in examples adapted from those docs, the riot CLI reports a validation error for every blank node label that begins with @:
cat <<EOF > blanknode.ttl
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:@123 foaf:knows _:@1234 .
_:@1234 foaf:knows _:@123 .
EOF
cat <<EOF > blanknode.nt
_:@123 <http://xmlns.com/foaf/0.1/knows> _:bob .
_:bob <http://xmlns.com/foaf/0.1/knows> _:@123.
EOF
riot --validate --time blanknode.ttl
17:02:00 ERROR riot :: [line: 3, col: 3 ] Blank node label does not start with alphabetic or _ : '@'
blanknode.ttl : (No Output) : 1 errors : 0 warnings
riot --validate --time blanknode.nt
17:02:05 ERROR riot :: [line: 1, col: 3 ] Blank node label does not start with alphabetic or _ : '@'
blanknode.nt : (No Output) : 1 errors : 0 warnings
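Until the generated IDs are fixed, one possible stopgap is to post-process the hdtSearch dump before handing it to a strict parser. The sketch below is only an assumption about how that could look, not something hdt-java provides: the function names and the `hdt` replacement prefix are hypothetical, and it assumes the space-separated subject/predicate/object layout shown in the dump above.

```python
def fix_term(term: str, prefix: str = "hdt") -> str:
    """Rewrite a term such as `_:@0` to `_:hdt0`; leave anything else alone.

    `prefix` is a hypothetical choice. Pick one that cannot collide with
    blank node labels already present in the data.
    """
    if term.startswith("_:@"):
        return "_:" + prefix + term[3:]
    return term


def relabel_line(line: str) -> str:
    """Relabel the subject and object of one `s p o` line from the dump."""
    text = line.rstrip("\n")
    # Split into at most three fields so a literal object keeps its spaces.
    parts = text.split(" ", 2)
    if len(parts) != 3:
        return text
    subject, predicate, obj = parts
    # Literal objects start with a quote, so fix_term never alters them.
    return " ".join((fix_term(subject), predicate, fix_term(obj)))


if __name__ == "__main__":
    print(relabel_line('_:@0 http://xmlns.com/foaf/0.1/name "Alice"'))
```

Only the subject and object fields are inspected, so text inside a literal that happens to contain `_:@` is left untouched.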
Actually, the spec does list the valid characters:
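RDF blank nodes in Turtle are expressed as _: followed by a blank node label which is a series of name characters. The characters in the label are built upon PN_CHARS_BASE, liberalized as follows:
The characters _ and digits may appear anywhere in a blank node label.
The character . may appear anywhere except the first or last character.
The characters -, U+00B7, U+0300 to U+036F and U+203F to U+2040 are permitted anywhere except the first character.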
Where PN_CHARS_BASE is the following list:
[A-Z]
[a-z]
[#x00C0-#x00D6]
[#x00D8-#x00F6]
[#x00F8-#x02FF]
[#x0370-#x037D]
[#x037F-#x1FFF]
[#x200C-#x200D]
[#x2070-#x218F]
[#x2C00-#x2FEF]
[#x3001-#xD7FF]
[#xF900-#xFDCF]
[#xFDF0-#xFFFD]
[#x10000-#xEFFFF]
None of these ranges includes #x0040 (@), so a blank node label cannot start with @.
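To make the character rules concrete, here is a minimal sketch of a label check built from the ranges listed above. It follows my reading of the Turtle grammar productions BLANK_NODE_LABEL, PN_CHARS_U and PN_CHARS; the function name `is_valid_bnode` is hypothetical, and this is not the check that riot or hdt-java actually runs.

```python
import re

# PN_CHARS_BASE, transcribed from the ranges quoted above.
PN_CHARS_BASE = (
    "A-Za-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# PN_CHARS_U adds the underscore; PN_CHARS adds digits, '-', and a few marks.
PN_CHARS_U = PN_CHARS_BASE + "_"
PN_CHARS = PN_CHARS_U + "0-9\u00B7\u0300-\u036F\u203F-\u2040\\-"

# '_:' then (PN_CHARS_U | digit), then optionally more name characters or
# dots, where the last character must not be a dot.
BLANK_NODE_LABEL = re.compile(
    f"_:[{PN_CHARS_U}0-9](?:[{PN_CHARS}.]*[{PN_CHARS}])?"
)


def is_valid_bnode(term: str) -> bool:
    """Return True if `term` is a syntactically valid blank node label."""
    return BLANK_NODE_LABEL.fullmatch(term) is not None


if __name__ == "__main__":
    for term in ("_:bob", "_:@0", "_:@123", "_:a.b", "_:a."):
        print(term, is_valid_bnode(term))
```

Under this check the `_:@0`, `_:@1` and `_:@2` labels in the hdtSearch dump are rejected for the same reason riot reports: the first character after `_:` is not in the permitted set.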