rdftab.rs
stacktrace when dealing with entity expansions
I get an error when loading ogg.owl.
(venv) ~/repos/semantic-sql(main) $ export RUST_BACKTRACE=full
(venv) ~/repos/semantic-sql(main) $ ./bin/rdftab db/ogg.db < owl/ogg.owl
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: RdfXmlError { kind: Xml(EscapeError(UnrecognizedSymbol(1..4, Ok("obo")))) }', src/main.rs:57:5
stack backtrace:
0: 0x109357e45 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h30b85a1761190f28
1: 0x10937644e - core::fmt::write::h5b0722e6ee659e34
2: 0x109356269 - std::io::Write::write_fmt::hf468289e762fa2f9
3: 0x10935aa9a - std::panicking::default_hook::{{closure}}::h836d46ca6b872224
4: 0x10935a7bf - std::panicking::default_hook::h2afcf1998cd93f8c
5: 0x10935b0ed - std::panicking::rust_panic_with_hook::he4f5d8b43533efd5
6: 0x10935ac82 - rust_begin_unwind
7: 0x109378acf - core::panicking::panic_fmt::h3559129da805eab4
8: 0x109378b45 - core::result::unwrap_failed::h170de03e7ee26a1a
9: 0x1093423a5 - rdftab::main::h1bc34813cbf130e1
10: 0x1093316a6 - std::rt::lang_start::{{closure}}::h63a82885a43041b4
11: 0x10935ab58 - std::panicking::try::do_call::h29bd6a8b4eb65398
12: 0x10936318b - __rust_maybe_catch_panic
13: 0x10935e389 - std::rt::lang_start_internal::h1cbb853ed77189ce
14: 0x109342559 - main
My prefix table looks normal:
sqlite> select * from prefix where prefix='obo';
obo|http://purl.obolibrary.org/obo/
This is just the standard version of ogg, fetched with:
curl -L -s http://purl.obolibrary.org/obo/ogg.owl
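This seems to be caused by entity expansions. The file declares XML entities in its DOCTYPE and then references them (e.g. &obo;) in attribute values, which matches the UnrecognizedSymbol "obo" in the panic message: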
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
<!ENTITY foaf "http://xmlns.com/foaf/0.1/" >
<!ENTITY owl "http://www.w3.org/2002/07/owl#" >
<!ENTITY obo "http://purl.obolibrary.org/obo/" >
<!ENTITY dc "http://purl.org/dc/elements/1.1/" >
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
<!ENTITY iao "http://purl.obolibrary.org/obo/iao/" >
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
<!ENTITY ncbitaxon "http://purl.obolibrary.org/obo/ncbitaxon#" >
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
<!ENTITY oboInOwl "http://www.geneontology.org/formats/oboInOwl#" >
<!ENTITY protege "http://protege.stanford.edu/plugins/owl/protege#" >
]>
<rdf:RDF xmlns="&obo;ogg.owl#"
xml:base="&obo;ogg.owl"
...
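To make the failure mode concrete, here is a minimal sketch of what resolving these entities involves: collect the `<!ENTITY name "value" >` declarations from the DOCTYPE, then substitute each `&name;` reference. The function names `parse_entities` and `expand_entities` are hypothetical and this is not how rio or rdftab implement it. It assumes double-quoted values with no nested or parameter entities, and a real XML parser handles many more cases.

```rust
use std::collections::HashMap;

/// Collect `<!ENTITY name "value" >` declarations from an internal DTD subset.
/// Simplified assumption: values are double-quoted and contain no `>`.
fn parse_entities(xml: &str) -> HashMap<String, String> {
    let mut entities = HashMap::new();
    let mut rest = xml;
    while let Some(start) = rest.find("<!ENTITY") {
        rest = &rest[start + "<!ENTITY".len()..];
        let end = match rest.find('>') {
            Some(i) => i,
            None => break, // unterminated declaration: stop scanning
        };
        // `decl` looks like: ` obo "http://purl.obolibrary.org/obo/" `
        let decl = &rest[..end];
        rest = &rest[end + 1..];
        let mut parts = decl.trim().splitn(2, char::is_whitespace);
        if let (Some(name), Some(value)) = (parts.next(), parts.next()) {
            let value = value.trim().trim_matches('"');
            entities.insert(name.to_string(), value.to_string());
        }
    }
    entities
}

/// Replace every `&name;` whose name is in `entities` with its value.
/// Unknown references (including predefined ones such as `&amp;`) are left untouched.
fn expand_entities(text: &str, entities: &HashMap<String, String>) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        match after.find(';') {
            Some(semi) => {
                let name = &after[..semi];
                match entities.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push('&');
                        out.push_str(name);
                        out.push(';');
                    }
                }
                rest = &after[semi + 1..];
            }
            None => {
                // A lone `&` with no closing `;`: copy it through unchanged.
                out.push('&');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

With the `obo` declaration above, `expand_entities("&obo;ogg.owl#", &entities)` yields `http://purl.obolibrary.org/obo/ogg.owl#`, the namespace the parser needs to see in place of the raw reference.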
Jena also has issues:
$ riot --out RDFXML owl/ogg.owl > owl/ogg-riot.owl
11:02:17 ERROR riot :: [line: 1, col: 1 ] JAXP00010001: The parser has encountered more than "64000" entity expansions in this document; this is the limit imposed by the JDK.
I have dreadful memories of this kind of error from earlier experiences with RDF/XML, but haven't seen it for a while.
The solution is to launder through robot:
robot convert -i owl/ogg.owl -o owl/ogg-robot.owl && mv owl/ogg-robot.owl owl/ogg.owl
More graceful handling upstream would be welcome, but not urgent as there is a workaround.
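As a rough illustration of what more graceful handling could look like, the sketch below reports a parse failure and returns a non-zero exit code instead of panicking on `unwrap()`. Everything in it is hypothetical: `ParseError`, `parse_document`, and `run` are stand-ins invented for this example, not rdftab or rio APIs, and the "parser" only simulates the failure on an `&obo;` reference.

```rust
use std::fmt;

/// Hypothetical error type standing in for the parser error in the panic message.
#[derive(Debug)]
struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical stand-in for the parsing step: it simulates the failure by
/// rejecting any input that still contains an unexpanded `&obo;` reference.
fn parse_document(input: &str) -> Result<usize, ParseError> {
    if input.contains("&obo;") {
        return Err(ParseError("unrecognized entity reference: obo".to_string()));
    }
    Ok(input.lines().count())
}

/// Run the parse and map the outcome to a process exit code,
/// printing a readable message instead of a backtrace on failure.
fn run(input: &str) -> i32 {
    match parse_document(input) {
        Ok(lines) => {
            println!("parsed {} line(s)", lines);
            0
        }
        Err(err) => {
            eprintln!("error: could not parse RDF/XML input: {}", err);
            1
        }
    }
}
```

A `main` function could then call `std::process::exit(run(&input))`, so the user sees a one-line error rather than a stack trace.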
This must be happening in rio. Hopefully there's a setting we can tweak, or just update the dependency, because I do not want to get into the core of this.
@lmcmicu Can you check whether recent updates to rio_xml resolve this problem? This commit seems relevant: https://github.com/oxigraph/rio/commit/bb81f95d5cdf6dfcd278d92a2d51bf154a166fb5
Will do.
Yes, it seems to work if we use the updated rio:
$ make build/ogg.db
rm -f build/ogg.db
sqlite3 build/ogg.db < build/prefix.sql
rdftab build/ogg.db < ogg.owl
If you would like to try it out by patching the files on the master branch, change the [dependencies] block in Cargo.toml to this one: https://github.com/ontodev/rdftab.rs/blob/271c36f3670fe1104c1b62da82f5538b2631e0c9/Cargo.toml#L7 and also comment out the [patch.crates-io] block: https://github.com/ontodev/rdftab.rs/blob/271c36f3670fe1104c1b62da82f5538b2631e0c9/Cargo.toml#L25
Then, in src/main.rs, add an import for Iri: https://github.com/ontodev/rdftab.rs/blob/271c36f3670fe1104c1b62da82f5538b2631e0c9/src/main.rs#L8 and change the call to RdfXmlParser so that it looks like this: https://github.com/ontodev/rdftab.rs/blob/271c36f3670fe1104c1b62da82f5538b2631e0c9/src/main.rs#L925
Finally, don't forget to run cargo build --release.
The permanent solution will involve merging this branch into master: https://github.com/ontodev/rio/tree/merge-upstream-ws-changes