de.setf.wilbur
Inconsistencies in serialization formats
Wilbur provides two serializers, so one would expect that if it correctly reads or generates the triples/DB, it would serialize them correctly in both formats.
However, there seem to be some inconsistencies between the ntriples and the rdf/xml serializers. Some triples that serialize without error in the first one fail in the other.
For instance, I have some triples:
10: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:ID #1 {10052D3BC3}>
11: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FORM #"The" {10052D4313}>
12: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:LEMMA #"the" {10052D4523}>
13: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:UPOSTAG #"DET" {10052D4743}>
14: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:XPOSTAG #"DT" {10052D4963}>
15: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FEATS !conll:pronTypeArt {10052D4B73}>
16: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FEATS !conll:definiteDef {10052D4D33}>
17: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:HEAD #3 {10052D4F43}>
18: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:DEPREL #"det" {10052D5163}>
19: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:DEPS #"_" {10052D5373}>
They were generated by the following code (I've removed non-relevant parts):
(let* ((token-node (node (format nil "NAMESPACE:~a-~a" sentence-id (token-id token))))
       (slots '(id form lemma upostag xpostag feats head deprel deps))
       (slot-nodes
         (list
          'id `(,(wilbur:literal (slot-value token 'id)))
          'form `(,(wilbur:literal (slot-value token 'form)))
          'lemma `(,(wilbur:literal (slot-value token 'lemma)))
          'upostag `(,(wilbur:literal (slot-value token 'upostag)))
          'xpostag `(,(wilbur:literal (slot-value token 'xpostag)))
          'feats (convert-features-to-rdf (slot-value token 'feats))
          'head `(,(wilbur:literal (slot-value token 'head)))
          'deprel `(,(wilbur:literal (slot-value token 'deprel)))
          'deps `(,(wilbur:literal (slot-value token 'deps))))))
  `(,@(mappend
       #'(lambda (slot)
           (mapcar
            #'(lambda (value-node)
                (wilbur:triple
                 token-node
                 (node (format nil "conll:~a" (string-upcase slot)))
                 value-node))
            (getf slot-nodes slot)))
       slots)))
The head field is an integer. Serialization as ntriples works correctly and exports it as a number, but serialization as rdf/xml signals an error:
The value
3
is not of type
SEQUENCE
[Condition of type TYPE-ERROR]
Restarts:
0: [RETRY] Retry SLIME REPL evaluation request.
1: [*ABORT] Return to SLIME's top level.
2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1001FAFFA3}>)
Backtrace:
0: (SB-IMPL::SEQUENCE-TO-LIST 3) [tl,external]
1: (WILBUR::EXTENDED-STRING->CHAR-CODES 3)
2: (WILBUR:ESCAPE-XML-STRING 3 T)
3: ((LABELS WILBUR::DUMP :IN WILBUR::DUMP-AS-RDF/XML) !NAMESPACE:test-1 ((!#1=conll:DEPS . #"_") (!#1#:DEPREL . #"det") (!#1#:HEAD . #3) (#2=!#1#:FEATS . !#1#:definiteDef) (#2# . !#1#:pronTypeArt) (!#1#:..
4: (WILBUR::DUMP-AS-RDF/XML (#<WILBUR:TRIPLE #1=!#2=NAMESPACE:c6DC441D0-76F3-460E-A332-DC3F66422077 #3=!rdf:type !#4=conll:Corpus {10052D03E3}> #<WILBUR:TRIPLE #1# #5=!rdfs:label #"my-corpus" {10052D0693..
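The backtrace suggests that WILBUR:ESCAPE-XML-STRING receives the raw integer 3 where it expects a string. As a possible caller-side workaround (not a fix for the serializer itself), the slot values could be converted to strings before they are wrapped in literals. The helper name literal-string below is hypothetical and not part of Wilbur; this is only a minimal sketch, and it assumes that the numeric type of the value does not need to survive in the output.

```lisp
;; Hypothetical helper (NOT part of Wilbur): make sure the value handed to
;; the literal constructor is always a string, so string-escaping code in
;; the rdf/xml serializer never sees a raw integer.
(defun literal-string (value)
  "Return VALUE unchanged if it is a string, otherwise its printed form."
  (if (stringp value)
      value
      (princ-to-string value)))

;; Possible use in the code above (assumption: WILBUR:LITERAL takes the
;; literal's string as its first argument, as the existing calls suggest):
;;   'head `(,(wilbur:literal (literal-string (slot-value token 'head))))
```

With this change the integer fields (id and head here) would be exported as plain string literals in both formats, so the two serializers would at least agree, at the cost of no longer exporting them as numbers in ntriples.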
( @arademaker )