de.setf.wilbur

Inconsistencies in serialization formats

Open GPPassos opened this issue 7 years ago • 0 comments

With two serializer methods, one would expect that if Wilbur correctly reads or generates triples/DB, then it would serialize them correctly with both.

However, there seem to be some inconsistencies between the ntriples and the rdf/xml serializers: some data that serializes with the first one fails with the other.

For instance, I have some triples:

10: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:ID #1 {10052D3BC3}>
11: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FORM #"The" {10052D4313}>
12: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:LEMMA #"the" {10052D4523}>
13: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:UPOSTAG #"DET" {10052D4743}>
14: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:XPOSTAG #"DT" {10052D4963}>
15: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FEATS !conll:pronTypeArt {10052D4B73}>
16: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:FEATS !conll:definiteDef {10052D4D33}>
17: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:HEAD #3 {10052D4F43}>
18: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:DEPREL #"det" {10052D5163}>
19: #<WILBUR:TRIPLE !NAMESPACE:test-1 !conll:DEPS #"_" {10052D5373}>

They were generated by the following code (I've removed non-relevant parts):

(let* ((token-node (node (format nil "NAMESPACE:~a-~a" sentence-id (token-id token))))
	(slots '(id form lemma upostag xpostag feats head deprel deps))
	(slot-nodes
	 (list
	  'id `(,(wilbur:literal (slot-value token 'id)))
	  'form `(,(wilbur:literal (slot-value token 'form)))
	  'lemma `(,(wilbur:literal (slot-value token 'lemma)))
	  'upostag `(,(wilbur:literal (slot-value token 'upostag)))
	  'xpostag `(,(wilbur:literal (slot-value token 'xpostag)))
	  'feats (convert-features-to-rdf (slot-value token 'feats))
	  'head `(,(wilbur:literal (slot-value token 'head)))
	  'deprel `(,(wilbur:literal (slot-value token 'deprel)))
	  'deps `(,(wilbur:literal (slot-value token 'deps))))))
    
    `(,@(mappend
	  #'(lambda (slot)
	      (mapcar
	       #'(lambda (value-node) 
		   (wilbur:triple
		    token-node
		    (node (format nil "conll:~a" (string-upcase slot)))
		    value-node))
	       (getf slot-nodes slot)))
	  slots)))

The head field is an integer. Serialization as ntriples works correctly and exports it as a number, but serialization as rdf/xml signals an error:

The value
  3
is not of type
  SEQUENCE
   [Condition of type TYPE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] abort thread (#<THREAD "repl-thread" RUNNING {1001FAFFA3}>)

Backtrace:
  0: (SB-IMPL::SEQUENCE-TO-LIST 3) [tl,external]
  1: (WILBUR::EXTENDED-STRING->CHAR-CODES 3)
  2: (WILBUR:ESCAPE-XML-STRING 3 T)
  3: ((LABELS WILBUR::DUMP :IN WILBUR::DUMP-AS-RDF/XML) !NAMESPACE:test-1 ((!#1=conll:DEPS . #"_") (!#1#:DEPREL . #"det") (!#1#:HEAD . #3) (#2=!#1#:FEATS . !#1#:definiteDef) (#2# . !#1#:pronTypeArt) (!#1#:..
  4: (WILBUR::DUMP-AS-RDF/XML (#<WILBUR:TRIPLE #1=!#2=NAMESPACE:c6DC441D0-76F3-460E-A332-DC3F66422077 #3=!rdf:type !#4=conll:Corpus {10052D03E3}> #<WILBUR:TRIPLE #1# #5=!rdfs:label #"my-corpus" {10052D0693..
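
The backtrace suggests that WILBUR:ESCAPE-XML-STRING receives the raw integer 3 where it expects a sequence. Until the serializer handles this itself, a possible workaround on the caller's side is to turn non-string values into strings before wrapping them in a literal. The sketch below is only an assumption about how that could look: literal-value-as-string is a hypothetical helper, not part of Wilbur, and I have not checked how this changes the ntriples output (the value would presumably be exported as a string instead of a number).

```lisp
;; Hypothetical helper, not part of Wilbur: make sure the value handed to
;; wilbur:literal is a string, so the rdf/xml serializer only sees sequences.
(defun literal-value-as-string (value)
  "Return VALUE unchanged if it is a string, otherwise its printed form."
  (if (stringp value)
      value
      (princ-to-string value)))

;; Assumed usage in the slot list above (untested against Wilbur itself):
;;   'head `(,(wilbur:literal (literal-value-as-string (slot-value token 'head))))
```

The same would apply to the id slot, which is also shown as an integer (#1) in the triples above.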

( @arademaker )

GPPassos, Nov 13 '17 18:11