Error upon submitting correction annotations
I got errors on two files upon submitting correction annotations, and those files would not open anymore; nginx reports a gateway timeout.
I am attaching the docserver logs here too.
Update: after restarting nginx and the documentserver, the second file (0030) opened again. For the first one (000) the error stays:
The log above is of the web service, attached please is the foliadocserver's log...
Keep getting this error, related to corrections, after test-annotating for a while on different files. Attached please find the last hour's activity log.
Can it perhaps also be related to insufficient memory? The service runs on Debian 10 and is currently pretty thin:
      total  used  free  shared  buff/cache  available
Mem:  16019  7259  2062  299     6696        8377
It looks like a corruption occurred for your first file (an invalid reference somewhere). That's definitely a bug in the system, because that should not happen of course. I thought I had an auto-correction mechanism in place for that already, but I may be wrong, since clearly it fails to load now. Could you send the two FoLiA documents so I can pinpoint why exactly it might have gone wrong?
Can it perhaps also be related to insufficient memory? The service runs on Debian 10 and is currently pretty thin:
      total  used  free  shared  buff/cache  available
Mem:  16019  7259  2062  299     6696        8377
Nah, that sounds like enough memory.
Thanks a lot, attached are the files.
test_FA-MBK-4-3_035245008_0030_abpproc_pars_ucto.folia.xml.txt test_FA-MBK-4-3_035245008_000_abpproc.folia.xml.txt
I thought I had an auto-correction mechanism in place for that already, but I may be wrong, since clearly it fails to load now.
There was a bug in this mechanism, so that's probably what caused part of the problem (fixed and released already in foliapy v2.5.7). I can't really pinpoint the problem on the 0030 document yet, will investigate further.
Thanks, so how should I update FLAT so that I can load these annotated documents again?
I run
pip install git+https://github.com/proycon/foliadocserve
pip install git+https://github.com/proycon/flat
By this I got folia-2.5.5
There was a doc that I test-annotated (pls see attached), which I cannot import correctly, although foliavalidator says it is fine.
FLAT says:
Uploaded file is no valid FoLiA Document: FoLiA exception in handling of @ line 93 (in parent @ parent line 92) : [InvalidReference] FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70
Traceback (most recent call last):
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 7574, in __getitem__
    return self.index[key]
KeyError: 'FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 6322, in parsexml
    return doc[id]
  File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 7583, in __getitem__
    raise KeyError("No such key: " + key)
etc.
FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
On my PC, both folialint and foliavalidator reject this file:
folialint ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt failed: XML error: Unresolvable id FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70 in WordReference
foliavalidator ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
VALIDATION ERROR on full parse by library (stage 2/3), in ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
ParseError: FoLiA exception in handling of @ line 93 (in parent @ parent line 92) : [InvalidReference] FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70
Looking closer, the problem is in this fragment:
<entities>
<entity xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.entity.1" class="ff:italic" processor="proc.pirolen.039f428d" datetime="2021-09-09T17:01:08">
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70" t="-;"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.3" t="Wodeham"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.4" t="("/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.5" t="?"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.6" t=")"/>
</entity>
<entity xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.entity.2" class="lem:Auth" processor="proc.pirolen.039f428d" datetime="2021-09-09T17:01:58">
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70" t="-;"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.3" t="Wodeham"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.4" t="("/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.5" t="?"/>
<wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.6" t=")"/>
<feat subset="normaliz_intern" class="Adam;"/>
</entity>
</entities>
The processor "proc.pirolen.039f428d" created two strange entities, both referring to a word (apparently created by ucto) that isn't in the text.
I have no clue why or how. It seems very strange and is surely dead wrong.
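For anyone who wants to locate such unresolvable references without a full FoLiA parse, here is a minimal sketch using only Python's standard library. It is not how folialint or foliavalidator work internally. The function name `find_dangling_wrefs` and the tiny sample document are made up for illustration; the only assumption about the format is what the fragment above shows, namely that a `wref` points via its `id` attribute at an element declaring that value as `xml:id`.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_dangling_wrefs(xml_text):
    """Return ids referenced by <wref> that no element declares via xml:id."""
    root = ET.fromstring(xml_text)
    declared = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = []
    for el in root.iter():
        if isinstance(el.tag, str) and local_name(el.tag) == "wref":
            ref = el.get("id")
            if ref not in declared and ref not in dangling:
                dangling.append(ref)
    return dangling


# Hypothetical miniature document, not one of the attached files.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <s xml:id="doc.s.1">
    <w xml:id="doc.s.1.w.3"><t>Wodeham</t></w>
    <entities>
      <entity xml:id="doc.entity.1" class="ff:italic">
        <wref id="doc.ucto.w.deleted" t="-;"/>
        <wref id="doc.s.1.w.3" t="Wodeham"/>
      </entity>
    </entities>
  </s>
</FoLiA>"""

print(find_dangling_wrefs(SAMPLE))
```

On a real file, read its contents and pass the text to `find_dangling_wrefs`; every id it returns is a reference that no element in the document declares.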
Thanks a lot for troubleshooting! The text in this file is very nonstandard, it comes from a register in Latin, with very problematic/incorrect OCR on which I run ucto. I did some test-annotations with a test-entity set. During this I also did some corrections, typically with direct edits. It can thus be that I direct-deleted some word or string that was annotated already...
Direct edits are the source of a lot of evil, as you discovered. That should not be a standard procedure. Using FLAT, these problems should have been avoided (I hope).
I was using FLAT.
It might be this: in the entity annotation, the string "-;" is a single unit. Whereas in the correction these are separate tokens.
<correction xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.correction.1" class="word_misrec" processor="proc.foliadocserve.90a509c3" datetime="2021-09-09T17:00:55">
<new>
<w xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.correction.1.w.1">
<t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl">-<feat subset="normaliz_addPrevLem" class="Adam"/></t>
</w>
<w xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.correction.1.w.2">
<t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl">;</t>
</w>
</new>
</correction>
I suspect this can come from the order of steps I made:
- first added the entity annotation to "-;" which was at that time a single token (I assume I got it like that from ucto).
- then I realized that I want to add the annotation to the dash only, but not to the semicolon, so I did a direct edit correction and split these two characters.
- then the entity annotation broke I assume and FLAT reported an error.
As said, I was experimenting with an annotation approach and tagset and FLAT. If this order of annotation breaks things, I would configure my FLAT differently (e.g. not allowing direct edits). @proycon please tell me if this should be the case.
Yes, the core of the error was an invalid reference that was indeed created like you describe. I thought I had a simple workaround in place in the latest version that would allow loading your document, by simply dropping the invalid reference completely. You should be on foliapy v2.5.7 for this to work correctly; you might still have a slightly older version?
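As an illustration of what "dropping the invalid reference" means, here is a rough stand-alone sketch with Python's standard library. This is not foliapy's actual implementation, and the helper name `drop_dangling_wrefs` and the sample document are hypothetical. It assumes the FoLiA namespace URI is `http://ilk.uvt.nl/folia`, and it does not handle an entity that ends up with no `wref` children at all.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
FOLIA_NS = "http://ilk.uvt.nl/folia"  # assumed FoLiA namespace URI

# Serialize FoLiA elements without an ns0: prefix (note: process-wide setting).
ET.register_namespace("", FOLIA_NS)


def drop_dangling_wrefs(xml_text):
    """Remove <wref> elements whose id matches no xml:id; return (xml, removed ids)."""
    root = ET.fromstring(xml_text)
    declared = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    removed = []
    # Materialize the parent list first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_wref = isinstance(child.tag, str) and child.tag.rsplit("}", 1)[-1] == "wref"
            if is_wref and child.get("id") not in declared:
                parent.remove(child)
                removed.append(child.get("id"))
    return ET.tostring(root, encoding="unicode"), removed


# Hypothetical miniature document, not one of the attached files.
BROKEN = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="doc">
  <s xml:id="doc.s.1">
    <w xml:id="doc.s.1.w.3"><t>Wodeham</t></w>
    <entities>
      <entity xml:id="doc.entity.1" class="ff:italic">
        <wref id="doc.ucto.w.deleted" t="-;"/>
        <wref id="doc.s.1.w.3" t="Wodeham"/>
      </entity>
    </entities>
  </s>
</FoLiA>"""

repaired, removed_ids = drop_dangling_wrefs(BROKEN)
print(removed_ids)
```

Work on a copy of the document and run foliavalidator on the result afterwards, since an annotation that lost a word it was meant to cover may no longer say what the annotator intended.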
Thanks, I see! So when I update FLAT, I don't actually see a foliapy version being referred to, only folia (please see above). I run:
pip install git+https://github.com/proycon/foliadocserve
pip install git+https://github.com/proycon/flat
By this I get folia-2.5.5
Just a remark: Shouldn't the correction have an "original" node containing the "-;", thus keeping the reference alive? (although I fear a lot of confusion and mess later on, considering this reference)
@pirolen: do
pip install git+https://github.com/proycon/foliapy
It seems flat and foliadocserve themselves don't force the latest version yet.
Just a remark: Shouldn't the correction have an "original" node containing the "-;", thus keeping the reference alive? (although I fear a lot of confusion and mess later on, considering this reference)
If it's done in correction mode, yes, but not in direct edit mode.
By this I get folia-2.5.5
Yes, that's foliapy, you need 2.5.7 of that one
$ pip show folia
Name: FoLiA
Version: 2.5.7
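The same check can be done from inside the Python environment that FLAT and foliadocserve actually run in, which rules out pip having reported on a different environment. This is a small sketch under two assumptions: `importlib.metadata` is available (Python 3.8 or newer; the traceback above shows Python 3.7, where `pip show folia` remains the simpler route), and the distribution can be looked up as `folia`. The helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '2.5.7' into (2, 5, 7); only plain dotted numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required):
    """Compare two dotted version strings numerically, not alphabetically."""
    return parse_version(installed) >= parse_version(required)


def check_folia(required="2.5.7", dist_name="folia"):
    """Return (installed_version, is_new_enough), or None if the package is absent."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return None
    return installed, is_at_least(installed, required)


# Usage: run with the interpreter of the virtualenv that serves FLAT, e.g.
#   print(check_folia())
```

If this reports an older version than `pip show`, the web service is probably running from a different environment than the one that was upgraded.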
I completely stopped and restarted the webserver and foliadocserver, but still could not load this very document.
What I did then: I deleted the (several) annotations about which FLAT complained, and then the document loaded. Will keep testing.
If it's done in correction mode, yes, but not in direct edit mode.
Ok, didn't know that (not a FLAT user). But this implies that 'direct edit mode' is highly dangerous. Not recommended imho.
Btw, you might have seen that I used ucto set to French and German, on these Latin texts... not sure what would be a better option.
Best would be to create a separate tokconfig-lat file, so you could use -Llat. I wonder if this wouldn't be just a simplified version of the French or German files.
Any input would be welcome.
Also the existing Italian files would be useful. I am happy to assist, e.g. by compiling an abbreviations list. Still, I am afraid that the data I work with is very idiosyncratic, i.e. that the list I'd compile would not generalize. Or is that of no concern?
Well, ANY list of Latin abbreviations would be welcome, I suppose. As long as it doesn't contain entries that wouldn't be an abbreviation in a "normal" Latin context. Including: this Wikipedia list might be a start?
I will let you know when I'll have a usable abbreviation list -- the data provider said that the abbreviations used in the texts at hand are almost random (as the printed book required).
Hi, I encountered similar issues (again on Latin), when using an ucto-ed file, in a containerized FLAT. Similar to my previous experiences (with a non-container FLAT back then):
- tokens disappear from the context of an annotated token, right after I submit the annotation
- annotations can not be added to tokens: submitting yields an error message. :-(
Could you please tell where the logfile is located in the container? I am going to attach it, together with before/after files and screenshots. Many thanks!
Disclaimer: I cannot exclude the possibility of unintentionally having used the GUI in a way that is not valid...
before/after as screenshot:
after annotating 'Loeive' as an entity, the rest of the sentence (word 2 and 3) disappeared from the file. Also, the document could not be loaded anymore in FLAT.

The two respective files. flat_error.folia.xml.txt test3_entries.uctoed.folia.xml.txt