Error upon submitting correction annotations

Open pirolen opened this issue 3 years ago • 27 comments

I got errors on two files when submitting correction annotations, and those files will not open anymore; nginx signals a gateway timeout.

I am attaching the docserver logs here too.

Screenshot 2021-09-08 at 16 29 13 Screenshot 2021-09-08 at 16 30 22 Screenshot 2021-09-08 at 16 30 43 foliaserverlog.txt

pirolen avatar Sep 08 '21 14:09 pirolen

Update: after restarting nginx and the documentserver, the second file (0030) opened again. For the first one (000) the error stays:

Screenshot 2021-09-08 at 16 53 13

pirolen avatar Sep 08 '21 14:09 pirolen

The log above is of the web service, attached please is the foliadocserver's log...

foliadocserve_pl_new.log.zip.txt

pirolen avatar Sep 09 '21 14:09 pirolen

Keep getting this error, related to corrections, after test-annotating for a while on different files. Attached please find the last hour's activity log.

Can it perhaps be also related to insufficient memory? The service runs on Debian 10 and is currently pretty thin:

              total        used        free      shared  buff/cache   available
    Mem:      16019        7259        2062         299        6696        8377

fdocserve_lastactiv.log.txt

pirolen avatar Sep 09 '21 16:09 pirolen

It looks like a corruption occurred in your first file (an invalid reference somewhere). That's definitely a bug in the system, because that should not happen of course. I thought I had an auto-correction mechanism in place for that already, but I may be wrong, since clearly it fails to load now. Could you send the two FoLiA documents so I can pinpoint why exactly it might have gone wrong?

Can it perhaps be also related to insufficient memory? The service runs on Debian 10 and is currently pretty thin:

              total        used        free      shared  buff/cache   available
    Mem:      16019        7259        2062         299        6696        8377

Nah, that sounds like enough memory.

proycon avatar Sep 10 '21 14:09 proycon

I thought I had an auto-correction mechanism in place for that already, but I may be wrong, since clearly it fails to load now.

There was a bug in this mechanism, so that's probably what caused part of the problem (fixed and released already in foliapy v2.5.7). I can't really pinpoint the problem on the 0030 document yet, will investigate further.

proycon avatar Sep 10 '21 19:09 proycon

Thanks, so how should I update FLAT so that I can load these annotated documents again?

I run

    pip install git+https://github.com/proycon/foliadocserve
    pip install git+https://github.com/proycon/flat

By this I got folia-2.5.5

pirolen avatar Sep 20 '21 15:09 pirolen

There was a doc that I test-annotated (pls see attached), which I cannot import correctly, although foliavalidator says it is fine.

FLAT says:

Uploaded file is no valid FoLiA Document: FoLiA exception in handling of @ line 93 (in parent @ parent line 92) : [InvalidReference] FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70

    Traceback (most recent call last):
      File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 7574, in __getitem__
        return self.index[key]
    KeyError: 'FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 6322, in parsexml
        return doc[id]
      File "/home/flatuser/flateditor/env/lib/python3.7/site-packages/folia/main.py", line 7583, in __getitem__
        raise KeyError("No such key: " + key)

etc.

FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt

pirolen avatar Sep 20 '21 16:09 pirolen

On my PC, both folialint and foliavalidator reject this file:

    folialint ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
    ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt failed: XML error: Unresolvable id FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70 in WordReference

    foliavalidator ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
    VALIDATION ERROR on full parse by library (stage 2/3), in ../FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.flatout.folia.xml.txt
    ParseError: FoLiA exception in handling of @ line 93 (in parent @ parent line 92) : [InvalidReference] FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70

kosloot avatar Sep 20 '21 22:09 kosloot

Looking closer, the problem is in this fragment:

             <entities>
                <entity xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.entity.1" class="ff:italic" processor="proc.pirolen.039f428d" datetime="2021-09-09T17:01:08">
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70" t="-;"/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.3" t="Wodeham"/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.4" t="("/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.5" t="?"/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.6" t=")"/>
                </entity>
                <entity xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.entity.2" class="lem:Auth" processor="proc.pirolen.039f428d" datetime="2021-09-09T17:01:58">
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.ucto.w.bc123fb8afdf4d1000c315a0ddacba70" t="-;"/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.3" t="Wodeham"/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.4" t="("/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.5" t="?"/>
                  <wref id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.w.6" t=")"/>
                  <feat subset="normaliz_intern" class="Adam;"/>
                </entity>
              </entities>

The processor="proc.pirolen.039f428d" created two strange entities, both referring to a word (apparently created by ucto) that isn't in the text.

I have no clue why or how. Seems very strange and is surely dead wrong.
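For anyone who wants to spot this kind of breakage without running the full validators, here is a minimal sketch that lists `wref` elements whose `id` matches no `xml:id` in the document. It uses only Python's standard `xml.etree.ElementTree`, not foliapy, and the helper name `find_dangling_wrefs` and the shortened sample IDs are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_dangling_wrefs(root):
    """Return the id values of <wref> elements that match no xml:id in the tree."""
    # Every element that declares an xml:id is a possible reference target.
    declared = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    dangling = []
    for wref in root.iter("{%s}wref" % FOLIA_NS):
        target = wref.get("id")
        if target not in declared:
            dangling.append(target)
    return dangling


# Tiny hypothetical document: one real word, one reference to a missing word.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <s xml:id="doc.s.1">
    <w xml:id="doc.s.1.w.3"><t>Wodeham</t></w>
    <entities>
      <entity xml:id="doc.entity.1" class="ff:italic">
        <wref id="doc.ucto.w.deadbeef" t="-;"/>
        <wref id="doc.s.1.w.3" t="Wodeham"/>
      </entity>
    </entities>
  </s>
</FoLiA>"""

print(find_dangling_wrefs(ET.fromstring(SAMPLE)))
```

On a real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`. This only checks ID resolution; it is no substitute for folialint or foliavalidator.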

kosloot avatar Sep 22 '21 08:09 kosloot

Thanks a lot for troubleshooting! The text in this file is very nonstandard: it comes from a register in Latin, with very problematic/incorrect OCR, on which I ran ucto. I did some test-annotations with a test-entity set. During this I also did some corrections, typically with direct edits. It can thus be that I direct-deleted some word or string that was annotated already...

pirolen avatar Sep 22 '21 08:09 pirolen

Direct edits are the source of a lot of evil, as you discovered. That should not be a standard procedure. Using FLAT, these problems should have been avoided (I hope).

kosloot avatar Sep 22 '21 08:09 kosloot

I was using FLAT.

pirolen avatar Sep 22 '21 08:09 pirolen

It might be this: in the entity annotation, the string "-;" is a single unit. Whereas in the correction these are separate tokens.

          <correction xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.correction.1" class="word_misrec" processor="proc.foliadocserve.90a509c3" datetime="2021-09-09T17:00:55">
            <new>
              <w xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.correction.1.w.1">
                <t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl">-<feat subset="normaliz_addPrevLem" class="Adam"/></t>
              </w>
              <w xml:id="FA-MBK-4-3_035245008_0020_abpproc_partransf.text.1.div.1.p.1.list.1.item.1.s.1.correction.1.w.2">
                <t set="https://raw.githubusercontent.com/proycon/folia/master/setdefinitions/text.foliaset.ttl">;</t>
              </w>
            </new>
          </correction>

pirolen avatar Sep 22 '21 08:09 pirolen

I suspect this can come from the order of steps I made:

  • first added the entity annotation to "-;" which was at that time a single token (I assume I got it like that from ucto).
  • then I realized that I want to add the annotation to the dash only, but not to the semicolon, so I did a direct edit correction and split these two characters.
  • then the entity annotation broke I assume and FLAT reported an error.

As said, I was experimenting with an annotation approach and tagset and FLAT. If this order of annotation breaks things, I would configure my FLAT differently (e.g. not allowing direct edits). @proycon please tell me if this should be the case.
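The order of steps above can be mimicked with plain dictionaries to show why the reference breaks. This is a toy model under my own assumptions, not FLAT or foliadocserve code: `split_token_direct` is a hypothetical stand-in for a direct edit that replaces the old token with new ones and leaves existing entities untouched, and the IDs are shortened.

```python
def split_token_direct(words, old_id, new_texts):
    """Replace one token by several new ones; the old ID disappears (toy direct edit)."""
    new_words = {}
    for wid, text in words.items():
        if wid == old_id:
            # The replacement tokens get fresh IDs, nothing keeps the old one alive.
            for i, new_text in enumerate(new_texts, start=1):
                new_words[f"correction.1.w.{i}"] = new_text
        else:
            new_words[wid] = text
    return new_words


def dangling(words, entities):
    """Map each entity ID to the word references that no longer resolve."""
    return {
        eid: [ref for ref in refs if ref not in words]
        for eid, refs in entities.items()
        if any(ref not in words for ref in refs)
    }


# Step 1: "-;" is a single token and gets an entity annotation.
words = {"ucto.w.bc12": "-;", "s.1.w.3": "Wodeham"}
entities = {"entity.1": ["ucto.w.bc12", "s.1.w.3"]}
print(dangling(words, entities))        # nothing is broken yet

# Step 2: a direct edit splits the token into "-" and ";".
words_after = split_token_direct(words, "ucto.w.bc12", ["-", ";"])

# Step 3: the entity still points at the old ID, which is gone.
print(dangling(words_after, entities))
```

In this model the only ways out are to update the entity when the token is split or to keep the old token reachable somewhere, which is the trade-off the rest of the thread discusses.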

pirolen avatar Sep 22 '21 08:09 pirolen

Yes, the core of the error was an invalid reference that was indeed created as you describe. I thought I had a simple workaround in place in the latest version that would allow loading your document, by simply dropping the invalid reference completely. You should be on foliapy v2.5.7 for this to work correctly; you might still have a slightly older version?
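To illustrate what "dropping the invalid reference" amounts to, here is a rough standalone sketch on the raw XML. It is not the actual foliapy mechanism (that code is not shown in this thread), it uses only the standard library, and the function name and sample IDs are invented for the example.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
WREF = "{%s}wref" % FOLIA_NS


def drop_dangling_wrefs(root):
    """Remove <wref> children whose id resolves to no xml:id; return the removed ids."""
    declared = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    removed = []
    # Snapshot the element lists first so removal does not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == WREF and child.get("id") not in declared:
                parent.remove(child)
                removed.append(child.get("id"))
    return removed


# Hypothetical miniature of the broken entity from this issue.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <w xml:id="doc.w.3"><t>Wodeham</t></w>
  <entity xml:id="doc.entity.1" class="ff:italic">
    <wref id="doc.ucto.w.deadbeef" t="-;"/>
    <wref id="doc.w.3" t="Wodeham"/>
  </entity>
</FoLiA>"""

root = ET.fromstring(SAMPLE)
print(drop_dangling_wrefs(root))
print(len(list(root.iter(WREF))))  # number of references that survive
```

One caveat of this approach: an entity whose references are all invalid would be left empty, so a real repair would probably have to remove such an entity as well.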

proycon avatar Sep 22 '21 09:09 proycon

Thanks, I see! So when I update FLAT, I don't actually see a foliapy version being referred to, only folia (please see above). I run

    pip install git+https://github.com/proycon/foliadocserve
    pip install git+https://github.com/proycon/flat

By this I get folia-2.5.5

pirolen avatar Sep 22 '21 09:09 pirolen

Just a remark: Shouldn't the correction have an "original" node containing the "-;", thus keeping the reference alive? (Although I fear a lot of confusion and mess later on, considering this reference.)

kosloot avatar Sep 22 '21 09:09 kosloot

@pirolen: do pip install git+https://github.com/proycon/foliapy , it seems flat and foliadocserve itself don't force the latest version yet.

Just a remark: Shouldn't the correction have an "original" node containing the "-;", thus keeping the reference alive? (Although I fear a lot of confusion and mess later on, considering this reference.)

If it's done in correction mode, yes, but not in direct edit mode.

proycon avatar Sep 22 '21 09:09 proycon

By this I get folia-2.5.5

Yes, that's foliapy, you need 2.5.7 of that one
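A quick way to check from inside the virtualenv whether the installed package is new enough is sketched below. It assumes Python 3.8+ for `importlib.metadata` (the traceback earlier in this thread shows python3.7, where the `importlib_metadata` backport would be needed instead) and assumes the distribution can be looked up under the name `folia`, as `pip` reports it. The helper names are my own.

```python
import re
from importlib import metadata

REQUIRED = "2.5.7"  # the foliapy release mentioned above as containing the fix


def version_tuple(version):
    """Turn '2.5.7' into (2, 5, 7), using only the first three numeric parts."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def has_fix(installed, required=REQUIRED):
    """True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def installed_version(dist="folia"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("folia is not installed in this environment")
elif has_fix(current):
    print(f"folia {current} is new enough")
else:
    print(f"folia {current} is older than {REQUIRED}, upgrade needed")
```

Comparing tuples rather than strings matters here: as strings, "2.5.10" would sort before "2.5.7".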

proycon avatar Sep 22 '21 09:09 proycon

    $ pip show folia
    Name: FoLiA
    Version: 2.5.7

I completely stopped and restarted the webserver and foliadocserver, but still could not load this very document.

What I did then: I deleted the (several) annotations about which FLAT complained, and then the document loaded. Will keep testing.

pirolen avatar Sep 22 '21 10:09 pirolen

If it's done in correction mode, yes, but not in direct edit mode.

Ok, didn't know that (not a FLAT user). But this implies that 'direct edit mode' is highly dangerous. Not recommended imho.

kosloot avatar Sep 22 '21 10:09 kosloot

Btw, you might have seen that I used ucto set to French and German, on these Latin texts... not sure what would be a better option.

pirolen avatar Sep 22 '21 12:09 pirolen

Best would be to create a separate tokconfig-lat file, so you could use -Llat. I wonder if this isn't just a simplified version of the French or German files.

Any input would be welcome.

kosloot avatar Sep 22 '21 12:09 kosloot

Also the existing Italian files would be useful. I am happy to assist, e.g. by compiling an abbreviations list. Still, I am afraid that the data I work with is very idiosyncratic, i.e. that the list I'd compile would not generalize. Or is that of no concern?

pirolen avatar Sep 22 '21 12:09 pirolen

Well, ANY list of Latin abbreviations would be welcome, I suppose, as long as it doesn't contain entries that wouldn't be an abbreviation in a "normal" Latin context. Including: this Wikipedia list might be a start?
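As a possible starting point for compiling such a list from the texts themselves, here is a rough heuristic sketch: a short word followed by a period and then a lowercase word or a digit is counted as a candidate abbreviation. The function name, the length limit of five letters, and the sample sentence are all made up for illustration, and the output would need manual review before going into any tokenizer configuration.

```python
import re
from collections import Counter

# A short alphabetic word, a period, then whitespace and a lowercase letter or digit.
# A period before a capitalised word is treated as a likely sentence end and skipped.
CANDIDATE = re.compile(r"\b([A-Za-z]{1,5})\.(?=\s+[a-z0-9])")


def abbreviation_candidates(text, min_count=1):
    """Return candidate abbreviations (without the period), most frequent first."""
    counts = Counter(match.group(1) for match in CANDIDATE.finditer(text))
    return [word for word, count in counts.most_common() if count >= min_count]


# Invented sample line, only to exercise the heuristic.
sample = "Item fr. Adam de Wodeham ord. min. legit. Et postea obiit a. d. 1358."
print(abbreviation_candidates(sample))
```

The heuristic misses abbreviations that happen to precede a capitalised word ("fr." in the sample) and will pick up genuine short sentence-final words in badly OCR'ed text, so raising `min_count` on a larger corpus is probably wise.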

kosloot avatar Sep 22 '21 15:09 kosloot

I will let you know when I'll have a usable abbreviation list -- the data provider said that the abbreviations used in the texts at hand are almost random (as the printed book required).

pirolen avatar Oct 13 '21 09:10 pirolen

Hi, I encountered similar issues (again on Latin), when using an ucto-ed file, in a containerized FLAT. Similar to my previous experiences (with a non-container FLAT back then):

  • tokens disappear from the context of an annotated token, right after I submit the annotation
  • annotations cannot be added to tokens: submitting yields an error message. :-(

Could you please tell where the logfile is located in the container? I am going to attach it, together with before/after files and screenshots. Many thanks!

Disclaimer: I cannot exclude the possibility of unintentionally having used the GUI in a way that is not valid...

pirolen avatar Feb 16 '23 19:02 pirolen

before/after as screenshot:

after annotating 'Loeive' as an entity, the rest of the sentence (word 2 and 3) disappeared from the file. Also, the document could not be loaded anymore in FLAT.

Screen Shot 2023-02-16 at 19 38 17

pirolen avatar Feb 16 '23 19:02 pirolen

The two respective files. flat_error.folia.xml.txt test3_entries.uctoed.folia.xml.txt
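A before/after pair like this can also be compared mechanically. The sketch below lists the `w` elements whose `xml:id` is present in the first document but missing from the second. It relies on the standard library only, and the function names and the miniature documents are hypothetical stand-ins for the attached files.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "http://ilk.uvt.nl/folia"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def word_ids(root):
    """All xml:id values of <w> elements, in document order."""
    return [w.get(XML_ID) for w in root.iter("{%s}w" % FOLIA_NS)]


def missing_words(before_root, after_root):
    """IDs of words that exist in the 'before' document but not in the 'after' one."""
    after = set(word_ids(after_root))
    return [wid for wid in word_ids(before_root) if wid not in after]


# Hypothetical miniatures: three words before annotating, one word afterwards.
BEFORE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <s xml:id="doc.s.1">
    <w xml:id="doc.s.1.w.1"><t>Loeive</t></w>
    <w xml:id="doc.s.1.w.2"><t>word2</t></w>
    <w xml:id="doc.s.1.w.3"><t>word3</t></w>
  </s>
</FoLiA>"""

AFTER = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <s xml:id="doc.s.1">
    <w xml:id="doc.s.1.w.1"><t>Loeive</t></w>
  </s>
</FoLiA>"""

print(missing_words(ET.fromstring(BEFORE), ET.fromstring(AFTER)))
```

For the real files, `ET.parse(path).getroot()` would replace the `ET.fromstring(...)` calls, provided the broken file is still well-formed XML.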

pirolen avatar Feb 16 '23 19:02 pirolen