Mike Gerber
@lmarti-dev What git did you use here? Or is it included in the Python install from e.g. the Microsoft Store? (Not my platform and I need some more information to...
(This was already an issue with `setup.py` as I understand it.)
> > @mikegerber Would it be possible to extend `setuptools-ocrd` to dereference the symlink OS-independently at build time? > > That's a bit of a hack :\ How does the...
Happens with our `merged` test directory.
Another one, this time with `test` (current dataset):
```
(dinglehopper) mike.gerber@lx0246:~$ sh /data-ssd/mike.gerber/dta-gt-data/test-eval.sh
Traceback (most recent call last):
  File "/home/mike.gerber/.pyenv/versions/dinglehopper/bin/dinglehopper-line-dirs", line 8, in <module>
    sys.exit(main())
  File "/home/mike.gerber/.pyenv/versions/3.9.20/envs/dinglehopper/lib/python3.9/site-packages/click/core.py", line 1157, in __call__...
```
`chardet` seems to be bad at dealing with these short texts:
```
In [2]: print(chardet.detect("Nur zum Prüfen von 'chardet'.".encode("utf-8")))
{'encoding': 'ISO-8859-9', 'confidence': 0.6587004243912733, 'language': 'Turkish'}
```
For plain text files...
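One way to sidestep guessing on short inputs is to try a strict UTF-8 decode first and only fall back when that fails. The following is a minimal sketch, not dinglehopper's actual code: `decode_plain_text` and its `fallback_encoding` parameter are hypothetical names, and the choice of ISO-8859-1 as fallback is an assumption.

```python
def decode_plain_text(data: bytes, fallback_encoding: str = "iso-8859-1") -> str:
    """Decode bytes as UTF-8 when they are valid UTF-8, else use the fallback.

    Hypothetical helper for illustration; the fallback encoding is an assumption.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8,
        # so short valid UTF-8 texts are never misread as another encoding.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback_encoding)


sample = "Nur zum Prüfen von 'chardet'.".encode("utf-8")
print(decode_plain_text(sample))
```

This works because non-ASCII text in a legacy 8-bit encoding is rarely valid UTF-8 by accident, so the strict attempt is a fairly reliable test even for a single short line.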
Branch now has `--plain-encoding` and warns about auto-detecting (for `dinglehopper-line-dirs`) 
This probably gives us problems with the UTF-8 BOM again; need to check.
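For reference, a rough sketch of how an explicitly given encoding could be combined with BOM handling: Python's `utf-8-sig` codec drops a leading UTF-8 BOM when decoding and also accepts input without one. The function name `decode_without_bom` is hypothetical and this is not taken from the branch.

```python
import codecs


def decode_without_bom(data: bytes, encoding: str = "utf-8") -> str:
    """Decode with an explicit encoding, dropping a leading UTF-8 BOM if present.

    Hypothetical helper for illustration only.
    """
    # codecs.lookup() normalizes aliases such as "utf8" or "UTF-8" to "utf-8".
    if codecs.lookup(encoding).name == "utf-8":
        # "utf-8-sig" strips the BOM when decoding and tolerates its absence.
        encoding = "utf-8-sig"
    return data.decode(encoding)


with_bom = codecs.BOM_UTF8 + "Prüfen".encode("utf-8")
print(repr(with_bom.decode("utf-8")))       # BOM survives as a leading U+FEFF
print(repr(decode_without_bom(with_bom)))   # BOM is dropped
```

With plain `utf-8`, the BOM ends up as a leading U+FEFF character in the text, which would then count as a difference when comparing against text without it.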
We also need to review the CLIs again; I don't even remember that we had an option to process directories (!= directories of lines)...
Note: I'm working on this in the feat/flex-line-dirs branch because 1. it came up there and 2. the line dirs are especially affected, since short texts are the input format there.