sist2
Invalid output: <...> (Success) when specifying output directory
Device Information (please complete the following information):
- OS:
Ubuntu 20.04.2 LTS
- Deployment:
Linux Binary
- SIST2 Version:
2.11.5
- Elasticsearch Version (if relevant) :
7.14.0
Command with arguments
~~./sist2-x64-linux-debug scan --ocr eng /mnt/best/BEST_ACE/DOCS/OLD/ -o ~/.docs_old_idx~~
~~./sist2-x64-linux-debug: error while loading shared libraries: libasan.so.4: cannot open shared object file: No such file or directory~~
~~If I run this in the regular binary I get~~
./sist2 scan --ocr eng /mnt/best/BEST_ACE/DOCS/OLD/ -o ~/.docs_old_idx
Invalid output: '/home/willwade/.docs_old_idx/' (Success).
Am I doing something wrong?
Describe the bug
Failing. See above.
Steps To Reproduce
- mkdir /mnt/dir
- smbmount a directory to the dir
- install tesseract language file
- run the scan index with --ocr eng
Expected behavior
Should OCR the files.
Actual Behavior
Crashing?!
Interestingly if I run it without the -o option e.g...
./sist2 scan --ocr eng /mnt/best/BEST_ACE/DOCS/OLD
I get
[7F4E3EC42A40] [2021-12-15 16:26:01] [FATAL cli.c] Could not find tesseract language file!
but I've definitely installed it, e.g.
sudo apt install tesseract-ocr-eng
[sudo] password for willwade:
Reading package lists... Done
Building dependency tree
Reading state information... Done
tesseract-ocr-eng is already the newest version (1:4.00~git30-7274cfa-1).
0 upgraded, 0 newly installed, 0 to remove and 86 not upgraded.
To use the debug binary you need to install the libasan4 package (or libasan5, I don't remember exactly). Ideally you want to use the release binary for better performance and only use the debug one to help me troubleshoot crashes.
The Invalid output: <...> (Success) message is a little bit weird. Does sist2 have write permission to that directory? Or does that folder already exist?
For the tesseract language file, it might be because Ubuntu changed the folder they use to save the language files. Can you try to locate them on your machine? What is the output of find /usr/share/ -name "*.traineddata"?
Re: tesseract:
/usr/share/tesseract-ocr/4.00/tessdata/osd.traineddata
/usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata
Re: write access: do you mean does it have write access to '/home/willwade/.docs_old_idx/'? Yeah, sist2 created it. It won't be able to write to the mounted dir though; I think I've set that as read-only. Does it need that?
Done the sudo apt install libasan4 - no more debug output other than Invalid output: ..
For now you can just copy eng.traineddata into your working directory (the same directory as the sist2 binary) and it should work.
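The workaround above can be scripted. The following is a minimal sketch, assuming a POSIX shell; copy_traineddata is a hypothetical helper (not part of sist2), and the search root and destination are whatever applies on your machine. It looks for <lang>.traineddata under a search root, in the same way as the find command suggested earlier, and copies the first match into the given directory.

```shell
# Hypothetical helper, for illustration only; not part of sist2 or tesseract.
# Usage: copy_traineddata <lang> <search_root> <dest_dir>
copy_traineddata() {
    lang=$1      # language code, e.g. eng
    search=$2    # where to look, e.g. /usr/share
    dest=$3      # directory that holds the sist2 binary

    # Take the first matching language file under the search root.
    src=$(find "$search" -name "$lang.traineddata" 2>/dev/null | head -n 1)

    if [ -z "$src" ]; then
        echo "No $lang.traineddata found under $search" >&2
        return 1
    fi

    cp "$src" "$dest/"
}
```

For example, running copy_traineddata eng /usr/share . from the directory that holds the sist2 binary would place eng.traineddata next to it, provided the file exists somewhere under /usr/share.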
I don't have access to my workstation for a few days. I'm trying to understand why the -o option doesn't work here. Can you try to specify the full path and not use a . prefix for the folder? This shouldn't be an issue but it's worth trying.
For example ... -o /home/willwade/docs_old_idx/
OK, so popping the tesseract training file into the working directory has worked.
What's weird though is that even though sist2 made the directory on one attempt, if I rerun a more successful scan I get that Invalid output: 'index.sist2/' (Success). error. So I think that is saying "dir exists".
I would totally have expected it to have just written over the top of it. Maybe that's just user error though.
So anyway - got it working now!
Ok I'm glad it works now
No, it should never overwrite the output directory, but it should say "Output exists" or something like that. I'll update the message or fix the error checking; it should not consider Success as an error code.
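Until the message is improved, a pre-flight check can make the cause obvious. This is a minimal sketch, assuming a POSIX shell and assuming (as the thread suggests) that the error appears when the output path already exists; check_output_dir is a hypothetical helper, not a sist2 command.

```shell
# Hypothetical helper, for illustration only; not part of sist2.
# Returns 0 if nothing exists at the given path, so it can be used as a
# fresh output directory; returns 1 with a clear message otherwise.
check_output_dir() {
    out=$1

    if [ -e "$out" ]; then
        echo "Output exists: $out (remove it or choose another path)" >&2
        return 1
    fi

    return 0
}
```

It can be chained in front of the scan, for example check_output_dir "$HOME/docs_old_idx" && ./sist2 scan --ocr eng /mnt/best/BEST_ACE/DOCS/OLD/ -o "$HOME/docs_old_idx", so the scan only starts when the path is free.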