hocr-proofreader
Saving file doesn't work
Hi! When I click the Save button I get the message:
Failed to load resource: the server responded with a status of 404 (Not Found)
/hocr-proofreader/save.php:1
The editor is very nicely designed and I would like to test it further.
Thanks for your useful work!
Hi, thanks for your response. Unfortunately the project is at a very early stage - currently it is more of a "hOCR viewer" than an "editor". It implements just the basic ideas of an OCR web proofreader, to see what's possible.
When having the time, I'll continue developing it. Help is welcome ;-)
To your question: Saving of documents will be out of scope of this project anyway. This project covers just the frontend part of the editor (to be embedded in other applications). Providing a backend storage is your part ;-)
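For anyone landing here with the same 404, a minimal sketch of what such a backend could look like, written for Node.js. This is not part of hocr-proofreader: the `/save` route, the `name` query parameter, and the helpers `safeFileName` and `createSaveServer` are all assumptions made for illustration, and the request the frontend actually sends may differ.

```javascript
// Sketch of a save endpoint; NOT part of hocr-proofreader.
// Assumption: the frontend POSTs the hOCR markup as the raw request body
// to /save?name=<file>.hocr. Route and parameter name are hypothetical.
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Accept only plain file names ending in .hocr or .html, so a request
// cannot write outside the storage directory. Returns null otherwise.
function safeFileName(name) {
  const s = String(name ?? '');
  return /^[A-Za-z0-9_][\w.-]*\.(hocr|html)$/.test(s) ? s : null;
}

// Build (but do not start) an HTTP server that stores uploads in storageDir.
// Note: no authentication and no body size limit; add both before real use.
function createSaveServer(storageDir) {
  return http.createServer((req, res) => {
    const url = new URL(req.url, 'http://localhost');
    const name = safeFileName(url.searchParams.get('name'));
    if (req.method !== 'POST' || url.pathname !== '/save' || name === null) {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Bad request');
      return;
    }
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      fs.writeFile(path.join(storageDir, name), Buffer.concat(chunks), (err) => {
        res.writeHead(err ? 500 : 204);
        res.end();
      });
    });
  });
}
```

To try it, something like `createSaveServer('./documents').listen(8080)` would start it on a port of your choice, with the frontend's save request pointed at that URL instead of `save.php`.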
Hi! Thanks for the reply! I see. I'll keep following the project and mention it to colleagues in Helsinki who work on similar topics. We have quite a lot of books that should be proofread, and I haven't found a solution that works well for proofreading hOCR output from Tesseract. Ideally the output would be saved with page coordinates as well, but I know that gets messy after manual edits. I liked very much how navigating the text is implemented here. In principle, setting up the backend is no problem either. Good luck with your project!
Yes, same for me. That was also my reason for starting this project, as I didn't find a good existing solution.
It was also my plan to preserve the page coordinates as well as possible, i.e. split the bounding boxes when a whitespace is inserted, allow manual editing/correcting of the bounding boxes, etc. One goal is to render Image-With-Text-Beyond PDFs from those hOCRs - so the coordinates are very important.
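The bounding-box splitting mentioned above could be sketched roughly as follows. This is not the project's code: `parseBBox` and `splitBBox` are hypothetical helpers, and the split position assumes that all glyphs in the word are about equally wide, which is only an approximation for proportional fonts. hOCR stores the box in an element's `title` attribute as `bbox x0 y0 x1 y1`.

```javascript
// Sketch only: split a word's bounding box at the point where a space
// is inserted. Both helpers are hypothetical, not hocr-proofreader API.

// Extract the bbox from an hOCR title attribute such as
// "bbox 100 50 200 80; x_wconf 93". Returns null if no bbox is present.
function parseBBox(title) {
  const m = /bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)/.exec(title);
  if (!m) return null;
  const [x0, y0, x1, y1] = m.slice(1).map(Number);
  return { x0, y0, x1, y1 };
}

// Split `box` horizontally for a word `text` broken at character `offset`.
// Assumption: equal glyph widths, so the cut is proportional to the offset.
function splitBBox(box, text, offset) {
  if (offset <= 0 || offset >= text.length) {
    throw new RangeError('offset must fall inside the word');
  }
  const splitX = Math.round(box.x0 + (box.x1 - box.x0) * (offset / text.length));
  return [
    { ...box, x1: splitX }, // left part keeps the original left edge
    { ...box, x0: splitX }, // right part keeps the original right edge
  ];
}
```

For example, splitting "helloworld" with the box `bbox 100 50 200 80` after the fifth character gives two boxes that meet at x = 150. A manual bounding-box correction would still be needed wherever the equal-width guess is off.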
It would be great to find some more developers interested in this - the current implementation is just about 450 lines of pure JavaScript using recent browser features, so it's quite manageable. ;-)
I got hocr-proofreader to display my files very nicely, and I'll still experiment with it quite a bit. Great work! The bounding box problems seem common to all editors, but I agree, having the coordinates is very important. Drawing them manually sounds like a good idea; I don't think I've seen that option in other editors.
I'll come up with some solution for saving the hOCR file for now, and I'll also look deeper into JavaScript, although I'm not so familiar with it. Anyway, I very much like how lightweight it is and how painlessly it handles basic document navigation. I'll keep you updated.
In case you are curious, I'm working in Helsinki on Tesseract models for an alphabet used in the Soviet Union for the Komi-Zyrian language in the 1920s. I'm getting to the point where proofreading starts to be sensible, so I'm looking into all the alternatives.
Thanks. Cool, very interesting :-)