Rework - IMPORTANT

Open Breta01 opened this issue 5 years ago • 5 comments

The project is currently undergoing a big reorganization. The new code is in the rework branch, and once this issue is closed it will be merged into master. What will be new:

  • More logical structure
  • Incorporating new, much larger datasets
  • Unifying naming and style of code
  • Removing deprecated code
  • Dropping support of Czech accents recognition

This brings some breaking changes. I recommend moving to the new code because I will no longer fix the issues from the old versions.

Model retraining

With the new version, some old models may become incompatible. The old models were also trained only on a small dataset, so extensive retraining is needed. I would appreciate any help with this task because I have only limited access to cloud computing resources.

Dropping support of Czech accents

The Czech accents will be removed from the words. I will keep only some text files that allow the accents to be recovered. This solves some compatibility issues across different operating systems. Also, models trained on this dataset weren't very accurate. However, as a school project, I will be creating software that automatically adds Czech accents to sentences. This is only a partial solution to the problem, but I don't have enough data to recognize the accents successfully anyway.
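
For illustration only, here is a minimal sketch of one common way to strip accents from words in Python using the standard library. The function name `strip_accents` is hypothetical and not taken from the repository; the actual dataset scripts may do this differently.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks, e.g. 'č' -> 'c' (hypothetical helper)."""
    # NFD normalization splits each accented letter into a base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and keep the rest.
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Příliš žluťoučký kůň"))  # prints "Prilis zlutoucky kun"
```

Note that this mapping is lossy: the original accents cannot be reconstructed from the stripped word alone, which is why separate text files are needed for recovery.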

Breta01, Nov 20 '18 16:11

Some updates:

  • I updated the ocr package
  • I am finishing the dataset section with all the scripts. It should be a big step up for the project, so please let me know if it works.
  • I will continue with the rework of the notebooks

Breta01, Nov 26 '18 22:11

  • I will try to follow this guide for updating the project: https://guide.esciencecenter.nl/
  • I will also try to automate as many tasks as possible.
  • Update for TensorFlow 2.0
  • Follow the Black code style

Breta01, Apr 01 '20 08:04

Ideas for better promotion of the project (see https://guide.esciencecenter.nl/best_practices/communication.html):

  • Web page
  • Docker image
  • Online demo
  • Screencast

I am also thinking about adding tests and setting up continuous integration, for example with Travis CI.

Breta01, Apr 08 '20 06:04

Hi, I'm having trouble understanding the readme files. Is there a YouTube video that explains how to get the datasets and create the environments? Most of the packages are unavailable for installation.

SRK-returns, Apr 27 '21 15:04

Hi @SRK-returns,

Which branch are you using, update or master? I don't have any video instructions. The setup also depends on your OS.

Breta01, Apr 27 '21 16:04