ocrs

Roadmap for 2024

Open robertknight opened this issue 1 year ago • 4 comments

This issue documents what I think are the highest priorities in the short to medium term.

Models and training:

  • [x] Document how to repeat the model training process (https://github.com/robertknight/ocrs-models/issues/6). A repeatable training process is IMO required for an ML project to really be considered open source. It is also needed to enable fine-tuning or training for new languages.
  • [ ] Add benchmarks so the accuracy can be tracked over time
  • [ ] Expand the training data sets for detection and recognition to improve accuracy
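
As an illustration of what an accuracy benchmark could measure, here is a minimal sketch of character error rate (CER), a metric commonly used for OCR. The issue does not say which metric the benchmarks would use, so CER is an assumption here, and the function names `levenshtein` and `char_error_rate` are hypothetical rather than part of ocrs.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode
/// scalar values (`char`s). Hypothetical helper, not part of ocrs.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between the first i chars of `a`
    // and the first j chars of `b`, for the previous value of i.
    let mut prev: Vec<usize> = (0..=b.len()).collect();

    for (i, &ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1); // deleting i + 1 chars of `a` to reach an empty prefix of `b`
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Character error rate: edit distance divided by the reference length.
/// Returns `None` for an empty reference, where the rate is undefined.
fn char_error_rate(reference: &str, hypothesis: &str) -> Option<f64> {
    let ref_len = reference.chars().count();
    if ref_len == 0 {
        return None;
    }
    Some(levenshtein(reference, hypothesis) as f64 / ref_len as f64)
}
```

Averaging `char_error_rate` over a fixed set of ground-truth and OCR output pairs would give a single number that can be compared between releases.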

ocrs library and CLI tool:

  • [ ] Add the infrastructure to support multiple languages and model updates (https://github.com/robertknight/ocrs/issues/8, https://github.com/robertknight/ocrs/issues/4)
  • [x] Add end-to-end tests that actually check the output. There is a simple end-to-end test but it only verifies that the CLI tool can be built and runs, not the actual output (https://github.com/robertknight/ocrs/pull/25)
  • [x] Improve runtime performance and efficiency

Beyond the short-term list, here are some themes for subsequent work:

  • Continue expanding the datasets and test cases to improve accuracy
  • Use machine learning for layout analysis
  • Quantize the models to 8-bit to make the downloads smaller and execution faster
  • Improve WebAssembly execution performance
  • Add bindings for other languages (e.g. C, Python, Node)
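
To make the 8-bit quantization item concrete, here is a minimal sketch of symmetric per-tensor quantization of `f32` weights to `i8`. The issue does not specify a quantization scheme, so the scheme is an assumption, and the names `QuantizedTensor`, `quantize_symmetric` and `dequantize` are hypothetical.

```rust
/// Weights stored as i8 with one scale for the whole tensor.
/// Hypothetical type for illustration only.
struct QuantizedTensor {
    values: Vec<i8>,
    scale: f32,
}

/// Map f32 weights onto the range [-127, 127] using a single scale
/// derived from the largest absolute weight.
fn quantize_symmetric(weights: &[f32]) -> QuantizedTensor {
    let max_abs = weights.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
    // Avoid dividing by zero when every weight is zero.
    let scale = if max_abs > 0.0 { max_abs / 127.0 } else { 1.0 };
    let values = weights
        .iter()
        .map(|&w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedTensor { values, scale }
}

/// Recover approximate f32 weights from the quantized form.
fn dequantize(t: &QuantizedTensor) -> Vec<f32> {
    t.values.iter().map(|&v| f32::from(v) * t.scale).collect()
}
```

Each weight takes one byte instead of four, which is where the smaller download comes from. The cost is a rounding error of at most about half of `scale` per weight.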

And some longer-term items:

  • Support GPU inference. This will probably involve making the execution engine pluggable.
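
One way a pluggable execution engine might look is a trait that each backend implements, with the engine holding a boxed backend chosen at runtime. This is only a sketch of the general pattern: `ExecutionBackend`, `CpuBackend` and `Engine` are hypothetical names, and nothing here reflects the actual design of ocrs or its inference engine.

```rust
/// Hypothetical backend interface: anything that can run a model
/// on a flat f32 input and return a flat f32 output.
trait ExecutionBackend {
    fn name(&self) -> &str;
    fn run(&self, input: &[f32]) -> Vec<f32>;
}

/// Stand-in CPU backend. A real backend would execute the model graph.
struct CpuBackend;

impl ExecutionBackend for CpuBackend {
    fn name(&self) -> &str {
        "cpu"
    }

    fn run(&self, input: &[f32]) -> Vec<f32> {
        // Placeholder computation (ReLU) standing in for real inference.
        input.iter().map(|&x| x.max(0.0)).collect()
    }
}

/// Engine that delegates to whichever backend it was constructed with.
struct Engine {
    backend: Box<dyn ExecutionBackend>,
}

impl Engine {
    fn new(backend: Box<dyn ExecutionBackend>) -> Self {
        Engine { backend }
    }

    fn infer(&self, input: &[f32]) -> Vec<f32> {
        self.backend.run(input)
    }
}
```

With this shape, a GPU backend would be another type implementing `ExecutionBackend`, and callers of `Engine::infer` would not need to change.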

robertknight, Jan 07 '24 10:01