WIP: Foundation Model Integration
Switches OCRBuilder and EquationProcessor over to the new foundation model.
This also removes the need for the inline math detection model and greatly simplifies LineBuilder.
In addition, adds a special mode, --fix_lines, that lets the model rewrite lines containing formatting, math, or garbled text. When this flag is set, every line in the document is passed through the OCR model for potential rewriting.
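To illustrate the --fix_lines behaviour described above, here is a minimal sketch of the control flow only. It is not the code in this PR: `fix_lines`, `rewrite`, and `fake_rewrite` are hypothetical names, and the stand-in rewriter simply wraps one pattern in a `<math>` tag in place of a real call to the OCR model.

```python
from typing import Callable, List


def fix_lines(lines: List[str], rewrite: Callable[[str], str]) -> List[str]:
    """Pass every line through `rewrite` and keep the original as a fallback.

    `rewrite` is a hypothetical stand-in for the OCR model call.
    """
    fixed = []
    for line in lines:
        candidate = rewrite(line)
        # Keep the original line when the rewriter returns nothing usable,
        # so a failed rewrite never deletes content.
        if candidate and candidate.strip():
            fixed.append(candidate)
        else:
            fixed.append(line)
    return fixed


def fake_rewrite(line: str) -> str:
    # Hypothetical rewriter: tags one math fragment and drops lines
    # marked "garbled" to exercise the fallback path.
    if "garbled" in line:
        return ""
    return line.replace("x^2", "<math>x^2</math>")
```

With these assumptions, `fix_lines(["area is x^2", "plain text"], fake_rewrite)` rewrites only the first line, and a line the rewriter cannot handle is returned unchanged.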
Pending:
- [x] Don't replace good lines, tune model
- [x] Support new tag types
- [x] Retain anchor tags when replacing lines in all cases
- [x] Fix math inside tables
- [ ] Add tests for the new functionality and replace the old tests that are currently being skipped