Quantization performance, word error rate, decoding performance from section 4.2 of paper
Hello!
Thank you for this contribution!
I am very excited to try this model with OpenVINO. I built an advanced system based on qwen2-vl this fall for dense table analysis; it was very successful, but it suffered from word error rates above 5% when prompting against 100 dpi images from a set of SOTA dense tables. A project requirement was to ensure reliable output in an unsupervised workflow under tight resource constraints.
Anecdotally, I recall occasionally seeing repeating tokens like the ones described here; I also observed similar degenerate outputs, at similar frequency, with qwen2-vl-7b-instruct on a set of 120 tables that were harder than anything I have seen in the literature. In my experience, changing precision and stepping up to the 72B yielded similar behavior, though word error rate was the bigger concern across both quantized and full-precision versions.
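For reference, here is a minimal sketch of how I scored word error rate in those comparisons, assuming the jiwer package; the strings are made-up placeholders, not real table output.

```python
# Hypothetical WER comparison between a full-precision and a quantized transcript.
# Strings are placeholders; in practice these came from model output on table images.
import jiwer

reference = "Revenue 2023 12,456 Operating margin 14.2%"
hypothesis_fp16 = "Revenue 2023 12,456 Operating margin 14.2%"
hypothesis_int4 = "Revenue 2023 12,456 12,456 Operating margin 14.2%"  # repeated-token failure

wer_fp16 = jiwer.wer(reference, hypothesis_fp16)
wer_int4 = jiwer.wer(reference, hypothesis_int4)
print(f"WER fp16: {wer_fp16:.3f}  WER int4: {wer_int4:.3f}")
```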
Interestingly, the 7B in my system was compressed to INT4, and since olmOCR is a finetune of the same architecture, its behavior under quantization seems directly relevant. The olmOCR paper does not discuss quantization; the original Qwen2-VL paper does include INT4 tests with negligible accuracy loss, though on the Qwen team's significantly more opaque evaluation process. I am interested to know whether quantization was evaluated for olmOCR and what the findings were.
However, it also seems like this project targets use cases that are not CPU-only, where full-precision inference would be trivial to achieve while still being consistent with other findings in the literature on compressing/quantizing vision encoders.
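In case it is useful, here is a minimal sketch of the kind of side-by-side check I have in mind. I am assuming the Hugging Face checkpoint name `allenai/olmOCR-7B-0225-preview` (I have not verified the exact ID), and I am using bitsandbytes NF4 purely as a stand-in for whatever INT4 scheme was used in the Qwen evaluation; the olmOCR pipeline itself may load the model differently.

```python
# Sketch: load the (assumed) olmOCR checkpoint in bf16 and in 4-bit NF4,
# so the two variants can be run over the same dense-table pages and their
# transcripts / WER compared.
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

model_id = "allenai/olmOCR-7B-0225-preview"  # assumed checkpoint name

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(model_id)

# Full-precision baseline.
model_bf16 = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# 4-bit weight-quantized variant.
model_int4 = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)
```

From there, running both variants over the same set of table images and diffing the transcripts (or scoring WER as above) would show whether the INT4 accuracy claims for the base model carry over to the finetune.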
Thanks for this work!