
Quantization performance, word error rate, decoding performance from section 4.2 of paper

SearchSavior opened this issue 4 days ago • 0 comments

Hello!

Thank you for this contribution!

I am very excited to try this model with OpenVINO. I built an advanced system based on Qwen2-VL this fall for dense table analysis; it was very successful, but it suffered from word error rates above 5% when prompting against 100 dpi images on a set of SOTA dense tables. A project requirement was to ensure reliable output in an unsupervised workflow with tight resource constraints.
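For context, here is a minimal sketch of how I score word error rate against ground truth, using the jiwer library (an assumption on my part; any WER implementation works). The strings are illustrative stand-ins for real table transcriptions:

```python
# Minimal WER scoring sketch using jiwer (pip install jiwer).
import jiwer

references = [
    "Revenue Q1 2024 1,204.5",
    "Operating margin 12.4 %",
]
hypotheses = [
    "Revenue Q1 2024 1,2O4.5",   # a typical OCR slip: letter O for digit 0
    "Operating margin 12.4 %",
]

wer = jiwer.wer(references, hypotheses)   # aggregate word-level edit rate
print(f"WER: {wer:.3%}")                  # >5% means roughly 1 word in 20 is wrong
```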

Anecdotally, I recall occasionally seeing results similar to the repeating-token behavior; I also observed comparable degeneration with Qwen2-VL-7B-Instruct at similar frequency on a set of 120 tables harder than anything I have seen in the literature. In my experience, changing precision and stepping up to the 72B yielded similar behavior, though word error rate was the bigger concern across both quantized and full-precision versions.
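For reference, these are the decoding settings I fall back on to damp the repetition loops; a sketch using standard transformers generate() parameters, where the checkpoint, image path, and prompt are placeholder assumptions:

```python
# Sketch of decoding settings that damp repeating-token loops.
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # or the olmOCR finetune
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id, device_map="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe this table as markdown."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("table.png")],
                   return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=False,            # greedy decoding for reproducible table output
    repetition_penalty=1.05,    # mild penalty against degenerate loops
    no_repeat_ngram_size=10,    # hard block on long verbatim repeats
)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

One caveat: no_repeat_ngram_size can clip tables that legitimately repeat cell values, so I keep the window long.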

Interestingly, the 7B I used was compressed to INT4; since olmOCR is a finetune of the same architecture, its quantization behavior should be closely related.

The olmOCR paper does not discuss quantization behavior; instead we can only refer to the original Qwen2-VL paper, which does report INT4 tests with negligible accuracy loss, albeit under the Qwen team's significantly more opaque evaluation process. I am interested to know whether this was evaluated for olmOCR and what the findings were.
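For concreteness, this is the compression path I plan to try: a hedged sketch using optimum-intel's OpenVINO export. The class name requires a recent optimum-intel with Qwen2-VL support, and the checkpoint ID and group size are assumptions, not tested values:

```python
# Sketch of INT4 weight-only compression via optimum-intel / OpenVINO.
from optimum.intel import OVModelForVisualCausalLM, OVWeightQuantizationConfig

quant_config = OVWeightQuantizationConfig(
    bits=4,          # INT4 weight compression, as in the Qwen2-VL paper's tests
    sym=False,       # asymmetric quantization tends to preserve accuracy
    group_size=128,  # per-group scales; smaller groups trade size for accuracy
)
model = OVModelForVisualCausalLM.from_pretrained(
    "allenai/olmOCR-7B-0225-preview",  # assumed olmOCR checkpoint on the Hub
    export=True,                       # convert PyTorch weights to OpenVINO IR
    quantization_config=quant_config,
)
model.save_pretrained("olmocr-7b-int4-ov")  # serialized IR + INT4 weights
```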

However, it also seems this project targeted non-CPU-only use cases, where full-precision performance would be trivial to achieve while remaining consistent with other findings in the literature on compressing/quantizing vision encoders.

Thanks for this work!

SearchSavior · Feb 28 '25 22:02