olmocr: Reducing the GPU memory requirements

🚀 The feature, motivation and pitch

Reducing the GPU memory requirement from 20 GB to 15 GB would make it possible to run olmocr within Google Colab.

May 19 '25 19:05 mantielero

Yeah, we are done with our benchmark, so we hope to officially support a quantized version soon.

May 19 '25 22:05 jakep-allenai

I would especially appreciate targeting an NVIDIA RTX 4070 Ti Super as the minimum hardware requirement.

May 27 '25 20:05 physics515

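I got the quantized version working in Google Colab Pro on an A100. The first run below uses the FP8 model; the second uses the default model on the same PDFs for comparison.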

!python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf --model allenai/olmOCR-7B-0225-preview-FP8

================================================================================
2025-06-27 02:51:51,430 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-27 02:51:51,430 - __main__ - INFO - ================================================================================
2025-06-27 02:51:51,430 - __main__ - INFO - Total elapsed time: 975.76 seconds
2025-06-27 02:51:51,430 - __main__ - INFO - Total Server Input tokens: 1,444,113
2025-06-27 02:51:51,430 - __main__ - INFO - Total Server Output tokens: 370,620
2025-06-27 02:51:51,431 - __main__ - INFO - Finished input tokens: 1,288,813
2025-06-27 02:51:51,431 - __main__ - INFO - Finished output tokens: 281,870
2025-06-27 02:51:51,431 - __main__ - INFO - Completed pages: 548
2025-06-27 02:51:51,431 - __main__ - INFO - Failed pages: 2
2025-06-27 02:51:51,431 - __main__ - INFO - Page Failure rate: 0.36%
2025-06-27 02:51:51,431 - __main__ - INFO - Server Input tokens/sec rate: 1479.99
2025-06-27 02:51:51,431 - __main__ - INFO - Server Output tokens/sec rate: 379.83
2025-06-27 02:51:51,431 - __main__ - INFO - Finished Input tokens/sec rate: 1320.84
2025-06-27 02:51:51,432 - __main__ - INFO - Finished Output tokens/sec rate: 288.87
2025-06-27 02:51:51,432 - __main__ - INFO - ================================================================================
2025-06-27 02:51:51,432 - __main__ - INFO - Work done

!python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf

================================================================================
2025-06-26 22:22:46,523 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-26 22:22:46,523 - __main__ - INFO - ================================================================================
2025-06-26 22:22:46,523 - __main__ - INFO - Total elapsed time: 1100.16 seconds
2025-06-26 22:22:46,523 - __main__ - INFO - Total Server Input tokens: 1,437,585
2025-06-26 22:22:46,524 - __main__ - INFO - Total Server Output tokens: 369,647
2025-06-26 22:22:46,524 - __main__ - INFO - Finished input tokens: 1,407,474
2025-06-26 22:22:46,524 - __main__ - INFO - Finished output tokens: 310,046
2025-06-26 22:22:46,524 - __main__ - INFO - Completed pages: 549
2025-06-26 22:22:46,524 - __main__ - INFO - Failed pages: 1
2025-06-26 22:22:46,524 - __main__ - INFO - Page Failure rate: 0.18%
2025-06-26 22:22:46,524 - __main__ - INFO - Server Input tokens/sec rate: 1306.71
2025-06-26 22:22:46,524 - __main__ - INFO - Server Output tokens/sec rate: 336.00
2025-06-26 22:22:46,524 - __main__ - INFO - Finished Input tokens/sec rate: 1279.34
2025-06-26 22:22:46,524 - __main__ - INFO - Finished Output tokens/sec rate: 281.82
2025-06-26 22:22:46,524 - __main__ - INFO - ================================================================================
2025-06-26 22:22:46,525 - __main__ - INFO - Work done
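
For a quick side-by-side of the two runs, here is a minimal sketch that derives pages per second, failure rate, and relative speedup from the numbers in the summaries above. The dictionary keys and helper names are made up for illustration and are not part of olmocr. Both runs were single, unrepeated runs, so treat the difference as indicative only.

```python
# Values copied from the two FINAL METRICS SUMMARY blocks above.
# The key names are hypothetical, chosen only for this sketch.
FP8_RUN = {"elapsed_s": 975.76, "completed_pages": 548, "failed_pages": 2}
DEFAULT_RUN = {"elapsed_s": 1100.16, "completed_pages": 549, "failed_pages": 1}


def pages_per_second(run):
    """Completed pages divided by total wall-clock time."""
    return run["completed_pages"] / run["elapsed_s"]


def failure_rate(run):
    """Failed pages as a percentage of all attempted pages."""
    total = run["completed_pages"] + run["failed_pages"]
    return 100.0 * run["failed_pages"] / total


# Ratio of wall-clock times: values above 1.0 mean the FP8 run finished sooner.
speedup = DEFAULT_RUN["elapsed_s"] / FP8_RUN["elapsed_s"]

for name, run in (("FP8", FP8_RUN), ("default", DEFAULT_RUN)):
    print(f"{name}: {pages_per_second(run):.3f} pages/s, "
          f"{failure_rate(run):.2f}% failed")
print(f"FP8 speedup over default: {speedup:.2f}x")
```

On these numbers the FP8 run took roughly 11% less wall-clock time, with one more failed page out of 550.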

Jun 27 '25 03:06 salsasteve

Woot, so glad. I think we'll make the quantized version the official one in a future release even.

Jun 27 '25 17:06 jakep-allenai