# Reducing the GPU memory requirements
### 🚀 The feature, motivation and pitch
Going from 20 GB down to 15 GB of GPU memory would make it possible to run it within Google Colab.
### Alternatives

No response

### Additional context

No response
Yeah, we are done with our benchmarking, so we hope to officially support a quantized version soon.
I would especially appreciate targeting an Nvidia 4070 TI Super as a minimum hardware requirement.
I got the quantized version working in Google Colab Pro with an A100.
```shell
!python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf --model allenai/olmOCR-7B-0225-preview-FP8
```

```text
================================================================================
2025-06-27 02:51:51,430 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-27 02:51:51,430 - __main__ - INFO - ================================================================================
2025-06-27 02:51:51,430 - __main__ - INFO - Total elapsed time: 975.76 seconds
2025-06-27 02:51:51,430 - __main__ - INFO - Total Server Input tokens: 1,444,113
2025-06-27 02:51:51,430 - __main__ - INFO - Total Server Output tokens: 370,620
2025-06-27 02:51:51,431 - __main__ - INFO - Finished input tokens: 1,288,813
2025-06-27 02:51:51,431 - __main__ - INFO - Finished output tokens: 281,870
2025-06-27 02:51:51,431 - __main__ - INFO - Completed pages: 548
2025-06-27 02:51:51,431 - __main__ - INFO - Failed pages: 2
2025-06-27 02:51:51,431 - __main__ - INFO - Page Failure rate: 0.36%
2025-06-27 02:51:51,431 - __main__ - INFO - Server Input tokens/sec rate: 1479.99
2025-06-27 02:51:51,431 - __main__ - INFO - Server Output tokens/sec rate: 379.83
2025-06-27 02:51:51,431 - __main__ - INFO - Finished Input tokens/sec rate: 1320.84
2025-06-27 02:51:51,432 - __main__ - INFO - Finished Output tokens/sec rate: 288.87
2025-06-27 02:51:51,432 - __main__ - INFO - ================================================================================
2025-06-27 02:51:51,432 - __main__ - INFO - Work done
```
For comparison, the same run with the default (unquantized) model:

```shell
!python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf
```

```text
================================================================================
2025-06-26 22:22:46,523 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-26 22:22:46,523 - __main__ - INFO - ================================================================================
2025-06-26 22:22:46,523 - __main__ - INFO - Total elapsed time: 1100.16 seconds
2025-06-26 22:22:46,523 - __main__ - INFO - Total Server Input tokens: 1,437,585
2025-06-26 22:22:46,524 - __main__ - INFO - Total Server Output tokens: 369,647
2025-06-26 22:22:46,524 - __main__ - INFO - Finished input tokens: 1,407,474
2025-06-26 22:22:46,524 - __main__ - INFO - Finished output tokens: 310,046
2025-06-26 22:22:46,524 - __main__ - INFO - Completed pages: 549
2025-06-26 22:22:46,524 - __main__ - INFO - Failed pages: 1
2025-06-26 22:22:46,524 - __main__ - INFO - Page Failure rate: 0.18%
2025-06-26 22:22:46,524 - __main__ - INFO - Server Input tokens/sec rate: 1306.71
2025-06-26 22:22:46,524 - __main__ - INFO - Server Output tokens/sec rate: 336.00
2025-06-26 22:22:46,524 - __main__ - INFO - Finished Input tokens/sec rate: 1279.34
2025-06-26 22:22:46,524 - __main__ - INFO - Finished Output tokens/sec rate: 281.82
2025-06-26 22:22:46,524 - __main__ - INFO - ================================================================================
2025-06-26 22:22:46,525 - __main__ - INFO - Work done
```
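As a quick back-of-the-envelope comparison of the two runs (all numbers are copied by hand from the metric summaries in the logs; this snippet does not call olmOCR itself):

```python
# Figures hand-copied from the two FINAL METRICS SUMMARY logs above.
fp8 = {"elapsed_s": 975.76, "pages": 548, "failed": 2}    # FP8-quantized run
bf16 = {"elapsed_s": 1100.16, "pages": 549, "failed": 1}  # default-weights run

# Relative wall-clock speedup of the quantized run over the default run.
speedup = bf16["elapsed_s"] / fp8["elapsed_s"] - 1
print(f"FP8 wall-clock speedup: {speedup:.1%}")  # → FP8 wall-clock speedup: 12.7%

# End-to-end throughput in pages per second for each run.
fp8_pps = fp8["pages"] / fp8["elapsed_s"]
bf16_pps = bf16["pages"] / bf16["elapsed_s"]
print(f"Pages/sec: FP8 {fp8_pps:.3f} vs default {bf16_pps:.3f}")
# → Pages/sec: FP8 0.562 vs default 0.499
```

So on this test set the quantized model was roughly 13% faster end to end, at the cost of one extra failed page (2 vs. 1 out of 550).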
Woot, so glad! I think we'll even make the quantized version the official one in a future release.