
Reducing the GPU memory requirements

Open mantielero opened this issue 7 months ago • 2 comments

🚀 The feature, motivation and pitch

Reducing the GPU memory requirement from 20 GB to 15 GB would make it possible to run olmOCR within Google Colab.

Alternatives

No response

Additional context

No response

mantielero avatar May 19 '25 19:05 mantielero

Yeah, we are done with our benchmark, so we hope to officially support a quantized version soon.

jakep-allenai avatar May 19 '25 22:05 jakep-allenai

I would especially appreciate targeting an NVIDIA RTX 4070 Ti SUPER as the minimum hardware requirement.

physics515 avatar May 27 '25 20:05 physics515

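I got the quantized version working in Google Colab Pro on an A100. The first run below uses the FP8 model; the second uses the default model on the same PDFs for comparison.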

!python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf --model allenai/olmOCR-7B-0225-preview-FP8

================================================================================
2025-06-27 02:51:51,430 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-27 02:51:51,430 - __main__ - INFO - ================================================================================
2025-06-27 02:51:51,430 - __main__ - INFO - Total elapsed time: 975.76 seconds
2025-06-27 02:51:51,430 - __main__ - INFO - Total Server Input tokens: 1,444,113
2025-06-27 02:51:51,430 - __main__ - INFO - Total Server Output tokens: 370,620
2025-06-27 02:51:51,431 - __main__ - INFO - Finished input tokens: 1,288,813
2025-06-27 02:51:51,431 - __main__ - INFO - Finished output tokens: 281,870
2025-06-27 02:51:51,431 - __main__ - INFO - Completed pages: 548
2025-06-27 02:51:51,431 - __main__ - INFO - Failed pages: 2
2025-06-27 02:51:51,431 - __main__ - INFO - Page Failure rate: 0.36%
2025-06-27 02:51:51,431 - __main__ - INFO - Server Input tokens/sec rate: 1479.99
2025-06-27 02:51:51,431 - __main__ - INFO - Server Output tokens/sec rate: 379.83
2025-06-27 02:51:51,431 - __main__ - INFO - Finished Input tokens/sec rate: 1320.84
2025-06-27 02:51:51,432 - __main__ - INFO - Finished Output tokens/sec rate: 288.87
2025-06-27 02:51:51,432 - __main__ - INFO - ================================================================================
2025-06-27 02:51:51,432 - __main__ - INFO - Work done

!python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf

================================================================================
2025-06-26 22:22:46,523 - __main__ - INFO - FINAL METRICS SUMMARY
2025-06-26 22:22:46,523 - __main__ - INFO - ================================================================================
2025-06-26 22:22:46,523 - __main__ - INFO - Total elapsed time: 1100.16 seconds
2025-06-26 22:22:46,523 - __main__ - INFO - Total Server Input tokens: 1,437,585
2025-06-26 22:22:46,524 - __main__ - INFO - Total Server Output tokens: 369,647
2025-06-26 22:22:46,524 - __main__ - INFO - Finished input tokens: 1,407,474
2025-06-26 22:22:46,524 - __main__ - INFO - Finished output tokens: 310,046
2025-06-26 22:22:46,524 - __main__ - INFO - Completed pages: 549
2025-06-26 22:22:46,524 - __main__ - INFO - Failed pages: 1
2025-06-26 22:22:46,524 - __main__ - INFO - Page Failure rate: 0.18%
2025-06-26 22:22:46,524 - __main__ - INFO - Server Input tokens/sec rate: 1306.71
2025-06-26 22:22:46,524 - __main__ - INFO - Server Output tokens/sec rate: 336.00
2025-06-26 22:22:46,524 - __main__ - INFO - Finished Input tokens/sec rate: 1279.34
2025-06-26 22:22:46,524 - __main__ - INFO - Finished Output tokens/sec rate: 281.82
2025-06-26 22:22:46,524 - __main__ - INFO - ================================================================================
2025-06-26 22:22:46,525 - __main__ - INFO - Work done
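
To compare the two runs side by side, a small script along these lines could pull the numbers out of the summary lines. It is only a sketch: the regular expression assumes the `INFO - <name>: <value>` layout seen in the logs above, and `parse_metrics` and `pages_per_second` are hypothetical helper names, not part of olmocr.

```python
import re

# A few summary lines copied verbatim from the two runs above.
FP8_LOG = """
2025-06-27 02:51:51,430 - __main__ - INFO - Total elapsed time: 975.76 seconds
2025-06-27 02:51:51,431 - __main__ - INFO - Completed pages: 548
2025-06-27 02:51:51,431 - __main__ - INFO - Failed pages: 2
"""
DEFAULT_LOG = """
2025-06-26 22:22:46,523 - __main__ - INFO - Total elapsed time: 1100.16 seconds
2025-06-26 22:22:46,524 - __main__ - INFO - Completed pages: 549
2025-06-26 22:22:46,524 - __main__ - INFO - Failed pages: 1
"""

# Assumed layout: "... INFO - <metric name>: <number>[unit]"
METRIC_RE = re.compile(r"INFO - (?P<name>[^:]+): (?P<value>[\d,]+(?:\.\d+)?)")


def parse_metrics(log_text):
    """Return {metric name: float value} for every summary line that matches."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_RE.search(line)
        if match:
            # Drop thousands separators such as "1,444,113" before converting.
            metrics[match["name"]] = float(match["value"].replace(",", ""))
    return metrics


def pages_per_second(metrics):
    """Completed pages divided by wall-clock seconds."""
    return metrics["Completed pages"] / metrics["Total elapsed time"]


fp8 = parse_metrics(FP8_LOG)
default = parse_metrics(DEFAULT_LOG)

# Ratio of wall-clock times: values above 1 mean the FP8 run finished sooner.
speedup = default["Total elapsed time"] / fp8["Total elapsed time"]

print(f"FP8 pages/sec:     {pages_per_second(fp8):.3f}")
print(f"Default pages/sec: {pages_per_second(default):.3f}")
print(f"Wall-clock speedup of FP8 over default: {speedup:.2f}x")
```

On these logs the FP8 run comes out about 1.13x faster in wall-clock time. Each configuration was run only once, so this is a rough indication rather than a benchmark.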

salsasteve avatar Jun 27 '25 03:06 salsasteve

Woot, so glad. I think we'll make the quantized version the official one in a future release even.

jakep-allenai avatar Jun 27 '25 17:06 jakep-allenai