
Does anyone know the average speed or time to recognize a single image when using an RTX 4090 20GB+?

Open augumn opened this issue 9 months ago • 3 comments

I am planning to rent a 4090 server to recognize a batch of PNGs (over 200,000) that were pre-processed from PDFs. However, I am not sure how much money and time it will cost.

augumn avatar Apr 01 '25 03:04 augumn
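For a rough back-of-the-envelope answer, the job time and cost follow directly from the per-page throughput and the hourly rental rate. The numbers below are placeholder assumptions for illustration, not measured figures for any particular GPU:

```python
# Back-of-the-envelope estimate of GPU-hours and rental cost for a batch
# OCR job. Both pages_per_sec and usd_per_hour below are placeholder
# assumptions; substitute your own measured throughput and rental rate.

def estimate(num_pages: int, pages_per_sec: float, usd_per_hour: float):
    hours = num_pages / pages_per_sec / 3600  # seconds -> hours
    cost = hours * usd_per_hour
    return hours, cost

# e.g. 200,000 pages at an assumed 0.5 pages/s and $0.40/hour
hours, cost = estimate(200_000, 0.5, 0.40)
print(f"{hours:.1f} GPU-hours, ~${cost:.2f}")
```
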

There is another problem: OCR models (not only olmocr) sometimes don't recognize the full content of the images. If I only have a few PNGs, I can manually find the incomplete results. However, it is not possible to locate them by hand in such a huge task.

augumn avatar Apr 02 '25 03:04 augumn
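One hedged way to triage incomplete results at this scale (a sketch, not a feature of olmocr) is to flag pages whose extracted text is unusually short relative to the rest of the batch, then review only those candidates manually:

```python
# Sketch: flag OCR outputs that are suspiciously short compared to the
# batch distribution, as candidates for manual review. The z-score
# cutoff is an arbitrary assumption; tune it on your own data.
import statistics

def flag_short_outputs(results: dict, z_cutoff: float = -2.0) -> list:
    """results maps page id -> extracted text; returns suspicious page ids."""
    lengths = {k: len(v) for k, v in results.items()}
    mean = statistics.mean(lengths.values())
    stdev = statistics.pstdev(lengths.values()) or 1.0  # avoid divide-by-zero
    return [k for k, n in lengths.items() if (n - mean) / stdev < z_cutoff]

# Hypothetical batch: two normal pages and one near-empty result
suspects = flag_short_outputs(
    {"p1": "a" * 1000, "p2": "a" * 980, "p3": "a" * 10},
    z_cutoff=-1.0,
)
```

This only catches near-empty outputs, not partially truncated ones, but it narrows the manual review set considerably on a large batch.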

Hey,

Definitely rent an L40S; the cost per page will be lowest in that case, even if the 4090s are cheaper per hour.

Yeah, the system uses a sampling temperature, so there is probability involved, and the models are also trained not to try to parse subimages from within a document, which is probably what you are seeing. Your case may benefit from fine-tuning, but there is always going to be some error rate; it's just a question of how much error your use case can tolerate.

jakep-allenai avatar Apr 08 '25 03:04 jakep-allenai

I am not sure about the cost, but on my 4090 I am getting a lifetime average of 522 tok/s (although it fluctuates between roughly 400 and 600 tok/s).

Pedrexus avatar Apr 14 '25 05:04 Pedrexus
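To turn a token rate like this into a per-page rate, you need an assumed average output length per page; the 800-token figure below is a placeholder, not a measured value for olmocr outputs:

```python
# Convert a measured token throughput into approximate pages/sec.
# avg_tokens_per_page is an assumption; measure it on your own corpus.

def pages_per_second(tokens_per_sec: float, avg_tokens_per_page: float) -> float:
    return tokens_per_sec / avg_tokens_per_page

# 522 tok/s at an assumed 800 output tokens per page
rate = pages_per_second(522, 800)
```
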

I'm getting 0.33 images per second on a 3090 Ti 24GB with allenai/olmOCR-7B-0225-preview and 0.67 images per second with allenai/olmOCR-7B-0225-preview-FP8. I'm using vLLM and the gnarly PDFs to verify.

salsasteve avatar Jul 09 '25 13:07 salsasteve

On Google Colab, I was able to get between 1.77 s and 2 s per doc on an A100 with my own PDFs.

salsasteve avatar Jul 09 '25 14:07 salsasteve