Does anyone know the average speed, or the time to recognize a single image, when using an RTX 4090 (20 GB+ VRAM)?
I am planning to rent a 4090 server to recognize a batch of PNGs (over 200,000) that were pre-processed from PDFs. However, I am not sure how much time and money it will cost.
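For reference, this is the back-of-envelope estimate I would run once the real throughput is known; every number below is a placeholder, not a measurement:

```python
# Rough cost/time estimate for a batch OCR job.
# All inputs are assumptions -- substitute real measurements.
NUM_PAGES = 200_000
PAGES_PER_SECOND = 0.5   # assumed throughput; measure on a small sample first
PRICE_PER_HOUR = 0.40    # assumed GPU rental price in USD

total_hours = NUM_PAGES / PAGES_PER_SECOND / 3600
total_cost = total_hours * PRICE_PER_HOUR

print(f"~{total_hours:.1f} GPU-hours, ~${total_cost:.2f} at ${PRICE_PER_HOUR}/h")
```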
There is another problem: OCR models (not only olmocr) sometimes don't recognize the full content of an image. With only a few PNGs I could manually find the incomplete results, but on a job this large it is not feasible to locate them by hand.
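One crude way to triage this automatically is to flag pages whose output looks suspiciously short. This is only a heuristic sketch; the file layout and the threshold are assumptions, not how olmocr actually organizes its output:

```python
from pathlib import Path

# Assumed layout: one plain-text OCR result per page, named after the PNG.
RESULTS_DIR = Path("ocr_results")   # hypothetical output directory
MIN_CHARS = 200                     # assumed threshold; tune on a labeled sample

suspicious = [
    p for p in RESULTS_DIR.glob("*.txt")
    if len(p.read_text(encoding="utf-8").strip()) < MIN_CHARS
]

print(f"{len(suspicious)} pages flagged for manual review")
for p in suspicious[:20]:
    print(p.name)
```

A fixed threshold won't catch a half-transcribed dense page, but it at least surfaces empty and near-empty results for manual review.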
Hey,
Definitely rent an L40S; the cost per page will be lowest in that case, even if the 4090s are cheaper per hour.
Yeah, the system has a sampling temperature, so there is some randomness involved, and the models are also deliberately trained not to parse sub-images within a document, which is probably what you are seeing. Your case may benefit from fine-tuning, but there will always be some error rate; it's just a question of how much error your use case can tolerate.
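If you want to see how much of the variance comes from sampling, you can pin decoding to greedy when driving the model yourself through vLLM. This is a minimal sketch, not how the olmocr pipeline configures its own requests, and the default temperature shown is an assumption:

```python
from vllm import SamplingParams

# temperature=0.0 makes decoding greedy (deterministic per prompt),
# at the cost of losing resampling as a way to escape bad generations.
greedy = SamplingParams(temperature=0.0, max_tokens=4096)
sampled = SamplingParams(temperature=0.8, max_tokens=4096)  # assumed non-zero default
```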
I am not sure about the cost, but on my 4090 I am getting a lifetime average of 522 tokens/s (it fluctuates between roughly 400 and 600 tokens/s).
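To turn tokens/s into pages you need a tokens-per-page figure; here is the conversion, with ~800 output tokens per page as a pure guess (dense pages can be far more):

```python
TOKENS_PER_SECOND = 522   # measured lifetime average from above
TOKENS_PER_PAGE = 800     # assumption -- varies widely with document density
NUM_PAGES = 200_000

seconds_per_page = TOKENS_PER_PAGE / TOKENS_PER_SECOND
total_hours = NUM_PAGES * seconds_per_page / 3600
print(f"~{seconds_per_page:.2f} s/page, ~{total_hours:.0f} GPU-hours for the batch")
```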
I'm getting 0.33 images per second on a 3090 Ti 24 GB with allenai/olmOCR-7B-0225-preview and 0.67 images per second with allenai/olmOCR-7B-0225-preview-FP8. I'm using vLLM and the gnarly PDFs to verify.
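If those rates held for your batch, 200,000 images would take roughly 200,000 / 0.67 ≈ 298,500 s ≈ 83 GPU-hours with the FP8 model, or about twice that with the non-quantized one.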
On Google Colab, I was able to get 1.77 s and 2 s per doc on an A100 with my own PDFs.