olmocr
Significant performance difference between online demo and local inference
🐛 Describe the bug
There is a significant performance difference between the online demo and local inference, but I can't find the reason. For example, the online demo processed this page as expected, whereas the locally deployed version fails on it with a JSON decode error. I checked the model's raw output: it keeps repeating some content, and its table recognition is also worse than the online demo's.
Do you have any suggestions? Attached: error_page.pdf
Versions
Version 0.1.61, installed from source.
Hmm, the online demo runs vllm rather than sglang, but otherwise it should be identical. There is some randomness in sampling: have you tried running the PDF through several times each way? Is it always wrong in the local version?
Yes, I have run it several times (both by setting a larger max_try_num and by manually rerunning the code), but the problem persists.
Closing this issue for now; please feel free to reopen if you want to discuss further.
@jakep-allenai What parameters (temperature, etc.) are used in the online demo compared with the ones in pipeline.py? I am also noticing differences and am fairly sure something in the online demo differs from the repo. Can we access the code for the web demo?
Hey, I don't have the full demo code to share, but here is what I can easily share right now on the inference side:
https://gist.github.com/jakep-allenai/15c713545062ef458b7efa2101d69c06
It uses only 3 retries (at temperatures 0.1, 0.4, and 0.8), compared with a slower temperature ramp-up in local inference.
The demo is served with vllm 0.9.2 on an A100-80GB, but without flashinfer installed in the container, which is another small difference.
Do you have any English-language files showing a clear difference that you could share with us?
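To make the demo's retry behaviour concrete (3 attempts at temperatures 0.1, 0.4, and 0.8, giving up on output that fails to parse as JSON), here is a minimal Python sketch. It is not the code from the gist or from olmocr's pipeline.py: `generate(temperature)` is a hypothetical callable standing in for "ask the model for one page's raw text at this temperature", and the `"text"` key in the usage example is made up for illustration.

```python
import json
from typing import Callable, Optional, Sequence

# Temperature schedule as described for the online demo; everything else
# here (function names, return shape) is an illustrative assumption.
DEMO_TEMPERATURES = (0.1, 0.4, 0.8)


def parse_with_retries(
    generate: Callable[[float], str],
    temperatures: Sequence[float] = DEMO_TEMPERATURES,
) -> Optional[dict]:
    """Try each temperature in order and return the first output that parses
    as a JSON object. `generate` is hypothetical: it should return the raw
    model text for a page at the given temperature. Returns None if every
    attempt fails."""
    for temperature in temperatures:
        raw = generate(temperature)
        try:
            result = json.loads(raw)
        except json.JSONDecodeError:
            # e.g. the model got stuck repeating content and never closed the JSON
            continue
        if isinstance(result, dict):
            return result
    return None


def fake_generate(temperature: float) -> str:
    # Hypothetical stand-in: simulates a looping, truncated output at low temperature
    if temperature < 0.4:
        return '{"text": "row row row row'
    return '{"text": "ok"}'


print(parse_with_retries(fake_generate))  # {'text': 'ok'}
```

With only three attempts, a page whose output degenerates at all three temperatures simply fails, whereas a longer, slower ramp gives more chances at intermediate temperatures. That is one plausible (unconfirmed) reason results can differ between the demo and a local run even with the same model.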