Logan
Donut doesn't predict boxes, hence the "OCR-Free" part. You can, however, use the attention scores to create "heatmaps" of what the model thought the answer was on the page ->...
14GB of VRAM will be difficult. If I'm remembering correctly, I trained with default settings (batch size 2, default input image size) and used about 40GB of VRAM.
Just giving this a bump... would be a fantastic addition
I've spent the day testing, and tesserocr seems to be slower than pytesseract. I need the boxes, so I'm comparing against `image_to_data` from pytesseract. Here's my quick benchmark script (the...
Interesting findings, thanks for the follow-up! 💪🏻
@jerryjliu we can add an option for that, but that really only makes sense for objects you can define on the fly (i.e. a query engine tool). This wouldn't really...
@jerryjliu Added `FnNodeMapping` and an example + test 👍🏻 Should be good to go
If you change the repo to `run-llama` instead of `jerryliu` it works, otherwise you get a `301 response, moved`
I'm currently training a setfit model with 4500 classes, 10 samples per class (using a proprietary dataset). I think it is still generating pairs though? I just see endless tqdm...
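For a rough sense of scale, here is a back-of-the-envelope count of how many sentence pairs exist for 4500 classes with 10 samples each. This is only a sketch: it assumes every unordered pair of samples is a candidate, which may not match how setfit actually samples pairs, and `pair_counts` is a hypothetical helper, not part of the library.

```python
from math import comb


def pair_counts(num_classes: int, samples_per_class: int) -> dict[str, int]:
    """Count unordered sample pairs, assuming exhaustive pairing (an assumption)."""
    n_samples = num_classes * samples_per_class
    total = comb(n_samples, 2)
    # Positive pairs: both samples come from the same class.
    positive = num_classes * comb(samples_per_class, 2)
    # Negative pairs: everything else.
    negative = total - positive
    return {"positive": positive, "negative": negative, "total": total}


counts = pair_counts(4500, 10)
print(counts)
```

Under that assumption there are 202,500 positive pairs and just over a billion negative ones, so exhaustive generation would plausibly take a very long time.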
@grofte yea it never worked well for me either. I think the dataset is just too big haha