Vik Paruchuri

136 comments by Vik Paruchuri

Hey, thanks for the suggestion. I'm working on some other components of marker first (removing the commercial use restriction, better OCR), but this is on my list for after that.

Thanks for the ping @Chrissi2802

Thanks for the message! Marker should currently support any language that Tesseract supports, as long as it reads left to right and top to bottom (unfortunately, I don't think Arabic will work)....

The next version will support ~90 languages (coming in the next couple of weeks).

Can you show me the exact command and PDF you're using? Are you removing pages from the PDF so that only one remains, or using the --max_pages flag?

It doesn't differentiate between header levels right now. I'm planning to improve the detection of block types, but it's behind a few things on the roadmap.

I'm not able to take this on due to time constraints, but happy to point someone in the right direction if they can take it.

Yes, this model can be finetuned. I don't have publicly available code for easily doing that, but the model implementation is in this repo. If you have public data, I'm...

Yes, this is an issue I've seen. I'm working on fixing it now (retraining the model).

> There is a minor issue with Chinese OCR. If you run "surya_ocr DATA_PATH --langs zh", you will get Unicode instead of plain Chinese text. I'll merge something to make the...