mustaszewski
Dear Mikel, first of all, congratulations on this great piece of work, and thank you for sharing it with the community. I experienced out-of-memory errors when mapping pre-trained fastText embeddings...
Thank you for developing this very useful package. However, I have a problem with the `crawlUrlfilter` argument. From a large website, I would like to crawl and scrape only those...
Does the pre-training of Donut require bounding boxes of individual words? In the synthetically generated SynthDoG dataset (https://huggingface.co/datasets/naver-clova-ix/synthdog-en), which was also used for Donut pre-training, there are no bounding boxes,...