Billy Cao

302 comments by Billy Cao

Your batch size is too large; this has nothing to do with whether streaming is on or not. What my PR fixes is the hang at the very first step (step 1).

Try reducing cutoff_len, to 4096.

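A single one of your GPUs simply doesn't have enough VRAM for a context length that large. Shrinking it, even though some samples will then exceed the maximum token count, is a common compromise. Otherwise you can try QLoRA, but it will be slower.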
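To make that compromise concrete, here is a minimal sketch of what a cutoff does to an over-long sample. It is an illustration only: `truncate_to_cutoff` is a hypothetical helper, not the training framework's actual code, and it assumes over-long samples are simply cut at the end.

```python
def truncate_to_cutoff(token_ids, cutoff_len=4096):
    """Keep at most `cutoff_len` tokens and report how many were dropped.

    Hypothetical helper for illustration; real preprocessing may split the
    budget between prompt and response instead of cutting at the end.
    """
    kept = token_ids[:cutoff_len]
    dropped = len(token_ids) - len(kept)
    return kept, dropped


if __name__ == "__main__":
    long_sample = list(range(5000))   # stand-in for a 5000-token sample
    kept, dropped = truncate_to_cutoff(long_sample)
    print(len(kept), dropped)
```

Samples shorter than the cutoff pass through unchanged, so the cost of the smaller cutoff_len falls only on the longest samples.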

This is probably not the same issue as the OP's anymore.

> When fetching one batch, far more data than one batch's worth gets processed

What are your buffer_size and preprocessing_batch_size set to? Try setting buffer_size to the global batch size and preprocessing to 1. Also add TOKENIZERS_PARALLELISM=0.
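A toy model of why those two settings matter for the first batch, assuming a streaming pipeline that preprocesses examples in chunks and then passes them through a fill-then-sample shuffle buffer. Everything here (`preprocess_in_chunks`, `shuffle_buffer`, `examples_processed_for_first_batch`, and the numbers used) is hypothetical and written for illustration; it is not the actual implementation of any library.

```python
import random
from itertools import islice


def preprocess_in_chunks(source, chunk_size, counter):
    """Pull `chunk_size` raw examples at a time and 'preprocess' them together."""
    it = iter(source)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        counter["processed"] += len(chunk)  # stands in for tokenization work
        yield from chunk


def shuffle_buffer(stream, buffer_size, seed=0):
    """Fill a buffer first, then emit one random element per incoming example."""
    rng = random.Random(seed)
    buf = []
    for example in stream:
        if len(buf) < buffer_size:
            buf.append(example)
            continue
        idx = rng.randrange(buffer_size)
        yield buf[idx]
        buf[idx] = example
    rng.shuffle(buf)
    yield from buf


def examples_processed_for_first_batch(buffer_size, chunk_size, batch_size,
                                       total=100_000):
    """Count how many raw examples are preprocessed before one batch is ready."""
    counter = {"processed": 0}
    stream = shuffle_buffer(
        preprocess_in_chunks(range(total), chunk_size, counter), buffer_size
    )
    list(islice(stream, batch_size))  # take exactly one batch
    return counter["processed"]


if __name__ == "__main__":
    # Large buffer and large preprocessing chunk vs. the suggested small values.
    print(examples_processed_for_first_batch(16384, 1000, 8))
    print(examples_processed_for_first_batch(8, 1, 8))
```

In this model the buffer must be full before anything is emitted, so the work done before the first step scales with buffer_size rounded up to whole preprocessing chunks, not with the batch size.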

I wasn't the one who removed this parameter; you'll need to ask whoever committed that change.

Related to https://github.com/huggingface/transformers/pull/31342 but I don't quite get your changes - what exactly does it fix? When I tested all the pipelines in fp16 none of them had issues outputting...

This isn't the dataset but the weights.

There is no plan for such integration now.

I agree that it is not the best OCR solution, and also often doesn't get the text right for me. I recommend PaddleOCR https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_en.md EDIT: add some comparison using my...