Yu-won Lee

Results 230 comments of Yu-won Lee

You could prepare the data for OCR and start training. You should follow the format I have written in the README. This dataset could be helpful for preparing OCR data: https://huggingface.co/datasets/linxy/LaTeX_OCR

I've trained the vision-encoder alone and it takes quite a bit of memory. But it's strange that only the 500x800 size takes too much. Could you check ```...

Well, that could take a bit more memory, but it should be okay. Maybe limiting the image size could help. InternVL2.5 uses dynamic resolution but...

Thanks for letting me know. I'll check again how to adjust the number of tokens with `max_pixels`. I'll also add some args for width and height.
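To make the `max_pixels` idea concrete, here is a rough sketch of how a pixel budget can cap the resized image size (and therefore the number of visual tokens). This is not the repo's code: the function name `fit_to_max_pixels` and the `factor=28` default are assumptions for illustration only, and the actual processor may round differently.

```python
import math


def fit_to_max_pixels(height, width, max_pixels, factor=28):
    """Hypothetical helper: pick a resized (height, width) under a pixel budget.

    Both sides are snapped to multiples of `factor` (an assumed patch
    granularity), and the aspect ratio is kept approximately.
    """
    # Snap each side to the nearest multiple of `factor`, at least one patch.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)

    if h * w > max_pixels:
        # Shrink both sides by the same scale, then round down so the
        # product stays within the budget.
        scale = math.sqrt((height * width) / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    return h, w


if __name__ == "__main__":
    h, w = fit_to_max_pixels(500, 800, max_pixels=200_000)
    # Under the assumed granularity, one token per factor x factor block.
    print(h, w, (h // 28) * (w // 28))
```

Setting an explicit width and height skips this search entirely, which is why separate args for them can be handy when memory is tight.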

I've updated the code to explicitly set resized_height and resized_width for both images and videos.

Yes, but I don't have the resources to test the DPO method yet, so it will take some time.

Now I've got some time to work on it, so I'll start trying.

@50Bytes-dev I've updated the DPO code. Thanks for waiting. Please let me know if it has any problems. Before you use it, you should update `trl` to `trl==0.16.1`.
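If you want to confirm the pin before launching a run, a small check like the one below might help. It is only a sketch using the standard library; the helper name `check_pinned` is made up for this example and is not part of `trl` or the repo.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_TRL = "0.16.1"  # the version named in the comment above


def check_pinned(package, required, installed=None):
    """Hypothetical helper: True if the installed version equals `required`.

    `installed` can be passed directly (useful for testing); otherwise it is
    looked up from the current environment's package metadata.
    """
    if installed is None:
        try:
            installed = version(package)
        except PackageNotFoundError:
            # Package is not installed at all.
            return False
    return installed == required


if __name__ == "__main__":
    if not check_pinned("trl", REQUIRED_TRL):
        print(f"Please install trl=={REQUIRED_TRL} before running the DPO code.")
```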

It's not quite different. ``` [ { "id": "000000033471", "conversations": [ { "from": "human", "value": "Identify the odd one out: Twitter, Instagram, Telegram" }, { "from": "gpt", "value": "Telegram" },...

@GaoMengGladys If there is no "image" key in the file, image reading is skipped for it. Also, "\n" should be removed from the text.
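As a rough illustration of that behavior (not the actual dataset code), a text-only sample could be handled like this. The function name `prepare_sample` is hypothetical, and the sketch reads the comment literally by stripping newline characters from each turn.

```python
def prepare_sample(sample):
    """Hypothetical sketch: normalize one entry of the conversation JSON.

    - If the entry has no "image" key, no image is read (image stays None).
    - Newline characters are removed from every turn's text.
    """
    # Text-only entries simply have no "image" key, so .get() yields None.
    image_path = sample.get("image")

    turns = []
    for turn in sample.get("conversations", []):
        text = turn["value"].replace("\n", "")
        turns.append({"from": turn["from"], "value": text})

    return {"image": image_path, "conversations": turns}


if __name__ == "__main__":
    text_only = {
        "id": "000000033471",
        "conversations": [
            {"from": "human",
             "value": "Identify the odd one out: Twitter, Instagram, Telegram\n"},
            {"from": "gpt", "value": "Telegram"},
        ],
    }
    print(prepare_sample(text_only))
```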