Inference and fine-tuning support for GOT-OCR2.
Inference:
CUDA_VISIBLE_DEVICES=0 swift infer --model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0
<<< <image>OCR:
Input an image path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr.png
简介 SWIFT支持250+LLM和35+MLLM(多模态大模型)的训练、推理、 评测和部署。开发者可以直接将我们的框架应用到自己的Research和 生产环境中,实现模型训练评测到应用的完整链路。我们除支持了 PEFT提供的轻量训练方案外,也提供了一个完整的Adapters库以支持 最新的训练技术,如NEFTune、LoRA+、LLaMA-PRO等,这个适配器 库可以脱离训练脚本直接使用在自己的自定流程中。 为方便不熟悉深度学习的用户使用,我们提供了一个Gradio的web-ui用 于控制训练和推理,并提供了配套的深度学习课程和最佳实践供新手入 门。 此外,我们也在拓展其他模态的能力,目前我们支持了AnimateDiff的 全参数训练和LoRA训练。 SWIFT具有丰富的文档体系,如有使用问题请请查看这里 可以在Huggingfacespace和ModelScope创空间中体验SWIFTweb ui功能了。
--------------------------------------------------
<<< clear
<<< <image>OCR:
Input an image path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr_en.png
Introduction
SWIFT supports training, inference, evaluation and deployment of 250+ LLMs
and 35+ MLLMs (multimodal large models). Developers can directly apply our
framework to their own research and production environments to realize the
complete workflow from model training and evaluation to application. In addition
to supporting the lightweight training solutions provided by PEFT, we also
provide a complete Adapters library to support the latest training techniques
such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used
directly in your own custom workflow without our training scripts.
To facilitate use by users unfamiliar with deep learning, we provide a Gradio
web-ui for controlling training and inference, as well as accompanying deep
learning courses and best practices for beginners.
Additionally, we are expanding capabilities for other modalities. Currently, we
support full-parameter training and LoRA training for AnimateDiff.
SWIFT has rich documentations for users, please check here.
SWIFT web-ui is available both on Huggingface space and ModelScope studio,
please feel free to try!
Fine-tuning:
# fine-tuning LLM & projector, freeze vision encoder
CUDA_VISIBLE_DEVICES=0 swift sft \
--model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0 \
--sft_type lora \
--dataset latex-ocr-print#5000
# DDP & ZeRO2
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
--model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0 \
--sft_type lora \
--dataset latex-ocr-print#5000 \
--deepspeed default-zero2
Inference after fine-tuning:
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir output/got-ocr2/vx-xxx/checkpoint-xxx \
--load_dataset_config true
Hello, may I ask whether swift vLLM deployment is supported? Something like the following command: CUDA_VISIBLE_DEVICES=0 swift deploy --model_type llava1_6-vicuna-13b-instruct --infer_backend vllm
When will vLLM inference be supported?
I am trying to fine-tune GOT in Hindi. The dataset I am using is from HuggingFace Datasets (damerajee/hindi-ocr). It contains only two columns: one is an image and the other is the text present in the image.
I have prepared a .json file in the following format (taken from the official GOT OCR2.0 repo):
{"query": "
{
"query": "
Is the above .json file right? Or should I be placing the image object (a PIL image object) instead of the image path? In "response", I have given the text (ground truth) that I am expecting from the model; am I right?
Now the issue is: how do I use this fine-tuned model? I went through the documentation; unlike your official GOT online demo, which directly accepts an image, in this fine-tuned version one must enter a prompt such as "<image>OCR:".
I am doing all this as part of a project to build a basic application using Streamlit. The GitHub repository for it is given below: https://github.com/AISpaceXDragon/GOT-OCR2.0.git
Thank you for taking the time to read my queries; I hope to receive your response as soon as possible.
@AISpaceXDragon I see you have successfully fine-tuned the model in another language, Hindi. Can you show me how to build a training dataset for a new language the way you did? I would be very grateful for that.
As I mentioned, I am using a dataset from HuggingFace Datasets (linked above) and I didn't build it myself. But I think you meant building the ".json file" for a given dataset, is that it? Please let me know so that I can assist you.
@AISpaceXDragon That's right, I mean how to build ".json file" from a standard data set
@AISpaceXDragon Can you tell me at what stage do you do it when fine tuning? And are the results after fine tuning similar to the original results published by the author? I mean is it approximately?
@AISpaceXDragon That's right, I mean how to build ".json file" from a standard data set
I wrote a Python script to prepare the .json file for a dataset. The format of the entries in the .json file is the same as mentioned in the comment before. The script takes the images and stores them in a folder, while the "response" part of each JSON entry contains the ground truth (in my case, the text present in the image). That is what we want the model to give as a reply when it is given the image path specified in the "images" part of the entry.
This is what I have done, but I was not able to evaluate the model with the same format. This is why I posted a comment in this issue thread.
Format - {"query": "55555", "response": "66666", "images": ["image_path"]}
@AISpaceXDragon Can you tell me at what stage do you do it when fine tuning? And are the results after fine tuning similar to the original results published by the author? I mean is it approximately?
What do you mean by "Can you tell me at what stage do you do it when fine tuning?"? I didn't get you. Please try to be clear.
Answer for "And are the results after fine tuning similar to the original results published by the author? I mean is it approximately?" The thing is that, I fine-tuned the model on Google Colab, which means limited compute resources. As per my observation, if fine-tuned for more number of epochs and on more data ,the results would be excellent(as mentioned in the research paper).
@AISpaceXDragon Reply to "Can you tell me at what stage do you do it when fine tuning?": I see the author mentioned the following in the README.md section:
0. Train sample can be found here. Note that the '<image>' in the 'conversations'-'human'-'value' is necessary!
1. This codebase only supports post-training (stage-2/stage-3) upon our GOT weights.
2. If you want to train from stage-1 described in our paper, you need this repo.
@minhduc01168 Reply to "I see the author mentioned the following in the README.md section: 0. Train sample can be found here. Note that the '<image>' in the 'conversations'-'human'-'value' is necessary! 1. This codebase only supports post-training (stage-2/stage-3) upon our GOT weights. 2. If you want to train from stage-1 described in our paper, you need this repo."
I see that you are referring to training of the model, but I am referring to fine-tuning of the model. This means I am working only at Stage 2 or 3.
Note that training is different from fine-tuning. Training means taking the defined model architecture with random weights and passing all the inputs through it until the model gives the corresponding correct outputs. Fine-tuning means taking those pretrained weights (what the model has already learned) and using them for a specific variation of the same task. In this case I want to perform OCR, which is the main aim of the model, but because the data used to train the model was mostly English and Chinese, the model is only good at those languages. I want to extend these capabilities to another language, in my case Hindi, so I took the pretrained weights (the model's ability to extract text from images) and trained it on a different language. In other words, I want the model to keep its ability to extract text from images, but for an additional language alongside the languages it was already trained on.
I hope you understand what I am trying to convey. Let me know, if you didn't understand any part of the explanation.
@Jintao-Huang Could you answer my question?
@AISpaceXDragon Have you had anyone explain the data format below? Can you explain it to me? I'm very grateful for that.
{"query": "
Hey, can someone please help and tell me how I can train this model on the MNIST dataset?
@AISpaceXDragon HELP please
@minhduc01168 Reply to "Have you had anyone explain the data format below? Can you explain it to me? I'm very grateful for that. {"query": "55555", "response": "66666", "images": ["image_path"]} {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]} {"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}"
Answer to the first part of the question: I understood them myself; no one explained them to me.
Answer to the second part of the question: there are three data formats, as mentioned. The first one is "query", which contains the prompt and the image tag, i.e. <image>.
The explanation for the second data format is similar to the first one, except that it contains a new entry, "history", which records all the previous responses of the model for the given images.
The explanation for the third data format is similar to the above. Here, "history" contains a list of all the query and response pairs that you would otherwise have given separately in the first data format.
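To illustrate how the third format relates to the first, here is a small sketch that folds consecutive single-turn entries into one multi-turn entry with "history"; the pairing logic is only an example, and the placeholder values are taken from the format quoted above.

# Sketch: fold single-turn query/response pairs into one entry with "history".
# Earlier turns become [query, response] pairs in "history"; the final turn stays as query/response.
import json

single_turns = [
    {"query": "query1", "response": "response1"},
    {"query": "query2", "response": "response2"},
    {"query": "EEEEE", "response": "FFFFF"},
]

multi_turn = {
    "query": single_turns[-1]["query"],
    "response": single_turns[-1]["response"],
    "history": [[t["query"], t["response"]] for t in single_turns[:-1]],
}

print(json.dumps(multi_turn, ensure_ascii=False))
# -> {"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}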
I hope my explanation is clear; if not, let me know. Thank you.
Hey, can someone please help and tell me how I can train this model on the MNIST dataset?
Follow the instructions given in ModelScope's ms-swift documentation.
Let me know if you didn't get it, thank you.
I tried it on Google Colab and I got the error I sent above.
@AISpaceXDragon Did you train successfully and is everything working well? Thank you very much for your answer.
Yes, training works fine, but testing does not work at all.
infer vllm?
@Jintao-Huang Could you answer my question?
Hello, the holiday just ended, and I didn’t reply in time. What was the issue? 😊
@Jintao-Huang Can you explain it to me? I'm very grateful for that. {"query": "55555", "response": "66666", "images": ["image_path"]} {"query": "eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]} {"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
This format might be clearer.
{"query": "<image>55555", "response": "66666", "images": ["image_path"]}
{"query": "<image><image>eeeee", "response": "fffff", "history": [], "images": ["image_path1", "image_path2"]}
{"query": "EEEEE", "response": "FFFFF", "history": [["query1", "response1"], ["query2", "response2"]]}
@Jintao-Huang Thank you for explaining it to me. Because my GPU resources are limited, can you tell me how I can load the model weights to continue training? Thank you.
@AISpaceXDragon Sorry, which OCR did you use to produce the response text in the data format? Pytesseract, GOT-OCR, or something else? Thank you.
I didn't get you. Please try to be clear.
@Jintao-Huang I want to fine-tune for OCR of table images in another language. I don't understand what the content of "response" should be: the table structure line by line, or a LaTeX tabular? Can you explain it to me? Thank you.
{"query": "
After fine-tuning, calling the fine-tuned model reports an error:
How can this be resolved? The model contents are as follows:
You need to merge the LoRA first; only then will there be a config.json file.
At which step should the merge lora be done? I don't quite understand, thank you!