LLaVA-NeXT Regarding the issue of fine-tuning Llava onevision

Open haozhang1234 opened this issue 1 year ago • 3 comments

I encountered three problems.

**The first one** is that the script finetune_ov.sh takes a video_folder in addition to the regular data_path and image_folder. This confuses me a bit. Should data_path point to the JSON of the image dataset or the JSON of the video dataset? Or should I use the onevision.yaml file provided in the documentation, filled in with my own mix of image and video paths?

**The second issue** is that I still get an error when running the code. When I checked, though, cuDNN does not seem to be the problem: it reports its version number and runs correctly. My versions are CUDA 12.1 and cuDNN 8.9.7. [image]

**The third one** is that, as I understand it, the JSON format of the image-text dataset should be: [image]

But I couldn't find the JSON training format for the video-text dataset. I am looking at the JSON training format of another related dataset (Video-ChatGPT) and adjusting my own dataset to match it:

{ "id": "0", "video": "/4T/WK/MyProjects/LLaVA-NeXT/ZH-DataSet/vidio-dataset/datasets--lmms-lab--VideoDetailCaption/Test_Videos/v_-6dz6tBH77I.mp4", "conversations": [ { "from": "human", "value": "
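The video entry quoted in the post is cut off after the first human turn, so its full shape is not known. As a minimal sketch, the snippet below builds and checks one record using only the keys that are visible in the post (`id`, `video`, `conversations`, and per-turn `from` / `value`). The helper names `make_video_entry` and `check_entry`, the `"gpt"` role for the reply turn, the example path, and the question and answer strings are all assumptions made for illustration; whether finetune_ov.sh accepts exactly this layout is not confirmed here.

```python
import json


def make_video_entry(entry_id, video_path, question, answer):
    """Build one record with the keys visible in the post: id, video, conversations."""
    return {
        "id": str(entry_id),
        "video": video_path,
        "conversations": [
            {"from": "human", "value": question},
            # Assumption: the reply turn uses the "gpt" role, as in LLaVA-style image data.
            {"from": "gpt", "value": answer},
        ],
    }


def check_entry(entry):
    """Return a list of problems found in one record (an empty list means none)."""
    problems = []
    for key in ("id", "video", "conversations"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    for i, turn in enumerate(entry.get("conversations", [])):
        if set(turn) != {"from", "value"}:
            problems.append(f"turn {i} should have exactly 'from' and 'value'")
    return problems


# Hypothetical values, for illustration only.
entry = make_video_entry(0, "videos/example.mp4", "Describe the video.", "A placeholder answer.")
print(json.dumps([entry], indent=2))
print(check_entry(entry))
```

Running a check like this over every record before training can catch missing keys early, which is cheaper than finding them through a data-loader error partway into a run.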

Sep 28 '24 02:09 haozhang1234
