AlphaNext

Results: 17 issues of AlphaNext

Which model is the ShareGPT4V-Captioner-7B used in the video data preparation step? Is there a link? Is it this one? https://huggingface.co/Lin-Chen/ShareCaptioner/tree/main

Thanks for your team's work!

* How should I organize my own data? The format in scripts/train_data/video_data_513.txt appears to be `video_path,json_file`. Could you give a concrete example, especially of the JSON file's contents?
* Also, do the training video samples have to be a fixed size (e.g., 512x512)?
* Which parameters need special attention during fine-tuning (the learning rate?)
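As a hedged illustration of the manifest format asked about above: each line of `video_data_513.txt` pairs a video path with a JSON annotation file. The loader below is a minimal sketch, and the JSON field name `caption` is a guess for illustration only; the repo's actual schema may differ.

```python
import json
import os
import tempfile

def load_samples(manifest_path):
    """Parse lines of 'video_path,json_file' into sample dicts.

    Hypothetical sketch: assumes each referenced JSON file holds the
    annotation for one video (field names are not confirmed by the repo).
    """
    samples = []
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split only on the first comma so paths containing commas
            # in the JSON part are not mangled.
            video_path, json_path = line.split(",", 1)
            with open(json_path) as jf:
                annotation = json.load(jf)
            samples.append({"video": video_path, "annotation": annotation})
    return samples
```

This only demonstrates the `video_path,json_file` pairing described in the question; the real training code will impose its own required JSON keys.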

### System Info

* Code version: CogVideo commit 354c906f8160084bbdf1f1c42b3b292d509fe24b
* CUDA 12.2, Torch 2.4.0, GCC 11.x
* Environment: `pip install -r requirements.txt` run from the sat directory
* SFT fine-tuning under sat

### Information

- [ ] The...

### System Info

CUDA 11.8 / Torch 2.4

### Information

- [X] The official example scripts
- [ ] My own modified scripts and tasks

### Reproduction /...

Nice work. The MM-DiT block has a concat operation between the image modality and the text modality before the Q/K/V attention, but I could not find it in the code... Looking forward to...
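For readers unfamiliar with the pattern being asked about: in MM-DiT-style joint attention, text tokens and image tokens are concatenated along the sequence axis, a shared Q/K/V projection and attention run over the combined sequence, and the output is split back per modality. The sketch below is illustrative only (shapes, names, and the single shared projection are assumptions, not the repo's actual code).

```python
import numpy as np

def joint_attention(text_tokens, image_tokens, wq, wk, wv):
    """Sketch of MM-DiT-style joint attention over concatenated modalities.

    text_tokens:  (T_text, D) array
    image_tokens: (T_img, D) array
    wq/wk/wv:     (D, D) shared projection matrices (illustrative)
    """
    # Concatenate the two modalities along the sequence axis.
    x = np.concatenate([text_tokens, image_tokens], axis=0)  # (T_text+T_img, D)
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scaled dot-product attention over the joint sequence.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v
    # Split the joint output back into text and image streams.
    t = text_tokens.shape[0]
    return out[:t], out[t:]
```

The key point is that the concat happens once, before the Q/K/V projections, so every token (text or image) attends to every other token in a single attention call.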

Thanks for your great work. I'm using the nyrahealth/CrisperWhisper model to transcribe audio to text with timestamps, but it outputs word-level text with timestamps. How can I convert them to...
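One common way to turn word-level timestamps into sentence-level segments is to group words until sentence-final punctuation, taking the start of the first word and the end of the last. This is a minimal sketch under assumed input shapes (tuples of `(word, start, end)`), not CrisperWhisper's exact output schema.

```python
def words_to_sentences(words, enders=".?!"):
    """Merge word-level (word, start, end) tuples into sentence segments.

    A sentence ends when a word's last character is sentence-final
    punctuation; any trailing words without an ender form a final segment.
    """
    sentences, cur = [], []
    for word, start, end in words:
        cur.append((word, start, end))
        if word and word[-1] in enders:
            sentences.append({
                "text": " ".join(w for w, _, _ in cur),
                "start": cur[0][1],   # start of the first word
                "end": cur[-1][2],    # end of the last word
            })
            cur = []
    if cur:  # flush any leftover words as a final segment
        sentences.append({
            "text": " ".join(w for w, _, _ in cur),
            "start": cur[0][1],
            "end": cur[-1][2],
        })
    return sentences
```

Punctuation-based splitting is a heuristic; transcripts without punctuation would need a different cue (e.g., pause-length gaps between word timestamps).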
