AlphaNext

Results: 17 issues of AlphaNext

Which model is the ShareGPT4V-Captioner-7B used in the video data preparation step? Is there a link? Is it this one? https://huggingface.co/Lin-Chen/ShareCaptioner/tree/main

Thanks for your team's work!

* How should I organize my own data? The format in scripts/train_data/video_data_513.txt appears to be `video_path,json_file`. Could you give a concrete example, especially of the JSON file's contents?
* Also, do the training video samples have to be a fixed size (e.g., 512x512)?
* Which parameters need special attention during fine-tuning (the learning rate?)
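As a hedged illustration of the manifest format asked about above: each line of `video_data_513.txt` pairs a video path with a JSON annotation file. The loader below is a minimal sketch, and the JSON field name `caption` is a guess for illustration only; the repo's actual schema may differ.

```python
import json
import os
import tempfile

def load_samples(manifest_path):
    """Parse lines of 'video_path,json_file' into sample dicts.

    Hypothetical sketch: assumes each referenced JSON file holds the
    annotation for one video (field names are not confirmed by the repo).
    """
    samples = []
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Split only on the first comma so paths containing commas
            # in the JSON part are not mangled.
            video_path, json_path = line.split(",", 1)
            with open(json_path) as jf:
                annotation = json.load(jf)
            samples.append({"video": video_path, "annotation": annotation})
    return samples
```

This only demonstrates the `video_path,json_file` pairing described in the question; the real training code will impose its own required JSON keys.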

### System Info

* Code version: CogVideo commit 354c906f8160084bbdf1f1c42b3b292d509fe24b
* CUDA 12.2, Torch 2.4.0, GCC 11.x
* Environment: `pip install -r requirements.txt` run from the sat directory
* SFT fine-tuning under sat

### Information

- [ ] The...

### System Info

CUDA 11.8 / Torch 2.4

### Information

- [X] The official example scripts
- [ ] My own modified scripts and tasks

### Reproduction /...

Nice work. The MM-DiT block has a concat operation between the image modality and the text modality before the Q/K/V attention, but I could not find it in the code... Looking forward to...
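For readers unfamiliar with the pattern being asked about: in MM-DiT-style joint attention, text tokens and image tokens are concatenated along the sequence axis, a shared Q/K/V projection and attention run over the combined sequence, and the output is split back per modality. The sketch below is illustrative only (shapes, names, and the single shared projection are assumptions, not the repo's actual code).

```python
import numpy as np

def joint_attention(text_tokens, image_tokens, wq, wk, wv):
    """Sketch of MM-DiT-style joint attention over concatenated modalities.

    text_tokens:  (T_text, D) array
    image_tokens: (T_img, D) array
    wq/wk/wv:     (D, D) shared projection matrices (illustrative)
    """
    # Concatenate the two modalities along the sequence axis.
    x = np.concatenate([text_tokens, image_tokens], axis=0)  # (T_text+T_img, D)
    q, k, v = x @ wq, x @ wk, x @ wv
    # Scaled dot-product attention over the joint sequence.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v
    # Split the joint output back into text and image streams.
    t = text_tokens.shape[0]
    return out[:t], out[t:]
```

The key point is that the concat happens once, before the Q/K/V projections, so every token (text or image) attends to every other token in a single attention call.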

Thanks for your great work. I'm using the nyrahealth/CrisperWhisper model to transcribe audio to text with timestamps, but it outputs word-level text with timestamps. How can I convert them to...
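One common way to turn word-level timestamps into sentence-level segments is to group words until sentence-final punctuation, taking the start of the first word and the end of the last. This is a minimal sketch under assumed input shapes (tuples of `(word, start, end)`), not CrisperWhisper's exact output schema.

```python
def words_to_sentences(words, enders=".?!"):
    """Merge word-level (word, start, end) tuples into sentence segments.

    A sentence ends when a word's last character is sentence-final
    punctuation; any trailing words without an ender form a final segment.
    """
    sentences, cur = [], []
    for word, start, end in words:
        cur.append((word, start, end))
        if word and word[-1] in enders:
            sentences.append({
                "text": " ".join(w for w, _, _ in cur),
                "start": cur[0][1],   # start of the first word
                "end": cur[-1][2],    # end of the last word
            })
            cur = []
    if cur:  # flush any leftover words as a final segment
        sentences.append({
            "text": " ".join(w for w, _, _ in cur),
            "start": cur[0][1],
            "end": cur[-1][2],
        })
    return sentences
```

Punctuation-based splitting is a heuristic; transcripts without punctuation would need a different cue (e.g., pause-length gaps between word timestamps).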
