MagicSource
Judging from the results, it's actually pretty decent.
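Can't it be used directly as a reference? I haven't trained anything yet. Do I still need to train it myself? The labeling is all done with funasr.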
@wangqixun which model were you using here?
Whisper STT supports Mandarin. I don't know about TTS. I think TTS would be a little bit harder.
@jpc multilingual TTS is an ambitious goal. For Mandarin, TBH, there is no very good open dataset. Biaobei (Baker) could be used for experiments.

Could you also share the pretraining script? Which of the projector and vision encoder are tuned in stage 1 and in stage 2? This is not the same as LLaVA.
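I ran 34b-vl, and yours really did blow up. After testing a few more images, I found that this multimodal model trained by Yi tends to break on certain images.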
The official Kobe image works fine, but swap in any other image and it will most likely break, about 98% of the time. No idea why.