Chong Ruan
[This issue](https://github.com/pytorch/tutorials/issues/87) also mentioned this. @spro Please fix it quickly. This mistake has been in the tutorial for 2 years.
@kimiyoung To make it clearer, let us walk through a concrete example: Assume the original sentence is 12345678, and the permutation is 12367845. The last two tokens, i.e.: 4 and...
> @RERV It seems as if swift does not support finetuning of the vision encoder (it seems that way from my quick glance over the source code, I hope I'm...
> @soloice Hi, I see, thanks. Would it be possible to just release the backprop code of the vision encoder, no framework around it, no clustering, just a starting point...
@Jintao-Huang Can you kindly confirm [whether swift can be used to fine-tune the visual encoder](https://github.com/deepseek-ai/DeepSeek-VL/issues/6#issuecomment-1992731315)? If so, how? If not, what's the simplest way to support it?
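> > 1. It is full-parameter fine-tuning. > > 2. For the 33B model, 80G of GPU memory is generally required, but with pp (pipeline parallelism, which is slower), 40G is also enough. > > @guoday Hi, when I fine-tuned the 1.3B model, both of my GPUs reached 30G of memory usage, and when I tried to fine-tune the 6.7B model it ran out of memory immediately (even batch_size=4 fails). Why is the memory consumption so high? It seems strange. Is there a parameter I need to set? Thanks. Which parallelism strategy are you using?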
> Is this code "optimal" for batched inference and preprocessing? Nope. It's just a toy demo, not meant for production use.
> @lanking520 Thanks for your comment. We indeed use NCCL for cross-GPU tensor communication. However, in vLLM, we also need to pass several metadata ("control messages") from the scheduler to...
We have noticed this problem as well and are working on a fix.
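> > We have noticed this problem as well and are working on a fix. > > What are the possible causes? After fine-tuning the model on some data, the output is almost entirely repeated characters. Generally, the cause is insufficient training. This can also happen if the SFT dataset is too small.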