guozhiyao
I train `swin_tiny_patch4_window7_224` with one million classes and 100 million images using softmax loss and AdamW; the batch size is 600 and I train for 400,000 iterations, but the model...
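For reference, a minimal sketch of the setup this issue describes, assuming the model comes from timm; the learning rate, weight decay, and the tiny batch below are placeholders, not the original configuration.

```python
import torch
import timm

# Sketch of the reported setup: swin_tiny with a 1M-way softmax head and AdamW.
# The reported global batch size of 600 would be split across GPUs in practice;
# a small batch is used here so the sketch runs on one device.
model = timm.create_model("swin_tiny_patch4_window7_224", num_classes=1_000_000).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)

# Note: a 1M-class head on swin_tiny's 768-dim features is ~768M parameters
# by itself, which dominates both memory use and optimization difficulty.
images = torch.randn(8, 3, 224, 224).cuda()
labels = torch.randint(0, 1_000_000, (8,)).cuda()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```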
Hi, I am a bit confused about the update process of `w`. In the paper, only the sum of `w` is constrained to equal `task_num`, but nothing prevents...
I train the tiny model with one million classes and 100 million images using softmax loss and AdamW; the batch size is 600 and I train for 400,000 iterations, but the...
**Describe the bug** I train the model with ZeRO-2 for multi-node training and save it with `model.save_checkpoint`. When I try to get the state dict via `get_fp32_state_dict_from_zero_checkpoint`, it reports...
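For context, a minimal sketch of the save-then-merge flow this issue refers to, using a toy model and config; the checkpoint directory, tag, and hyperparameters are placeholders, not the reporter's setup.

```python
import torch
import torch.nn as nn
import deepspeed
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

# Toy model and config standing in for the real multi-node training job.
model = nn.Linear(8, 2)
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
}
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Every rank on every node must call save_checkpoint; if some ZeRO-2
# optimizer shards are missing, the later merge step fails.
engine.save_checkpoint("ckpt_dir", tag="step_0")

# Offline, on a single process with enough CPU RAM: merge the shards
# into a plain fp32 state dict and save it as a regular checkpoint.
state_dict = get_fp32_state_dict_from_zero_checkpoint("ckpt_dir", tag="step_0")
torch.save(state_dict, "pytorch_model.bin")
```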
**Describe the bug** I trained a GPT 13B model with ZeRO-3, but it seems that GPU memory usage does not decrease as the number of GPUs increases? In addition,...
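As background: under ZeRO-3, parameters, gradients, and optimizer states are all partitioned across ranks, so per-GPU memory for model states should shrink roughly as 1/world_size, while activation memory does not. A hedged config sketch follows; the values are illustrative, not the reporter's settings.

```python
# Illustrative ZeRO-3 config (placeholder values, not the original setup).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                          # partition params + grads + optimizer states
        "overlap_comm": True,                # overlap all-gather/reduce with compute
        "stage3_max_live_parameters": 1e9,   # caps params gathered on a rank at once
        "stage3_prefetch_bucket_size": 5e8,
    },
}
```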
Hi, does the tokenizer you use have any special tokens? Do you only use ``? During training, is `` appended to the end of each sample to mark the end, and at inference time do you keep only the generation before `` as the output? And for multi-turn dialogue, should the `` from the dialogue history be kept? For example, should the input be `[prompt1][ans1][prompt2]`, or `[prompt1][ans1][prompt2]`?
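To make the two formats this question contrasts concrete, a small sketch; `EOS` below is a hypothetical stand-in, since the actual special token was stripped from the original post.

```python
# EOS is a hypothetical placeholder for the stripped special token
# (e.g. the tokenizer's end-of-sequence token).
EOS = "</s>"

history = [("prompt1", "ans1")]
new_prompt = "prompt2"

# Variant A: keep the end-of-turn token after every historical answer.
with_eos = "".join(f"[{p}][{a}]{EOS}" for p, a in history) + f"[{new_prompt}]"

# Variant B: drop it from history; the model only emits it at the very end.
without_eos = "".join(f"[{p}][{a}]" for p, a in history) + f"[{new_prompt}]"
```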
Could you provide the training logs of stage 1 and stage 2?
Could you open-source the code for instruction fine-tuning? I would like to know which trainable parameters were added, as well as some of the training details. Many thanks.