jiayi chen issues

Results: 7 issues by jiayi chen

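Abnormal training curve

![8de2f3e55159f258f3925242e64bd6d](https://github.com/SmilingWolf/SW-CV-ModelZoo/assets/91857689/4d2751b3-a9f7-4db4-b526-7a1c26a8091a) The x-axis is the step and the y-axis is F1. I resumed training from the Moat weights the author provides on Hugging Face. The first epoch trains normally, but after that the F1 drops periodically, with a period of almost exactly one epoch. What could be causing this?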

warnings about cache()

2024-01-24 10:40:27.562336: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the...

training time

What is the approximate time needed to train one epoch with 8 V100 GPUs (32GB each) on a dataset of 3.2 million images? How many epochs might be required to...

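How do I select the best model?

In train.py I see that the model is saved every few epochs, but how should I choose the best one? Do I run each saved model through test.py in turn and pick the one that scores best?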
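The procedure described in the question above (score every saved checkpoint, keep the best one) can be sketched roughly as follows. This is only an illustration under stated assumptions: `select_best_checkpoint` and `evaluate` are hypothetical names, not functions from the repository's train.py or test.py, and the scores in the usage example are made up. Scoring on a held-out validation split rather than the final test set is also an assumption, chosen so that the reported test number is not used for model selection.

```python
from typing import Callable, Iterable, Optional, Tuple


def select_best_checkpoint(
    checkpoints: Iterable[str],
    evaluate: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the (checkpoint, score) pair with the highest score.

    `evaluate` is a hypothetical callback that loads one checkpoint and
    returns a single metric (for example F1 on a validation split).
    """
    best_path: Optional[str] = None
    best_score = float("-inf")
    for path in checkpoints:
        score = evaluate(path)
        # Strict comparison: on a tie, the earliest checkpoint is kept.
        if score > best_score:
            best_path, best_score = path, score
    if best_path is None:
        raise ValueError("no checkpoints were provided")
    return best_path, best_score


if __name__ == "__main__":
    # Made-up scores purely for illustration; not results from any real run.
    fake_scores = {"epoch_05": 0.61, "epoch_10": 0.68, "epoch_15": 0.66}
    best, score = select_best_checkpoint(fake_scores, fake_scores.__getitem__)
    print(best, score)
```

In practice `evaluate` would wrap whatever evaluation the project already provides. Tracking the metric during training and saving only when it improves is a common alternative that avoids re-running every checkpoint afterwards.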

ShareCaptioner is based on the improved InternLM-Xcomposer-7B base model.

On https://huggingface.co/Lin-Chen/ShareCaptioner, I see that "ShareCaptioner is based on the improved [InternLM-Xcomposer-7B](https://github.com/InternLM/InternLM-XComposer) base model". Is the training code for ShareCaptioner open source? If not, what specific changes were made relative to InternLM-Xcomposer-7B?

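How are gradients computed during backpropagation?

![image](https://github.com/user-attachments/assets/1bebd2fb-ce80-49ab-afb8-d717ca0a14a9) I am a beginner. Why does the paper say that computing the partial derivatives with respect to E and L requires the partial derivative of X with respect to W? Could you give me a simple derivation? Thank you very much.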

What is the 1-shot / half-shot / quarter-shot constraint in experiments?

I still don't understand. The 1-shot constraint represents the original tokens (containing one example), so what does the half-shot constraint refer to? Half an example? _Originally posted by @21-10-4 in https://github.com/microsoft/LLMLingua/issues/164#issuecomment-2367944467_