AntonioSu

5 issues by AntonioSu

## Other relevant information
- **Command line used (if not specified in steps to reproduce)**: `python3.7 main.py train --training-data-src-dir workspace/data_src/aligned --training-data-dst-dir workspace/data_dst/aligned --model-dir workspace/model --model SAEHD --no-preview`
- **Operating system...

### Is there an existing issue for this?
- [X] I have searched the existing issues

### Current Behavior
RuntimeError: [4] is setting up NCCL communicator and retrieving ncclUniqueId from...

When training a 14B model, setting `max_length` to 16000 causes a failure. The error is as follows: from torch.distributed.optim import \ /home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/mixin.py:81: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Seq2SeqTrainer.__init__`. Use `processing_class` instead. super().__init__( [rank7]: Traceback (most recent call...

This appears to involve a circular import. The problem is as follows: (TemporaryActor pid=6490) cannot import name 'DeepSpeedTransformerInference' from partially initialized module 'deepspeed.model_implementations.transformers.ds_transformer' (most likely due to a circular import) (/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/transformers/ds_transformer.py) (TemporaryActor pid=6490) Traceback (most recent call last): (TemporaryActor pid=6490)...
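The "partially initialized module ... (most likely due to a circular import)" wording is Python's generic message when module A imports a name from module B while B, during its own import, imports a name back from A before A has finished running. A minimal stdlib sketch, using two hypothetical throwaway modules `circ_a` and `circ_b` (not DeepSpeed code), reproduces the same error shape on recent Python versions:

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def reproduce_circular_import() -> str:
    """Create two hypothetical modules that import names from each other; return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # circ_a imports ClassB from circ_b *before* defining its own ClassA.
        (root / "circ_a.py").write_text(textwrap.dedent("""
            from circ_b import ClassB

            class ClassA:
                pass
        """))
        # circ_b asks for ClassA while circ_a is still half-executed, so ClassA does not exist yet.
        (root / "circ_b.py").write_text(textwrap.dedent("""
            from circ_a import ClassA

            class ClassB:
                pass
        """))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("circ_a")
        except ImportError as exc:
            return str(exc)
        finally:
            # Undo the side effects so the sketch can be rerun cleanly.
            sys.path.remove(tmp)
            for name in ("circ_a", "circ_b"):
                sys.modules.pop(name, None)
    return "no error"


print(reproduce_circular_import())
```

In general, such cycles are broken by deferring one import (moving it into the function that needs it) or by importing the module object instead of a name from it; whether either applies to this particular DeepSpeed/Ray setup is not established by the issue.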

Code location: `openrlhf/trainer/ppo_utils/experience_maker.py`: `@torch.no_grad() def make_experience(self, samples: Samples) -> Experience: raise NotImplementedError("This method should be implemented by the subclass.")`
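The quoted method follows the abstract-method pattern: the base `make_experience` only raises `NotImplementedError`, so calling it on the base class always fails and a concrete subclass has to override it. A minimal sketch, assuming simplified hypothetical `Samples`/`Experience` dataclasses and a hypothetical `LengthRewardExperienceMaker` subclass (none of these are OpenRLHF's real definitions); `@torch.no_grad()` is left out only to avoid a PyTorch dependency:

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins: the real OpenRLHF classes hold tensors and many more fields.
@dataclass
class Samples:
    sequences: List[List[int]]


@dataclass
class Experience:
    sequences: List[List[int]]
    rewards: List[float]


class BaseExperienceMaker:
    # Mirrors the quoted pattern: the base class only declares the method.
    def make_experience(self, samples: Samples) -> Experience:
        raise NotImplementedError("This method should be implemented by the subclass.")


class LengthRewardExperienceMaker(BaseExperienceMaker):
    # Hypothetical subclass: rewards each sequence by its length, purely for illustration.
    def make_experience(self, samples: Samples) -> Experience:
        rewards = [float(len(seq)) for seq in samples.sequences]
        return Experience(sequences=samples.sequences, rewards=rewards)


exp = LengthRewardExperienceMaker().make_experience(Samples([[1, 2, 3], [4]]))
print(exp.rewards)  # [3.0, 1.0]
```

If this error is reached at runtime, the code path is dispatching to the base class rather than to a subclass that implements `make_experience`.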