AntonioSu issues

Results: 5 issues of AntonioSu

multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown

## Other relevant information - **Command line used (if not specified in steps to reproduce)**: python3.7 main.py train --training-data-src-dir workspace/data_src/aligned --training-data-dst-dir workspace/data_dst/aligned --model-dir workspace/model --model SAEHD --no-preview - **Operating system...

[BUG/Help] Socket Timeout when running full-parameter fine-tuning with deepspeed (single machine, multiple GPUs)

### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior RuntimeError: [4] is setting up NCCL communicator and retrieving ncclUniqueId from...

AttributeError: 'NoneType' object has no attribute 'shape'

When training a 14B model, setting max_length to 16000 causes a problem. The error is as follows: from torch.distributed.optim import \ /home/jeeves/.local/lib/python3.10/site-packages/swift/trainers/mixin.py:81: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Seq2SeqTrainer.__init__`. Use `processing_class` instead. super().__init__( [rank7]: Traceback (most recent call...

most likely due to a circular import

This seems to involve a circular import. The error is as follows: (TemporaryActor pid=6490) cannot import name 'DeepSpeedTransformerInference' from partially initialized module 'deepspeed.model_implementations.transformers.ds_transformer' (most likely due to a circular import) (/usr/local/lib/python3.10/dist-packages/deepspeed/model_implementations/transformers/ds_transformer.py) (TemporaryActor pid=6490) Traceback (most recent call last): (TemporaryActor pid=6490)...
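For readers unfamiliar with this class of error, the following minimal sketch reproduces the same kind of message with two throwaway modules that import names from each other. The module names `mod_a_demo` and `mod_b_demo` and the function `demo_circular_import` are hypothetical and have nothing to do with deepspeed's actual layout. It only illustrates how Python reports a name requested from a module that has not finished initializing.

```python
import importlib
import pathlib
import sys
import tempfile


def demo_circular_import():
    """Create two mutually importing modules and return the ImportError text."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        # mod_a_demo needs a name from mod_b_demo before defining helper_a ...
        (root / "mod_a_demo.py").write_text(
            "from mod_b_demo import helper_b\n\n"
            "def helper_a():\n    return 'a'\n"
        )
        # ... while mod_b_demo needs helper_a, which does not exist yet.
        (root / "mod_b_demo.py").write_text(
            "from mod_a_demo import helper_a\n\n"
            "def helper_b():\n    return 'b'\n"
        )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("mod_a_demo")
        except ImportError as exc:
            return str(exc)
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(tmp)
            sys.modules.pop("mod_a_demo", None)
            sys.modules.pop("mod_b_demo", None)
    return None


if __name__ == "__main__":
    print(demo_circular_import())
```

A common way out of such a cycle is to move one of the imports inside the function that needs it, or to move the shared name into a third module. Whether either applies to the deepspeed case quoted above is not established here.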

Does the latest version no longer support the non-Ray mode?

Code location: openrlhf/trainer/ppo_utils/experience_maker.py @torch.no_grad() def make_experience(self, samples: Samples) -> Experience: raise NotImplementedError("This method should be implemented by the subclass.")
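The snippet quoted in this issue follows the usual "base method raises, subclass overrides" pattern. A minimal sketch of that pattern is shown here. The `Samples` and `Experience` dataclasses, the class names, and the length-based reward are hypothetical stand-ins, not OpenRLHF's actual types or logic, and the `@torch.no_grad()` decorator is left out so the example runs without PyTorch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Samples:
    """Hypothetical stand-in for the real Samples type."""
    sequences: List[List[int]]


@dataclass
class Experience:
    """Hypothetical stand-in for the real Experience type."""
    sequences: List[List[int]]
    rewards: List[float]


class BaseExperienceMaker:
    def make_experience(self, samples: Samples) -> Experience:
        # Calling the base class directly fails, as in the quoted snippet.
        raise NotImplementedError(
            "This method should be implemented by the subclass."
        )


class LengthRewardExperienceMaker(BaseExperienceMaker):
    """Illustrative subclass: rewards each sequence by its length."""

    def make_experience(self, samples: Samples) -> Experience:
        rewards = [float(len(seq)) for seq in samples.sequences]
        return Experience(sequences=samples.sequences, rewards=rewards)


if __name__ == "__main__":
    maker = LengthRewardExperienceMaker()
    print(maker.make_experience(Samples(sequences=[[1, 2], [3, 4, 5]])))
```

With this pattern, hitting the `NotImplementedError` at runtime usually means the base class was instantiated where a concrete subclass was expected. Whether that is what happens in the non-Ray code path asked about above is an open question in the issue.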