Results 37 comments of chongxiaoc

@serena-ruan PTL 1.6.3 changed the data module hook behavior and broke the Horovod Lightning estimator in the validation step. I've tried a few workarounds before, but unfortunately they didn't work out. Contribution...

Is there a simple reproducer you can provide? For example, a simple toy model with dummy data?

Hi, I'm getting the same issue when using deepspeed 0.10.0 with huggingface transformers.

```
AssertionError: Not enough buffers 0 for swapping 1
    assert len(swap_in_paths)
```

+1. Would like this feature to be supported.

Same here. The model is OpenAssistant/reward-model-deberta-v3-large-v2.

I added the `deepspeed` config below, but it still failed with the same error as above.

```yaml
backend:
  type: ray
  trainer:
    use_gpu: true
    strategy:
      type: deepspeed
      zero_optimization:
        stage: 3
        offload_optimizer:
          device: cpu
          pin_memory: ...
```

Looks like `class 'ludwig.trainers.trainer_llm.NoneTrainer'` is the root cause: it doesn't initialize the distributed backend.
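
For context, DeepSpeed requires a `torch.distributed` process group before training, so a trainer class that skips that step would fail exactly this way. A minimal sketch of the guard a trainer would need before handing off to DeepSpeed (`ensure_process_group` is a hypothetical helper name, and the single-process `gloo` defaults are assumptions for illustration, not Ludwig's actual code):

```python
import os
import torch.distributed as dist


def ensure_process_group(backend: str = "gloo", rank: int = 0, world_size: int = 1) -> None:
    """Initialize torch.distributed if no process group exists yet.

    Hypothetical helper, not part of Ludwig; it only illustrates the
    kind of initialization a trainer must do before DeepSpeed can run.
    In a real launch, RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT come
    from the launcher (e.g. Ray or torchrun); here we default them
    for a single local process.
    """
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    if not dist.is_initialized():
        dist.init_process_group(backend=backend, rank=rank, world_size=world_size)
```

A trainer that calls this (or the framework's equivalent) before building the DeepSpeed engine would avoid the "distributed backend not initialized" class of failures.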