
[Chatllama] Error when loading dataset when using DeepSpeed

Open bino282 opened this issue 1 year ago • 5 comments

Hi, when I use DeepSpeed I run into this error:

```
[2023-03-09 10:46:33,647] [INFO] [logging.py:77:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Traceback (most recent call last):
  File "/datahdd/nhanv/Projects/NLP/chatllama/artifacts/main.py", line 50, in <module>
    actor_trainer = ActorTrainer(config.actor)
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/chatllama/rlhf/actor.py", line 324, in __init__
    ) = deepspeed.initialize(
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 319, in __init__
    self.training_dataloader = self.deepspeed_io(training_data)
  File "/home/ntq/miniconda3/envs/textgen/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1674, in deepspeed_io
    raise ValueError("Training data must be a torch Dataset")
ValueError: Training data must be a torch Dataset
```

How can I fix it?

bino282 avatar Mar 09 '23 03:03 bino282

I got this bug too. Has anyone debugged it?

Xuan-ZW avatar Mar 09 '23 09:03 Xuan-ZW

@bino282 thank you for reaching out. We know that we currently have some issues with DeepSpeed, and we are already working to fix them. Could you please share your current setup with us?

PierpaoloSorbellini avatar Mar 09 '23 09:03 PierpaoloSorbellini

@PierpaoloSorbellini The setup is as follows:

```python
from pathlib import Path
from setuptools import setup, find_packages

REQUIREMENTS = [
    "beartype",
    "deepspeed",
    "einops",
    "fairscale",
    "langchain>=0.0.103",
    "torch",
    "tqdm",
    "transformers",
    "datasets",
    "openai",
]

this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text(encoding="utf8")

setup(
    name="chatllama-py",
    version="0.0.2",
    packages=find_packages(),
    install_requires=REQUIREMENTS,
    long_description=long_description,
    include_package_data=True,
    long_description_content_type="text/markdown",
)
```

Xuan-ZW avatar Mar 09 '23 10:03 Xuan-ZW

I was able to fix the "Training data must be a torch Dataset" problem. The training_data parameter passed to deepspeed.initialize must be changed to training_data=self.train_dataset. I changed it in actor.py and reward.py, and DeepSpeed then worked for me. Hopefully this information helps.
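For reference, here is a minimal, self-contained sketch of what deepspeed.initialize expects for its training_data argument. The toy model, dataset, and config are placeholders rather than ChatLLaMA's actual classes, and the script assumes it is launched with the deepspeed launcher:

```python
import torch
from torch.utils.data import Dataset
import deepspeed


class ToyDataset(Dataset):
    """Placeholder dataset; DeepSpeed requires a torch.utils.data.Dataset here."""

    def __init__(self, n: int = 8):
        self.x = torch.randn(n, 4)
        self.y = torch.randn(n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


model = torch.nn.Linear(4, 1)
ds_config = {
    "train_batch_size": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# Passing anything other than a torch Dataset as training_data triggers the
# "Training data must be a torch Dataset" ValueError seen in this issue.
engine, optimizer, dataloader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=ToyDataset(),
    config=ds_config,
)
```

Run it with the launcher, e.g. `deepspeed toy_example.py`.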

phste avatar Mar 10 '23 18:03 phste

Hi @phste @Xuan-ZW @bino282
With PR #306, which will be merged soon, most of the DeepSpeed problems should be addressed!

PierpaoloSorbellini avatar Apr 03 '23 14:04 PierpaoloSorbellini