Train time decreased from 13 hours to 9
Hello. I built a conda environment with these settings:

```yaml
name: xtuner
channels:
  - nvidia/label/cuda-12.4.0
  - pytorch
  - conda-forge
dependencies:
  - python=3.11  # Specify Python version here
  - pytorch
  - torchvision
  - torchaudio
  - cuda
  - pytorch-cuda
  - compilers
  - sysroot_linux-64
  - gcc
  - ninja
  - py-cpuinfo
  - libaio
  - ca-certificates
  - certifi
  - openssl
  - pydantic
  - deepspeed
  - mpi4py
  - docutils
  - myst-parser
  - sphinx
  - sphinx-argparse
  - sphinx-book-theme
  - sphinx-copybutton
  - pip
  - pip:
      - transformers>=4.44.2
      - transformers_stream_generator
      - sphinx_markdown_tables
      - lagent
      - bitsandbytes
      - datasets
      - einops
      - mmengine
      - openpyxl
      - peft
      - scikit-image
      - scipy
      - sentencepiece
      - tiktoken
```
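Assuming the YAML above is saved as `environment.yml` (the filename is my choice; any name works), the environment is created and activated with:

```bash
# Build the environment from the spec file, then activate it
# (the env name "xtuner" comes from the "name:" field above).
conda env create -f environment.yml
conda activate xtuner
```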
And I downloaded the source code for xtuner and removed all the lower-bound version requirements from all the requirements text files (roughly as sketched below), and updated setup.py.
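I made those edits by hand; the one-liner below is only an illustration of the idea, assuming the stock `requirements/` layout:

```bash
# Illustrative only: drop everything from ">=" onward in each requirement,
# so the packages already provided by conda satisfy the files as-is.
sed -i 's/>=.*$//' requirements/*.txt
```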
Here is the updated setup.py:

```python
# (tail of setup.py -- get_version(), readme() and parse_requirements()
#  are defined earlier in the file)
if __name__ == '__main__':
    setup(
        name='xtuner',
        version=get_version(),
        description=('An efficient, flexible and full-featured toolkit for '
                     'fine-tuning large models'),
        long_description=readme(),
        long_description_content_type='text/markdown',
        author='XTuner Contributors',
        author_email='[email protected]',
        keywords='large language model, parameter-efficient fine-tuning',
        url='https://github.com/InternLM/xtuner',
        packages=find_packages(),
        include_package_data=True,
        classifiers=[
            'Development Status :: 4 - Beta',
            'License :: OSI Approved :: Apache Software License',
            'Operating System :: OS Independent',
            'Programming Language :: Python :: 3',
            'Programming Language :: Python :: 3.8',
            'Programming Language :: Python :: 3.9',
            'Programming Language :: Python :: 3.10',
            'Topic :: Utilities',
        ],
        # Python maximum version <3.11, to support mpi4py-mpich
        python_requires='>=3.8, <3.12',
        license='Apache License 2.0',
        install_requires=parse_requirements('requirements/runtime.txt'),
        extras_require={
            'all': parse_requirements('requirements.txt'),
            'deepspeed': parse_requirements('requirements/runtime.txt') +
            parse_requirements('requirements/deepspeed.txt'),
            'modelscope': parse_requirements('requirements/runtime.txt') +
            parse_requirements('requirements/modelscope.txt'),
        },
        zip_safe=False,
        entry_points={'console_scripts': ['xtuner = xtuner:cli']})
```
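With the modified setup.py (and the pyproject.toml below) in place, I reinstall xtuner from the checkout into the active environment; a minimal sketch, run from the repository root:

```bash
# Editable install of the patched source; dependency resolution now uses
# the relaxed requirements files.
pip install -e .
```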
I also created a pyproject.toml file:

```toml
[build-system]
requires = ["setuptools >= 64.0", "wheel"]
build-backend = "setuptools.build_meta"
```

Then I ran:

```bash
NPROC_PER_NODE=6 xtuner train llava_llama3.1_8b_instruct_siglip-so400m-patch14-384_e1_gpu5_pretrainworkginforllam3google --deepspeed deepspeed_zero2
```

and it trained in 9 hours; when I run your original Python setup it takes 13 hours on the same dataset. There is still an issue trying to run with DeepSpeed ZeRO-3 (see the command below); the xtuner code needs to be updated for this. Yet it trained well on ZeRO-2.
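For completeness, the failing ZeRO-3 attempt is the same command with only the `--deepspeed` config swapped:

```bash
# Same training command with the ZeRO-3 config instead of ZeRO-2;
# this variant currently errors out, while the run above completes fine.
NPROC_PER_NODE=6 xtuner train \
    llava_llama3.1_8b_instruct_siglip-so400m-patch14-384_e1_gpu5_pretrainworkginforllam3google \
    --deepspeed deepspeed_zero3
```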