[BUG]error: can't copy 'deepspeed/accelerator': doesn't exist or not a regular file

Open ucas010 opened this issue 2 years ago • 9 comments

Describe the bug A clear and concise description of what the bug is.

To Reproduce Steps to reproduce the behavior: follow the install-from-source steps in the official docs:

git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
pip install .

Error output:

      writing manifest file 'deepspeed.egg-info/SOURCES.txt'
      error: can't copy 'deepspeed/accelerator': doesn't exist or not a regular file
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for deepspeed
  Running setup.py clean for deepspeed
Failed to build deepspeed
Installing collected packages: deepspeed
  Running setup.py install for deepspeed ... error
  error: subprocess-exited-with-error
  
  × Running setup.py install for deepspeed did not run successfully.
  │ exit code: 1
  ╰─> [355 lines of output]
      DS_BUILD_OPS=0
      Install Ops={'async_io': False, 'cpu_adagrad': False, 'cpu_adam': False, 'fused_adam': False, 'fused_lamb': False, 'quantizer': False, 'random_ltd': False, 'sparse_attn': False, 'spatial_inference': False, 'transformer': False, 'stochastic_transformer': False, 'transformer_inference': False, 'utils': False}
      version=0.9.0+0b5252b, git_hash=0b5252b, git_branch=master
      install_requires=['hjson', 'ninja', 'numpy', 'packaging>=20.0', 'psutil', 'py-cpuinfo', 'pydantic', 'torch', 'tqdm']
      compatible_ops={'async_io': True, 'cpu_adagrad': True, 'cpu_adam': True, 'fused_adam': True, 'fused_lamb': True, 'quantizer': True, 'random_ltd': True, 'sparse_attn': True, 'spatial_inference': True, 'transformer': True, 'stochastic_transformer': True, 'transformer_inference': True, 'utils': True}
      ext_modules=[]

Expected behavior A clear and concise description of what you expected to happen.

ds_report output Please run ds_report to give us details about your setup.

Screenshots If applicable, add screenshots to help explain your problem.

System info (please complete the following information):

  • OS: [centos]
  • GPU count and types [4 GPU]
  • Python version 3.9.16
  • Any other relevant info about your setup

ucas010 avatar Apr 13 '23 05:04 ucas010

I have solved it. The packaging step in setup.py does not support soft links (symlinks). You need to comment out the following code in setup.py first:

  create_dir_symlink('..\\..\\csrc', '.\\deepspeed\\ops\\csrc')
  create_dir_symlink('..\\..\\op_builder', '.\\deepspeed\\ops\\op_builder')
  create_dir_symlink('..\\accelerator', '.\\deepspeed\\accelerator')

Then manually copy csrc and op_builder into deepspeed/ops/, and accelerator into deepspeed/.

liulhdarks avatar Apr 13 '23 11:04 liulhdarks

@ucas010 The bug is in setup.py. Around line 270, where the setup() call is made, you have to add one more argument:

package_dir={"": "."},

so it should become:

setup(name='deepspeed',
      version=version_str,
      description='DeepSpeed library',
      long_description=readme_text,
      long_description_content_type='text/markdown',
      author='DeepSpeed Team',
      author_email='[email protected]',
      url='http://deepspeed.ai',
      project_urls={
          'Documentation': 'https://deepspeed.readthedocs.io',
          'Source': 'https://github.com/microsoft/DeepSpeed',
      },
      install_requires=install_requires,
      extras_require=extras_require,
      packages=find_packages(include=['deepspeed', 'deepspeed.*']),
      package_dir={"": "."},
      include_package_data=True,
      scripts=[
          'bin/deepspeed', 'bin/deepspeed.pt', 'bin/ds', 'bin/ds_ssh', 'bin/ds_report', 'bin/ds_bench', 'bin/dsr',
          'bin/ds_elastic'
      ],
      classifiers=[
          'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.7',
          'Programming Language :: Python :: 3.8', 'Programming Language :: Python :: 3.9',
          'Programming Language :: Python :: 3.10'
      ],
      license='MIT',
      ext_modules=ext_modules,
      cmdclass=cmdclass)

Also, if you are running Windows and you encounter compile errors like "error C2398: Element '2': conversion from 'size_t' to '_Ty' requires a narrowing conversion":

Go to deepspeed\csrc\transformer\inference\csrc\pt_binding.cpp. There you have to add two typecasts:

On prev_key:


auto prev_key = torch::from_blob(workspace + offset,
                                     {bsz, heads, all_tokens, k},
                                     {hidden_dim * InferenceContext::Instance().GetMaxTokenLenght(),
                                      k * InferenceContext::Instance().GetMaxTokenLenght(),
                                      k,
                                      1},
                                     options);

to become:


auto prev_key = torch::from_blob(workspace + offset,
                                     {bsz, heads, all_tokens, k},
                                     {static_cast<int64_t>(hidden_dim * InferenceContext::Instance().GetMaxTokenLenght()),
                                      static_cast<int64_t>(k * InferenceContext::Instance().GetMaxTokenLenght()),
                                      k,
                                      1},
                                     options);              

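Repeat the same typecast for prev_value. The error means that the first two elements of the second braced list (the strides) have type size_t, which is an unsigned 64-bit integer on 64-bit builds, while a signed int64_t is expected. Brace initialization does not allow that narrowing conversion implicitly, so the cast has to be explicit. Casting to int64_t is safe here because its maximum positive value is far larger than any of these products.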
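
To make the size_t vs. int64_t point concrete, here is a small Python sketch (not DeepSpeed code). It assumes a 64-bit platform where size_t is an unsigned 64-bit integer, and it uses ctypes to mimic what a static_cast<int64_t> does on common two's-complement platforms. The helper names fits_in_int64 and cast_to_int64 are made up for illustration.

```python
import ctypes

INT64_MAX = 2**63 - 1    # largest value a signed 64-bit integer can hold
UINT64_MAX = 2**64 - 1   # largest value an unsigned 64-bit integer can hold


def fits_in_int64(unsigned_value: int) -> bool:
    """True if an unsigned 64-bit value survives a cast to int64 unchanged."""
    return 0 <= unsigned_value <= INT64_MAX


def cast_to_int64(unsigned_value: int) -> int:
    """Reinterpret the low 64 bits as a signed integer (ctypes does not range-check)."""
    return ctypes.c_int64(unsigned_value).value


if __name__ == "__main__":
    # A stride-sized product (hypothetical numbers) is nowhere near the limit.
    stride = 4096 * 1024
    print(fits_in_int64(stride), cast_to_int64(stride))
    # Only values above INT64_MAX would change meaning when cast.
    print(fits_in_int64(UINT64_MAX), cast_to_int64(UINT64_MAX))
```

The compiler rejects the implicit conversion because it cannot rule out the second case; the explicit cast tells it that the values are known to stay in range.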

ldilov avatar Apr 14 '23 01:04 ldilov

I'm seeing the same issue: error: can't copy 'deepspeed/accelerator': doesn't exist or not a regular file

And indeed: This is a symlink after https://github.com/microsoft/DeepSpeed/pull/2560 got merged: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/accelerator

So it doesn't even require the line in setup.py that creates the symlink, which now(?) runs on Windows only anyway. Hence removing that line doesn't solve the issue.

This is further complicated by the error not always occurring. I haven't fully verified it, but I suspect it only appears when the setuptools_scm package is installed, which isn't always the case.

The package_dir={"": "."}, line addition from @ldilov fixes the error for me. However, the package_data (such as deepspeed/ops/csrc) is then missing from the install.

By comparing an environment where the install works with one where it doesn't, I found that the issue occurs when setuptools_scm is installed.
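
To check whether your own environment matches this observation, a minimal sketch like the following reports whether setuptools_scm is importable. It relies only on the standard library; the helper name is_installed is made up for illustration, and the link between setuptools_scm and the error is the observation from this comment, not something the script verifies.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("setuptools_scm", "setuptools", "wheel"):
        state = "installed" if is_installed(name) else "not installed"
        print(f"{name}: {state}")
```

Run it with the same Python interpreter that you use for pip install, otherwise it inspects a different environment.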

Flamefire avatar Jul 27 '23 14:07 Flamefire

I found the related bug #1909 and the solution there: https://github.com/microsoft/DeepSpeed/issues/1909#issuecomment-1225113348

Basically:

rm deepspeed/ops/{csrc,op_builder}
rm deepspeed/accelerator
cp -R csrc op_builder deepspeed/ops/
cp -R accelerator deepspeed/

And all works as far as I can tell.

I'd suggest not using symlinks in this repo at all, which would avoid this issue in the first place.

Or, if you have to keep them for ease of development, do it the other way round: put the real directories where the files are actually required and create the symlinks at the convenience locations (i.e. reverse source and target).
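
For anyone who prefers a script over the rm/cp commands above, the same steps can be sketched in Python. This is only an illustration: the function name materialize_symlinks and the LINKS table are made up here, the paths are taken from the commands above, and it assumes a POSIX system where removing a directory symlink with unlink() works.

```python
import shutil
from pathlib import Path

# (symlink location, real source directory), relative to the repository root.
# These pairs mirror the rm/cp commands above.
LINKS = [
    ("deepspeed/ops/csrc", "csrc"),
    ("deepspeed/ops/op_builder", "op_builder"),
    ("deepspeed/accelerator", "accelerator"),
]


def materialize_symlinks(repo_root, links=LINKS):
    """Replace each symlink with a real copy of its source directory.

    Returns the list of link locations that were replaced.
    """
    root = Path(repo_root)
    replaced = []
    for link_rel, source_rel in links:
        link = root / link_rel
        source = root / source_rel
        if not link.is_symlink():
            continue  # already a real directory (or absent): leave it alone
        if not source.is_dir():
            raise FileNotFoundError(f"expected source directory: {source}")
        link.unlink()                  # removes only the link, not its target
        shutil.copytree(source, link)  # real copy in place of the link
        replaced.append(link_rel)
    return replaced


if __name__ == "__main__":
    # Run from the root of the DeepSpeed checkout, then retry `pip install .`
    print(materialize_symlinks("."))
```

Running it a second time is harmless, since paths that are no longer symlinks are skipped.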

Flamefire avatar Jul 28 '23 08:07 Flamefire

Currently working on #4323 to remove the symlinks and hopefully resolve this issue. Please try that PR if you are still seeing this error.

mrwyattii avatar Sep 13 '23 23:09 mrwyattii

The workaround from @Flamefire's comment above (removing the symlinks and copying csrc, op_builder and accelerator into place) works for me, thanks!

vTuanpham avatar Dec 17 '23 18:12 vTuanpham