
Training error

Open corleonechensiyu opened this issue 2 years ago • 1 comment

Python version: 3.8. Training command: basedet_train -f playground/examples/atss/config.py -n 4. The error log is as follows:

/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
(py3.8) csy@hpc:~/megvii/basedet$ basedet_train -f playground/examples/atss/config.py -n 4
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:518: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/csy/.local/lib/python3.8/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
2023-08-26 11:59:26.656 | INFO     | basedet.tools.det_train:launch_workers:69 - Init process group for gpu3 done
2023-08-26 11:59:26.668 | INFO     | basedet.tools.det_train:launch_workers:69 - Init process group for gpu0 done
2023-08-26 11:59:26.668 | INFO     | basedet.tools.det_train:launch_workers:69 - Init process group for gpu1 done
2023-08-26 11:59:26.670 | INFO     | basedet.tools.det_train:launch_workers:69 - Init process group for gpu2 done
Process Process-2:
Traceback (most recent call last):
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/megengine/distributed/launcher.py", line 52, in _run_wrapped
    ret = func(*args, **kwargs)
  File "/home/csy/megvii/basedet/basedet/tools/det_train.py", line 88, in launch_workers
    setup_basedet_logger(log_path=cfg.GLOBAL.OUTPUT_DIR, to_loguru=True)
  File "/home/csy/megvii/basedet/basedet/utils/logger_utils.py", line 35, in setup_basedet_logger
    logger.add(
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/loguru/_logger.py", line 776, in add
    wrapped_sink = FileSink(path, **kwargs)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/loguru/_file_sink.py", line 194, in __init__
    self._create_dirs(path)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/loguru/_file_sink.py", line 226, in _create_dirs
    os.makedirs(dirname, exist_ok=True)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 2 more times]
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/data'
2023-08-26 11:59:27.541 | ERROR    | __main__:<module>:33 - An error has been caught in function '<module>', process 'MainProcess' (1554369), thread 'MainThread' (139692293293120):
Traceback (most recent call last):

> File "/home/csy/anaconda3/envs/py3.8/bin/basedet_train", line 33, in <module>
    sys.exit(load_entry_point('basedet', 'console_scripts', 'basedet_train')())
    │   │    └ <function importlib_load_entry_point at 0x7f0ca4627700>
    │   └ <built-in function exit>
    └ <module 'sys' (built-in)>

  File "/home/csy/megvii/basedet/basedet/tools/det_train.py", line 150, in main
    run()
    └ <function main.<locals>.run at 0x7f0b93e799d0>

  File "/home/csy/megvii/basedet/basedet/tools/det_train.py", line 139, in run
    train(args, cfg)
    │     │     └ ╒═════════╤════════════════════════════════════════════════════════════════════════════════╕
    │     │       │ keys    │ values              ...
    │     └ Namespace(amp=False, debug_mode=False, dir=None, dtr=False, ema=False, fastrun=False, file='playground/examples/atss/config.p...
    └ <megengine.distributed.launcher.launcher object at 0x7f0b93ef7be0>

  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/megengine/distributed/launcher.py", line 149, in __call__
    assert (

AssertionError: subprocess 0 exit with code 1
Traceback (most recent call last):
  File "/home/csy/anaconda3/envs/py3.8/bin/basedet_train", line 33, in <module>
    sys.exit(load_entry_point('basedet', 'console_scripts', 'basedet_train')())
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/loguru/_logger.py", line 1251, in catch_wrapper
    return function(*args, **kwargs)
  File "/home/csy/megvii/basedet/basedet/tools/det_train.py", line 150, in main
    run()
  File "/home/csy/megvii/basedet/basedet/tools/det_train.py", line 139, in run
    train(args, cfg)
  File "/home/csy/anaconda3/envs/py3.8/lib/python3.8/site-packages/megengine/distributed/launcher.py", line 149, in __call__
    assert (
AssertionError: subprocess 0 exit with code 1

I installed it by following install.md.

corleonechensiyu avatar Aug 26 '23 04:08 corleonechensiyu

The key line is: PermissionError: [Errno 13] Permission denied: '/data'. Please read your error log carefully and search the web for the error.

FateScript avatar Aug 28 '23 02:08 FateScript
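
For context, the traceback shows setup_basedet_logger passing cfg.GLOBAL.OUTPUT_DIR to loguru, which calls os.makedirs on the log directory; the call fails when it reaches mkdir('/data'). That most likely means '/data' does not exist and the current user cannot create it under '/'. The sketch below is one hedged way to check a candidate output directory before launching training. The helper names first_existing_ancestor and can_create_dir and the path '/data/outputs' are hypothetical and not part of basedet; how OUTPUT_DIR is overridden in the config is not shown in this thread.

```python
import os


def first_existing_ancestor(path: str) -> str:
    """Return the deepest existing ancestor of `path` (or `path` itself if it exists)."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def can_create_dir(path: str) -> bool:
    """Estimate whether os.makedirs(path, exist_ok=True) would succeed.

    os.makedirs creates the missing components one by one, so what matters is
    write and search permission on the deepest directory that already exists.
    """
    ancestor = first_existing_ancestor(path)
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Hypothetical candidates: a path under /data and one under the home directory.
    for candidate in ("/data/outputs", os.path.expanduser("~/basedet_outputs")):
        print(candidate, "->", first_existing_ancestor(candidate), can_create_dir(candidate))
```

If the check fails for the configured directory, the usual options are to point OUTPUT_DIR at a location the user owns, or to have an administrator create '/data' with suitable permissions.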