vega
GPU configuration
How do I configure the device for multiple GPUs on a single machine? I couldn't find any documentation on this.
@wangfeiyu-zerobug
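Set the following configuration option to True, and all GPUs on the machine will be used for model search:
parallel_search: True
If you don't want to use all GPUs, then in addition to setting parallel_search: True, you also need to set the environment variable CUDA_VISIBLE_DEVICES. For example, export CUDA_VISIBLE_DEVICES=0,1,2 uses three GPUs.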
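A minimal Python sketch of the second case, assuming the search is started with the `vega` command-line tool and a YAML config that already sets parallel_search: True. The helpers `gpu_env` and `run_vega` and the file name `my_search.yml` are hypothetical, not part of vega. CUDA reads CUDA_VISIBLE_DEVICES when it initializes, so the sketch puts the variable into the child process's environment before that process starts, which is what `export` in the launching shell does.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional, Sequence


def gpu_env(gpu_ids: Sequence[int], base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) that exposes only `gpu_ids` to CUDA.

    os.environ itself is left unchanged.
    """
    env = dict(os.environ if base is None else base)
    # CUDA reads this variable at initialization, so it must be present in the
    # environment of the process before that process starts using the GPU.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_vega(config_path: str, gpu_ids: Sequence[int]) -> int:
    """Hypothetical launcher: run the `vega` CLI restricted to `gpu_ids`.

    Assumes `vega` is on PATH and that the config sets parallel_search: True.
    """
    completed = subprocess.run(["vega", config_path], env=gpu_env(gpu_ids))
    return completed.returncode
```

For example, `run_vega("my_search.yml", [0, 1, 2])` would correspond to `export CUDA_VISIBLE_DEVICES=0,1,2` followed by starting vega from that shell.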
It looks like the parallel computing library fails to start?
INFO:root:------------------------------------------------
INFO:root: Step: serial
INFO:root:------------------------------------------------
INFO:root:master ip and port: 127.0.0.1:28703
INFO:root:Initializing cluster. Please wait.
INFO:root:Dask-scheduler not start. Start dask-scheduler in master 127.0.0.1
ERROR:vega.core.pipeline.pipeline:Failed to run pipeline, message: [Errno 2] No such file or directory: 'dask-scheduler': 'dask-scheduler'
ERROR:vega.core.pipeline.pipeline:Traceback (most recent call last):
  File "/root/.local/lib/python3.7/site-packages/vega/core/pipeline/pipeline.py", line 84, in run
    pipestep = PipeStep(name=step_name)
  File "/root/.local/lib/python3.7/site-packages/vega/core/pipeline/search_pipe_step.py", line 45, in __init__
    self.master = create_master(update_func=self.generator.update)
  File "/root/.local/lib/python3.7/site-packages/vega/core/scheduler/master_ops.py", line 44, in create_master
    master_instance = Master(**kwargs)
  File "/root/.local/lib/python3.7/site-packages/vega/core/scheduler/master.py", line 65, in __init__
    status = self.dask_env.start()
  File "/root/.local/lib/python3.7/site-packages/vega/core/scheduler/dask_env.py", line 119, in start
    self._start_dask()
  File "/root/.local/lib/python3.7/site-packages/vega/core/scheduler/dask_env.py", line 155, in _start_dask
    scheduler_p = run_scheduler(ip=master_ip, port=master_port, tmp_file=scheduler_file)
  File "/root/.local/lib/python3.7/site-packages/vega/core/scheduler/run_dask.py", line 56, in run_scheduler
    env=os.environ
  File "/opt/conda/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/opt/conda/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dask-scheduler': 'dask-scheduler'
pip install dask
Requirement already satisfied: dask in /root/.local/lib/python3.7/site-packages (2022.2.0)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from dask) (21.3)
Requirement already satisfied: partd>=0.3.10 in /root/.local/lib/python3.7/site-packages (from dask) (1.3.0)
Requirement already satisfied: toolz>=0.8.2 in /root/.local/lib/python3.7/site-packages (from dask) (0.12.0)
Requirement already satisfied: fsspec>=0.6.0 in /root/.local/lib/python3.7/site-packages (from dask) (2022.8.2)
Requirement already satisfied: cloudpickle>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from dask) (2.1.0)
Requirement already satisfied: pyyaml>=5.3.1 in /opt/conda/lib/python3.7/site-packages (from dask) (5.4.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.0->dask) (2.4.7)
Requirement already satisfied: locket in /root/.local/lib/python3.7/site-packages (from partd>=0.3.10->dask) (1.0.0)
@wangfeiyu-zerobug
The path to the dask components has not taken effect. There are two ways to fix this: add the path to $PATH, or log in to your system again so that the command takes effect. See the FAQ: https://github.com/huawei-noah/vega/blob/master/docs/cn/user/faq.md#14-%E5%BC%82%E5%B8%B8-permissionerror-errno-13-permission-denied-dask-scheduler-filenotfounderror-errno-2-no-such-file-or-directory-dask-scheduler-dask-scheduler-%E6%88%96%E8%80%85-vega-command-not-found
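When vega is installed with `pip --user` (the traceback shows `/root/.local/lib/python3.7/site-packages`) and run from a Docker container or a notebook, the login shell's profile is likely never read, so logging in again may not apply. The first option can instead be done from Python before the pipeline starts. This is a sketch; `ensure_user_scripts_on_path` is a hypothetical helper. It relies on the fact, visible in the traceback, that vega's `run_scheduler` passes `env=os.environ` to `subprocess`, so editing `os.environ["PATH"]` in the same process also changes where the `dask-scheduler` child is looked up.

```python
import os
import shutil
import site


def ensure_user_scripts_on_path(executable: str = "dask-scheduler") -> bool:
    """If `executable` is not on PATH, prepend the `pip install --user`
    scripts directory to os.environ["PATH"].

    Returns True if `executable` can be found on PATH afterwards.
    """
    if shutil.which(executable):
        return True
    # On Linux, `pip install --user` puts console scripts in <user base>/bin,
    # e.g. /root/.local/bin when running as root.
    user_bin = os.path.join(site.getuserbase(), "bin")
    current = os.environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if user_bin not in entries:
        # Child processes started with env=os.environ (or no env) inherit this.
        os.environ["PATH"] = os.pathsep.join([user_bin] + entries)
    return shutil.which(executable) is not None
```

Call it at the top of the script, or in a notebook cell, before starting the vega pipeline.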
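Solved! One suggestion: this path is picked up through the os.environ interface, and since I was running inside a container started with Docker, the fix may be somewhat different in that case. Could you add a note about it? Also, in section 2.1 of https://github.com/huawei-noah/vega/blob/master/docs/cn/user/config_reference.md#2-%E5%85%AC%E5%85%B1%E9%85%8D%E7%BD%AE%E9%A1%B9, "pytorch" under general is misspelled.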
@wangfeiyu-zerobug
Thanks for your suggestion. How did you solve it when running in the container?
In the Jupyter command line: %env PATH=/root/.local/bin:
@wangfeiyu-zerobug
Thanks, we will update it promptly.
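If I want to use the network that has already been searched to test another batch of data, how should I do it? Do I still need to run fullytrain through the pipeline?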
Yes, you need to run fullytrain and then check the accuracy.
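So I would need to retrain the whole searched network? Is there currently no YAML configuration option that just loads the model parameters and runs on test data?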
If the pipeline used for the search already included fullytrain, there is no need to retrain. For the test code, see https://github.com/huawei-noah/vega/blob/39741b5ddd9623f0984599d7f52ea38ef6f253c1/vega/tools/inference.py
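The thread points to `vega/tools/inference.py` for the actual test code. Independently of vega, testing on new data means running the already trained model over that data and counting correct predictions, with no parameter updates. The framework-agnostic sketch below is hypothetical: `predict` stands in for the loaded model's forward pass plus argmax, and `batches` for a data loader yielding (inputs, labels) pairs.

```python
from typing import Callable, Iterable, Sequence, Tuple

# One batch = (inputs, ground-truth class labels).
Batch = Tuple[Sequence, Sequence[int]]


def evaluate_accuracy(predict: Callable[[Sequence], Sequence[int]], batches: Iterable[Batch]) -> float:
    """Top-1 accuracy of `predict` over `batches`; nothing is trained here."""
    correct = 0
    total = 0
    for inputs, labels in batches:
        preds = predict(inputs)
        correct += sum(int(p == y) for p, y in zip(preds, labels))
        total += len(labels)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Toy stand-in for a trained model: predicts class 1 for positive inputs.
    toy_predict = lambda xs: [1 if x > 0 else 0 for x in xs]
    toy_batches = [([1, -1], [1, 0]), ([2, -3], [0, 0])]
    print(evaluate_accuracy(toy_predict, toy_batches))  # 0.75
```

With PyTorch, `predict` would typically call the model after `model.eval()` and inside `torch.no_grad()`.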
File "testcode.py", line 147, in
[[0.5647, 0.5647, 0.5647, ..., 0.3373, 0.3373, 0.3373],
[0.5647, 0.5647, 0.5647, ..., 0.3373, 0.3373, 0.3373],
[0.5647, 0.5647, 0.5647, ..., 0.3373, 0.3373, 0.3373],
...,
[0.8588, 0.8588, 0.8588, ..., 0.2902, 0.2902, 0.2902],
[0.8588, 0.8588, 0.8588, ..., 0.2902, 0.2902, 0.2902],
[0.8588, 0.8588, 0.8588, ..., 0.2902, 0.2902, 0.2902]],
[[0.6000, 0.6000, 0.6000, ..., 0.3333, 0.3333, 0.3333],
[0.6000, 0.6000, 0.6000, ..., 0.3333, 0.3333, 0.3333],
[0.6000, 0.6000, 0.6000, ..., 0.3333, 0.3333, 0.3333],
...,
[0.9216, 0.9216, 0.9216, ..., 0.2784, 0.2784, 0.2784],
[0.9216, 0.9216, 0.9216, ..., 0.2784, 0.2784, 0.2784],
[0.9216, 0.9216, 0.9216, ..., 0.2784, 0.2784, 0.2784]]])], [{'boxes': tensor([[190.9997, 410.9996, 250.9997, 470.9997],
[412.9995, 312.0001, 477.9995, 365.0001]]), 'labels': tensor([1, 1]), 'masks': tensor([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]], dtype=torch.uint8), 'image_id': tensor([1]), 'area': tensor([3600.0029, 3445.0051]), 'iscrowd': tensor([0, 0])}]]
Why does the dataloader output come out as a map?
Is your data in COCO format?
Also, for detection, please refer to this code: https://github.com/huawei-noah/vega/blob/39741b5ddd9623f0984599d7f52ea38ef6f253c1/vega/tools/detection_inference.py
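In the printed batch above, each image's annotations arrive as a dict with the keys `boxes`, `labels`, `masks`, `image_id`, `area` and `iscrowd`. These are the keys used by torchvision's detection models and reference scripts, where a batch is a pair of (images, per-image target dicts) rather than stacked tensors, because each image can contain a different number of objects. Assuming the loader follows that convention (the thread does not confirm it), the collate step looks like the sketch below, which mirrors `collate_fn` in torchvision's `references/detection/utils.py`; `objects_per_image` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Per-image annotation dict, e.g. {"boxes": ..., "labels": ..., "masks": ...}.
Target = Dict[str, Any]


def detection_collate(batch: Sequence[Tuple[Any, Target]]) -> Tuple[Tuple[Any, ...], Tuple[Target, ...]]:
    """Turn [(img0, tgt0), (img1, tgt1), ...] into ((img0, img1, ...), (tgt0, tgt1, ...)).

    Images and targets are kept as tuples instead of being stacked, because
    each image can have a different number of boxes and masks.
    """
    images, targets = zip(*batch)
    return images, targets


def objects_per_image(targets: Sequence[Target]) -> List[int]:
    # Each target dict describes one image; `labels` has one entry per object.
    return [len(t["labels"]) for t in targets]
```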