VITA

GPU requirements

Open xiaodongyichuan opened this issue 1 year ago • 13 comments

Can this run on a single A100?

xiaodongyichuan avatar Oct 29 '24 08:10 xiaodongyichuan

That is not enough to run interactive_demo.

LiuMY13 avatar Oct 30 '24 03:10 LiuMY13

That is not enough to run interactive_demo.

Would four A40s be enough?

xiaodongyichuan avatar Oct 30 '24 03:10 xiaodongyichuan

I successfully ran the interactive demo on 4 A100 80G GPUs. These are the resources I used: [image: c322b5e00b09f3b30a37397fc46c692]

LiuMY13 avatar Oct 30 '24 07:10 LiuMY13

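Hello, a question: do I have to train the model myself before I can use the demo? The model files seem to be missing some files, and I can't get it to run.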

xiaodongyichuan avatar Nov 05 '24 09:11 xiaodongyichuan

No training is needed. The files are complete and the demo runs.

LiuMY13 avatar Nov 05 '24 12:11 LiuMY13

I successfully ran the interactive demo on 4 A100 80G GPUs. These are the resources I used: [image: c322b5e00b09f3b30a37397fc46c692]

Hi, I'm also running the interactive demo, but both model_1 and model_2 are loaded onto GPUs 0 and 1 (even though I specified different GPUs for model_2), which causes an OOM. Have you ever run into this problem?

CR400AF-A avatar Dec 11 '24 16:12 CR400AF-A

I successfully ran the interactive demo on 4 A100 80G GPUs. These are the resources I used: [image: c322b5e00b09f3b30a37397fc46c692]

How did you modify the code so that the models are distributed across 4 GPUs? I also get an OOM with two 80G cards.

Coding-Zuo avatar Jan 07 '25 08:01 Coding-Zuo

Well, I thought my 4070 Ti S could run it. I spent a long time setting up the environment, and then got this error:

  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 364, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 174, in send_to_device
    return honor_type(
           ^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 81, in honor_type
    return type(obj)(generator)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 175, in <genexpr>
    tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 155, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!
make: *** [Makefile:4: text_query] Error 1

I guess I'll have to give up for now.

navono avatar Jan 23 '25 06:01 navono

Has this problem been solved? It won't run on two A800s... everything gets loaded onto a single card.

Glorainow avatar Mar 18 '25 08:03 Glorainow

Has this problem been solved? It won't run on two A800s... everything gets loaded onto a single card.

Two cards are probably not enough. I previously split it across 4 cards: the first two cards ran one model and the last two ran the other.

CR400AF-A avatar Mar 18 '25 09:03 CR400AF-A

Has this problem been solved? It won't run on two A800s... everything gets loaded onto a single card.

Two cards are probably not enough. I previously split it across 4 cards: the first two cards ran one model and the last two ran the other.

Then how did you set the device in server.py?

Glorainow avatar Mar 19 '25 07:03 Glorainow

Has this problem been solved? It won't run on two A800s... everything gets loaded onto a single card.

Two cards are probably not enough. I previously split it across 4 cards: the first two cards ran one model and the last two ran the other.

Then how did you set the device in server.py?

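0,1 and 2,3 respectively. I think I was on an older version of VITA, which is why I used 4 cards; judging from the current file, it looks like only two are needed? In my case, both models were loaded onto the same cards because I had imported other packages, which caused torch to be initialized first. I can't be sure your error is the same, though.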

CR400AF-A avatar Mar 20 '25 04:03 CR400AF-A
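
Editor's note: a minimal sketch of the ordering issue described in the comment above, not code from the VITA repository. It assumes the GPUs are selected through the CUDA_VISIBLE_DEVICES environment variable, which CUDA reads when it is first initialized in a process. The helper name pin_visible_gpus is hypothetical, and warning when torch is already imported is a deliberately conservative check chosen for this sketch.

```python
import os
import sys
import warnings


def pin_visible_gpus(gpu_ids):
    """Restrict this process to the given physical GPU indices.

    Call this before torch (or any package that imports torch) is loaded,
    so that CUDA has not been initialized with a different device list.
    """
    if "torch" in sys.modules:
        # Conservative guard (assumption of this sketch): if torch is already
        # imported, the setting below may be applied too late to take effect.
        warnings.warn("torch is already imported; CUDA_VISIBLE_DEVICES may be ignored")
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


visible = pin_visible_gpus([2, 3])
# Import torch and the model code only after this point.
```

With CUDA_VISIBLE_DEVICES set to "2,3", the process sees those two cards as cuda:0 and cuda:1, so device indices used inside the process are relative to the visible list.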

Solved: the devices have to be set in the code; they cannot be specified from outside.

Glorainow avatar Apr 23 '25 02:04 Glorainow
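
Editor's note: one way to set the devices from code, sketched under assumptions rather than taken from server.py. Each model gets its own child process with its own CUDA_VISIBLE_DEVICES value. The "0,1" and "2,3" assignment comes from the thread; MODEL_GPUS, worker_env, run_workers, and the stand-in child command are hypothetical names for illustration.

```python
import os
import subprocess
import sys

# GPU assignment mentioned in the thread: one model on 0,1 and the other on 2,3.
MODEL_GPUS = {"model_1": "0,1", "model_2": "2,3"}

# Stand-in for a real worker: it only reports which GPUs it was given.
CHILD_CODE = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"


def worker_env(gpus):
    """Return a copy of the current environment restricted to `gpus`."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpus
    return env


def run_workers(assignments):
    """Start one child process per model and collect what each one sees."""
    seen = {}
    for name, gpus in assignments.items():
        result = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            env=worker_env(gpus),
            capture_output=True,
            text=True,
            check=True,
        )
        seen[name] = result.stdout.strip()
    return seen


if __name__ == "__main__":
    print(run_workers(MODEL_GPUS))
```

The sketch runs the children one after another to keep it short. Long-lived model workers would more likely be started with subprocess.Popen or multiprocessing so that both models stay loaded at the same time.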