VITA GPU requirements

Can it run on a single A100?

Oct 29 '24 08:10 xiaodongyichuan

That is not enough to run interactive_demo.

Oct 30 '24 03:10 LiuMY13

That is not enough to run interactive_demo.

Would four A40s be enough?

Oct 30 '24 03:10 xiaodongyichuan

I successfully ran the interactive demo on 4 A100 80G GPUs. These are the resources I used: [image: c322b5e00b09f3b30a37397fc46c692]

Oct 30 '24 07:10 LiuMY13

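Hello, a quick question: do I have to train the model myself before I can use the demo? The model files seem to be missing some files, and I can't get it to run.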

Nov 05 '24 09:11 xiaodongyichuan

No training is needed. The files are complete and it runs.

Nov 05 '24 12:11 LiuMY13

I successfully ran the interactive demo on 4 A100 80G GPUs. These are the resources I used.

Hi, I'm also running the interactive demo, but both model_1 and model_2 are loaded onto GPUs 0 and 1 (even though I specified different GPUs for model_2), which causes OOM. Have you run into this problem?

Dec 11 '24 16:12 CR400AF-A

I successfully ran the interactive demo on 4 A100 80G GPUs. These are the resources I used.

How did you modify the code so that the models are distributed across 4 GPUs? I also get OOM with two 80G cards.

Jan 07 '25 08:01 Coding-Zuo

Well, I thought my 4070 Ti S could run it. I spent ages setting up the environment, and then hit this error:

  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/hooks.py", line 364, in pre_forward
    return send_to_device(args, self.execution_device), send_to_device(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 174, in send_to_device
    return honor_type(
           ^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 81, in honor_type
    return type(obj)(generator)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 175, in <genexpr>
    tensor, (send_to_device(t, device, non_blocking=non_blocking, skip_keys=skip_keys) for t in tensor)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/qx/sourcecode/VITA/.venv/lib/python3.12/site-packages/accelerate/utils/operations.py", line 155, in send_to_device
    return tensor.to(device, non_blocking=non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!
make: *** [Makefile:4: text_query] Error 1

I guess I'll have to give up for now.

Jan 23 '25 06:01 navono

Has this been solved? I can't get it to run on two A800s... everything gets loaded onto a single card.

Mar 18 '25 08:03 Glorainow

Has this been solved? I can't get it to run on two A800s... everything gets loaded onto a single card.

Two cards are probably not enough. I previously ran it split across 4 cards: the first two cards ran one model and the last two ran the other.
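A minimal sketch of that kind of split, assuming the GPUs are divided into equal contiguous groups, one group per model. `split_gpus` and `to_visible_devices` are hypothetical helpers for illustration, not part of VITA.

```python
def split_gpus(gpu_ids, num_models):
    """Divide gpu_ids into num_models equal, contiguous groups."""
    if num_models <= 0:
        raise ValueError("num_models must be positive")
    if len(gpu_ids) % num_models != 0:
        raise ValueError("GPU count must be divisible by the number of models")
    per_model = len(gpu_ids) // num_models
    return [gpu_ids[i * per_model:(i + 1) * per_model] for i in range(num_models)]


def to_visible_devices(group):
    """Format a group of GPU ids as a CUDA_VISIBLE_DEVICES-style string."""
    return ",".join(str(gpu_id) for gpu_id in group)


# Four cards, two models: first two cards for one model, last two for the other.
groups = split_gpus([0, 1, 2, 3], 2)
masks = [to_visible_devices(group) for group in groups]
print(masks)  # ['0,1', '2,3']
```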

Mar 18 '25 09:03 CR400AF-A

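Has this been solved? I can't get it to run on two A800s... everything gets loaded onto a single card.

Two cards are probably not enough. I previously ran it split across 4 cards: the first two cards ran one model and the last two ran the other.

So how do you set the device in server.py?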

Mar 19 '25 07:03 Glorainow

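Has this been solved? I can't get it to run on two A800s... everything gets loaded onto a single card.

Two cards are probably not enough. I previously ran it split across 4 cards: the first two cards ran one model and the last two ran the other.

So how do you set the device in server.py?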

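They are 0,1 and 2,3 respectively. I think I was on an older version of VITA, which is why I used 4 cards; looking at the current file, it seems only two are needed? In my case, both models ended up on the same cards because I had imported other packages that caused torch to be initialized first, but I can't be sure your error is the same.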
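A rough sketch of guarding against that ordering problem, assuming the GPU mask is applied through the `CUDA_VISIBLE_DEVICES` environment variable. CUDA reads that variable when it is initialized in a process, so setting it afterwards has no effect. `pin_visible_gpus` is a hypothetical helper, not something from server.py.

```python
import os
import sys


def pin_visible_gpus(gpu_ids):
    """Set CUDA_VISIBLE_DEVICES, refusing if CUDA is already initialized."""
    # Hypothetical guard: only inspect torch if some import already pulled it in.
    torch_mod = sys.modules.get("torch")
    if torch_mod is not None and torch_mod.cuda.is_initialized():
        raise RuntimeError(
            "CUDA is already initialized in this process; "
            "set CUDA_VISIBLE_DEVICES before anything touches the GPU"
        )
    mask = ",".join(str(gpu_id) for gpu_id in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = mask
    return mask


# Call this at the very top of the process that loads the second model,
# and only import torch and load the weights afterwards.
mask = pin_visible_gpus([2, 3])
```

If the guard raises, some earlier import has already initialized CUDA, and the mask would have been ignored.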

Mar 20 '25 04:03 CR400AF-A

Case closed: the devices have to be set in the code; they can't be specified from outside.
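A small sketch of why an outside setting would get lost, assuming the script assigns `CUDA_VISIBLE_DEVICES` itself at startup (the values `0,1` and `2,3` here are only placeholders). The child process below stands in for such a script; it is not VITA's server.py.

```python
import os
import subprocess
import sys

# Stand-in for a script that hardcodes its own GPU mask at startup.
CHILD_SCRIPT = (
    "import os\n"
    "os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'\n"
    "print(os.environ['CUDA_VISIBLE_DEVICES'])\n"
)


def run_child_with_outside_mask(outside_mask):
    """Launch the stand-in script with a mask set from outside; return what it ends up using."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=outside_mask)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The mask passed from outside is overwritten by the in-code assignment.
print(run_child_with_outside_mask("0,1"))  # prints 2,3
```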

Apr 23 '25 02:04 Glorainow