
How to use local GPU resources when creating a new fine-tuning instance

Open tianj0522 opened this issue 1 year ago • 10 comments

image

tianj0522 avatar Nov 19 '24 02:11 tianj0522

version: v1.0.0

tianj0522 avatar Nov 19 '24 02:11 tianj0522

Assuming you manage your GPUs with k8s, copy the kubeconfig file into the .kube folder of the csghub_server_runner service.

The .kube folder is mounted in the docker compose YAML config:

  csghub_server_runner:
   
...
    volumes:
      - ./.kube:/root/.kube:ro
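
The copy step described above can be sketched in Python. This is only an illustration: the helper name `stage_kubeconfig` and the target file name `config` (the default file name a Kubernetes client looks for inside `.kube`) are assumptions, not something stated in this thread.

```python
import shutil
from pathlib import Path


def stage_kubeconfig(source: Path, compose_dir: Path) -> Path:
    """Copy a kubeconfig file to <compose_dir>/.kube/config and return the new path.

    `compose_dir` is assumed to be the directory that holds the docker compose
    file, so that ./.kube resolves to the folder mounted into the runner.
    """
    if not source.is_file():
        raise FileNotFoundError(f"kubeconfig not found: {source}")
    target_dir = compose_dir / ".kube"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "config"
    shutil.copyfile(source, target)
    # The file holds cluster credentials, so restrict it to the owner.
    target.chmod(0o600)
    return target


# Example call (paths are assumptions, adjust to your setup):
# stage_kubeconfig(Path.home() / ".kube" / "config", Path.cwd())
```

After copying, the csghub_server_runner container most likely needs a restart so it reads the new file.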

Rader avatar Nov 19 '24 07:11 Rader

The resource list is read from the space_resources table; change the configuration to match your actual GPU instances.

Rader avatar Nov 19 '24 07:11 Rader

The resource list is read from the space_resources table; change the configuration to match your actual GPU instances.

Is there a standard for how to modify it? For example, the GPUs in my k8s cluster are four NVIDIA-GeForce-RTX-4070 cards, managed uniformly by gpu-operator. After adding the relevant information, the resource still cannot be selected. The space_resources table configuration is shown in the figure below. image image
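
Before editing space_resources, it may help to see what the cluster itself advertises. The sketch below parses the output of `kubectl get nodes -o json` and lists, for each GPU node, the allocatable `nvidia.com/gpu` count, the `nvidia.com/gpu.product` label that gpu-operator's GPU feature discovery sets, and the allocatable CPU and memory. Whether CSGHub matches a space_resources row against these exact fields is not confirmed in this thread; the function name `summarize_gpu_nodes` is hypothetical.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"           # extended resource exposed by the NVIDIA device plugin
PRODUCT_LABEL = "nvidia.com/gpu.product"  # node label set by GPU feature discovery


def summarize_gpu_nodes(nodes_json: str) -> list:
    """Summarize every node that exposes at least one allocatable GPU.

    `nodes_json` is the text printed by `kubectl get nodes -o json`.
    """
    summary = []
    for node in json.loads(nodes_json).get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        gpus = int(allocatable.get(GPU_RESOURCE, "0"))
        if gpus == 0:
            continue  # skip nodes without GPUs
        labels = node.get("metadata", {}).get("labels", {})
        summary.append({
            "node": node["metadata"]["name"],
            "gpus": gpus,
            "product": labels.get(PRODUCT_LABEL),
            "cpu": allocatable.get("cpu"),
            "memory": allocatable.get("memory"),
        })
    return summary
```

Comparing this summary with the values entered in space_resources shows whether a row asks for more GPUs, CPU, or memory than any single node can offer.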

tianj0522 avatar Nov 19 '24 10:11 tianj0522

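The resource list is read from the space_resources table; change the configuration to match your actual GPU instances.

Is there a standard for how to modify it? For example, the GPUs in my k8s cluster are four NVIDIA-GeForce-RTX-4070 cards, managed uniformly by gpu-operator. After adding the relevant information, the resource still cannot be selected. The space_resources table configuration is shown in the figure below. image image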

I'm also facing this issue. It seems that modifying the SQL alone won't work. Have you solved this problem yet?

jksj-223 avatar Jan 20 '25 09:01 jksj-223

gitlab-issue-345

jialudev avatar Feb 17 '25 05:02 jialudev

@ganisback do you have any suggestion?

Rader avatar Mar 24 '25 08:03 Rader

@jksj-223 @tianj0522 Please reduce the number of vCPUs in the resource; the node may not have enough CPU to run it.
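
A rough way to check this is to compare the vCPU count in the resource against the node's allocatable CPU. The sketch below only covers the two common Kubernetes CPU quantity forms (whole or decimal cores such as "8", and millicores such as "7910m"). The helper names are hypothetical, and how CSGHub itself performs this check is not described in this thread.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "8" or "7910m" to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def cpu_fits(requested: str, allocatable: str, already_requested: str = "0") -> bool:
    """Return True if `requested` CPU fits into what is left on the node.

    `already_requested` is the CPU already claimed by other pods on the node,
    as reported under "Allocated resources" by `kubectl describe node`.
    """
    free = parse_cpu(allocatable) - parse_cpu(already_requested)
    return parse_cpu(requested) <= free
```

For instance, a resource that asks for 8 vCPUs cannot be scheduled on a node whose allocatable CPU is 7910m, even though the machine has 8 physical cores, because part of the CPU is reserved for the system.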

ganisback avatar Mar 24 '25 08:03 ganisback

@tianj0522 I’m having the same issue. Did you eventually solve it?

togettoyou avatar Nov 25 '25 06:11 togettoyou

@tianj0522 I’m having the same issue. Did you eventually solve it?

Make sure you have followed this example:

Image

Rader avatar Nov 25 '25 08:11 Rader