
Add support for MIG mode detection and use

Open waTeim opened this issue 2 years ago • 17 comments

The issue here is that when the startup code checks the capabilities of the GPU in order to allocate resources (in particular memory), it mistakenly queries the host GPU rather than the MIG instance. This PR modifies the CUDA GPU detection algorithm: for each host GPU, check whether that GPU supports MIG and whether MIG is enabled, and if so, iterate over all of its MIG instances. This results in a deviceMap:

typedef struct {
  unsigned numDevices;    /* number of host GPUs */
  nvmlDevice_t **layout;  /* one NULL-terminated row of device handles per host GPU */
} deviceMap_t;

Later, that map can be iterated over. layout[i][0] is a handle to the ith host GPU, and layout[i][j + 1] is the jth MIG instance of host GPU i. A (void*)0 entry marks the end of the MIG instance list. Since there can be at most 7 MIG instances per host GPU, the pointer array for each host is sized at 9 (host handle + 7 instances + terminator). Both cuda_check_vram and cuda_compute_capability were updated to use this new data structure.

MIG-related NVML API calls were added to enable this; see NVIDIA's multi-instance GPU management documentation for details.

Addresses #1500

waTeim avatar Jan 30 '24 03:01 waTeim

Ok, I was wrong about there being only 1 MIG instance per pod; expect an update to add support for multiple.

waTeim avatar Jan 30 '24 19:01 waTeim

Reworked MIG detection. It now allows for multiple host GPUs and MIG instances. Some API calls only work on host GPUs, and that case is tested for. Everything is saved in a deviceMap, which is itself cached statically. It looks like it computes the right answer. Also added some comments.

Example:

[0] CUDA device name: NVIDIA A100-PCIE-40GB MIG 1g.5gb
[0] CUDA part number: 900-21001-0100-030
[0] CUDA S/N: 1565020012855
[0] CUDA vbios version: 92.00.25.00.08
[0] CUDA brand: 14
[0] CUDA totalMem 5100273664
[0] CUDA freeMem 5087100928
[1] CUDA device name: NVIDIA A100-PCIE-40GB MIG 1g.5gb 
[1] CUDA part number: 900-21001-0100-030
[1] CUDA S/N: 1565020012461
[1] CUDA vbios version: 92.00.25.00.08
[1] CUDA brand: 14
[1] CUDA totalMem 5100273664
[1] CUDA freeMem 5087100928
[2] CUDA device name: NVIDIA A100-PCIE-40GB MIG 1g.5gb
[2] CUDA part number: 900-21001-0100-030
[2] CUDA S/N: 1565020012461
[2] CUDA vbios version: 92.00.25.00.08
[2] CUDA brand: 14
[2] CUDA totalMem 5100273664
[2] CUDA freeMem 5087100928
time=2024-02-02T02:04:32.335Z level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:146 msg="CUDA Compute Capability detected: 8.0"
time=2024-02-02T02:04:32.335Z level=DEBUG source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:231 msg="cuda detected 3 devices with 11482M available memory"

waTeim avatar Feb 02 '24 02:02 waTeim

Hello Guys,

We've encountered an issue while attempting to build and run ollama serve on our NVIDIA A100 GPUs with Multi-Instance GPU (MIG) mode enabled. The process detects the GPU and then halts unexpectedly after the GPU detection phase.

Environment Details:

GPU Type: NVIDIA A100
MIG Configuration: 7g.80
CUDA Version: 12.3.1
NVIDIA Driver: 535.154.05
Operating System: Ubuntu 22.04
Go version: 1.21

Steps to Reproduce:

1.) Clone the repo waTeim/ollama
2.) go generate ./...
3.) go build .
4.) mv ollama /usr/local/bin/ollama
5.) ollama serve

Expected behavior: ollama should start up and present the ollama prompt.
Actual behavior: ollama starts up and halts after GPU detection.
Attached screenshot:

ollama_startup

Could you please help us identify the cause of this issue and suggest any potential fixes or workarounds? Any assistance or insight would be greatly appreciated.

Thank you for your time and support.

Best regards, calimero

calimero1337 avatar Feb 27 '24 10:02 calimero1337

can you run instead?

DEBUG=1 ollama serve

Or equivalent to get it into debug mode?

Also how many MIG instances, host GPUs?

waTeim avatar Feb 27 '24 19:02 waTeim

We are currently utilizing a virtual server setup configured with 2x NVIDIA A100 80 GB GPUs for testing purposes. Our aim is to experiment with various MIG (Multi-Instance GPU) combinations and models to optimize our workload. In the current scenario, we've allocated one entire GPU with MIG capabilities, hence the reference to "7g.80," which we took to signify the utilization of a full A100 GPU partitioned into 7 instances. As you can see, I also tried to improve the logging to see a bit more of what the GPU functions are getting back from the MIG-enabled GPU.

[image: image.png]


calimero1337 avatar Feb 28 '24 07:02 calimero1337

hmm I see only the text [image: image.png] — or were you referring to the previous message? Could you cut-paste the text instead of a screenshot? Also, if you indent the text 4 spaces, GitHub will auto-format it nicely. Also-also, I don't think a 7g.80 is 1 host GPU broken into 7 MIG instances; rather, it refers to 1 MIG instance that uses 7 GPU engines and 80G of memory. If it were 7 instances, it would look like a list of 7 1g.10 instances (or somesuch).

waTeim avatar Feb 28 '24 15:02 waTeim

@.***:/workspace/tewei/ollama# DEBUG=1 ollama serve
2024/02/29 08:54:01 images.go:853: INFO total blobs: 0
2024/02/29 08:54:01 images.go:860: INFO total unused blobs removed: 0
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
2024/02/29 08:54:01 routes.go:970: INFO Listening on 127.0.0.1:11434 (version 0.0.0)
2024/02/29 08:54:01 payload_common.go:109: INFO Extracting dynamic libraries...
2024/02/29 08:54:07 payload_common.go:148: INFO Dynamic LLM libraries [cuda_v12 cpu_avx cpu cpu_avx2]
2024/02/29 08:54:07 gpu.go:231: INFO CheckVRAM started
2024/02/29 08:54:07 gpu.go:116: INFO GetGPUInfo started
2024/02/29 08:54:07 gpu.go:66: INFO initGPUHandles started
2024/02/29 08:54:07 gpu.go:93: INFO Detecting GPU type
2024/02/29 08:54:07 gpu.go:249: INFO FindGPULibs started
2024/02/29 08:54:07 gpu.go:253: INFO Searching for GPU management library libnvidia-ml.so
2024/02/29 08:54:07 gpu.go:299: INFO Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.535.154.05]
2024/02/29 08:54:07 gpu.go:304: INFO LoadCUDAMgmt started
2024/02/29 08:54:07 gpu.go:339: INFO getVerboseState started
2024/02/29 08:54:07 gpu.go:98: INFO Nvidia GPU detected
2024/02/29 08:54:07 cpu_common.go:11: INFO CPU has AVX2
2024/02/29 08:54:07 gpu.go:143: INFO CUDA GPU VRAM Total: 84987740160 bytes
2024/02/29 08:54:07 gpu.go:144: INFO CUDA GPU VRAM Free: 84985643008 bytes
2024/02/29 08:54:07 gpu.go:154: INFO CUDA Compute Capability detected: 8.0 (Minimum required: 5)


calimero1337 avatar Feb 29 '24 08:02 calimero1337

Ok first, sorry to mislead you, it's not DEBUG=1 it's OLLAMA_DEBUG=1

What I'm looking for is something like this:

MIG Mode is 1
MIG Device Intance 0:0 found
[0] CUDA device name: NVIDIA A100-PCIE-40GB MIG 3g.20gb
[0] CUDA part number: 900-21001-0100-030
[0] CUDA S/N: 1565020014726
[0] CUDA vbios version: 92.00.25.00.08
[0] CUDA brand: 14
[0] CUDA totalMem 20937965568
[0] CUDA freeMem 20898709504

and also this:

time=2024-03-02T02:05:08.846Z level=INFO source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:146 msg="CUDA Compute Capability detected: 8.0"
time=2024-03-02T02:05:08.846Z level=DEBUG source=/go/src/github.com/jmorganca/ollama/gpu/gpu.go:231 msg="cuda detected 1 devices with 17937M available memory"

If the total memory is correctly returned, that's as far as what I've written goes. There might be something later in another part of the code base that's causing an error, but that's beyond what this pull request does.

waTeim avatar Mar 02 '24 02:03 waTeim

Hi Jeff,

please find the debug output attached.

@.***:/workspace/tewei/ollama# OLLAMA_DEBUG=1 ollama serve
time=2024-03-04T07:13:49.510Z level=DEBUG source=/workspace/tewei/ollama/server/routes.go:946 msg="Debug logging enabled"
time=2024-03-04T07:13:49.511Z level=INFO source=/workspace/tewei/ollama/server/images.go:853 msg="total blobs: 0"
time=2024-03-04T07:13:49.511Z level=INFO source=/workspace/tewei/ollama/server/images.go:860 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
time=2024-03-04T07:13:49.512Z level=INFO source=/workspace/tewei/ollama/server/routes.go:970 msg="Listening on 127.0.0.1:11434 (version 0.0.0)"
time=2024-03-04T07:13:49.512Z level=INFO source=/workspace/tewei/ollama/llm/payload_common.go:109 msg="Extracting dynamic libraries..."
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/llm/payload_common.go:148 msg="Dynamic LLM libraries [cpu_avx2 cpu_avx cpu cuda_v12]"
time=2024-03-04T07:13:55.302Z level=DEBUG source=/workspace/tewei/ollama/llm/payload_common.go:149 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:236 msg="CheckVRAM started"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:120 msg="GetGPUInfo started"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:65 msg="initGPUHandles started"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:95 msg="Detecting GPU type"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:253 msg="FindGPULibs started"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:254 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:258 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:265 msg="linux_ldPaths: " !BADKEY="[/usr/local/lib/python3.10/dist-packages/torch/lib /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib /usr/local/cuda/compat/lib /usr/local/nvidia/lib /usr/local/nvidia/lib64]"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:271 msg="Library search paths after environment processing: [/usr/local/lib/python3.10/dist-packages/torch/lib /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib /usr/local/cuda/compat/lib /usr/local/nvidia/lib /usr/local/nvidia/lib64]"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:279 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/local/lib/python3.10/dist-packages/torch/lib/libnvidia-ml.so* /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib/libnvidia-ml.so* /usr/local/cuda/compat/lib/libnvidia-ml.so* /usr/local/nvidia/lib/libnvidia-ml.so* /usr/local/nvidia/lib64/libnvidia-ml.so*]"
time=2024-03-04T07:13:55.302Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/cuda/lib64/libnvidia-ml.so*"
time=2024-03-04T07:13:55.303Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so*"
time=2024-03-04T07:13:55.303Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*"
time=2024-03-04T07:13:55.304Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/wsl/lib/libnvidia-ml.so*"
time=2024-03-04T07:13:55.304Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/wsl/drivers/*/libnvidia-ml.so*"
time=2024-03-04T07:13:55.304Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /opt/cuda/lib64/libnvidia-ml.so*"
time=2024-03-04T07:13:55.304Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib*/libnvidia-ml.so*"
time=2024-03-04T07:13:55.305Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:287 msg="Match found: /usr/lib64/libnvidia-ml.so.1"
time=2024-03-04T07:13:55.305Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:287 msg="Match found: /usr/lib64/libnvidia-ml.so.535.154.05"
time=2024-03-04T07:13:55.305Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/lib*/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/aarch64-linux-gnu/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/lib/python3.10/dist-packages/torch/lib/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/cuda/compat/lib/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/nvidia/lib/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/nvidia/lib64/libnvidia-ml.so*"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:309 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.535.154.05]"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:314 msg="LoadCUDAMgmt started"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:351 msg="getVerboseState started"
time=2024-03-04T07:13:55.306Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:318 msg="Attempting to load CUDA library: /usr/lib64/libnvidia-ml.so.535.154.05"
wiring nvidia management library functions in /usr/lib64/libnvidia-ml.so.535.154.05
dlsym: nvmlInit_v2
dlsym: nvmlShutdown
dlsym: nvmlDeviceGetHandleByIndex
dlsym: nvmlDeviceGetMemoryInfo
dlsym: nvmlDeviceGetCount_v2
dlsym: nvmlDeviceGetCudaComputeCapability
dlsym: nvmlSystemGetDriverVersion
dlsym: nvmlDeviceGetName
dlsym: nvmlDeviceGetSerial
dlsym: nvmlDeviceGetVbiosVersion
dlsym: nvmlDeviceGetBoardPartNumber
dlsym: nvmlDeviceGetBrand
dlsym: nvmlDeviceGetMigMode
dlsym: nvmlDeviceGetMigDeviceHandleByIndex
dlsym: nvmlDeviceGetDeviceHandleFromMigDeviceHandle
CUDA driver version: 535.154.05
time=2024-03-04T07:13:55.314Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:326 msg="CUDA management library loaded successfully."
time=2024-03-04T07:13:55.314Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:101 msg="Nvidia GPU detected"
time=2024-03-04T07:13:55.314Z level=INFO source=/workspace/tewei/ollama/gpu/cpu_common.go:11 msg="CPU has AVX2"
MIG Mode is 1
MIG Device Intance 0:0 found
[0] CUDA device name: NVIDIA A100-SXM4-80GB MIG 7g.80gb
[0] CUDA part number: 692-2G506-0210-002
[0] CUDA S/N: 1652722008440
[0] CUDA vbios version: 92.00.45.00.05
[0] CUDA brand: 14
[0] CUDA totalMem 84987740160
[0] CUDA freeMem 83071860736
time=2024-03-04T07:13:55.335Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:147 msg="CUDA GPU VRAM Total: 84987740160 bytes"
time=2024-03-04T07:13:55.335Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:148 msg="CUDA GPU VRAM Free: 83071860736 bytes"
time=2024-03-04T07:13:55.335Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:158 msg="CUDA Compute Capability detected: 8.0 (Minimum required: 5)"
time=2024-03-04T07:13:55.335Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:217 msg="GetGPUInfo completed with Library: cuda, Variant: "
time=2024-03-04T07:13:55.335Z level=DEBUG source=/workspace/tewei/ollama/gpu/gpu.go:246 msg="cuda detected 1 devices with 71301M available memory"


calimero1337 avatar Mar 04 '24 07:03 calimero1337

ok yea, this PR covers this:

MIG Mode is 1
MIG Device Intance 0:0 found
[0] CUDA device name: NVIDIA A100-SXM4-80GB MIG 7g.80gb
[0] CUDA part number: 692-2G506-0210-002
[0] CUDA S/N: 1652722008440
[0] CUDA vbios version: 92.00.45.00.05
[0] CUDA brand: 14
[0] CUDA totalMem 84987740160
[0] CUDA freeMem 83071860736

and later

source=/workspace/tewei/ollama/gpu/gpu.go:246 msg="cuda detected 1 devices with 71301M available memory"

Anything that happens after that is not something that this PR affects directly, so I don't disbelieve you are having problems, but that's another part of the ollama code.

Possibly what is going on is that this PR makes it possible to reach code that was previously unreachable, and that code has a bug that I am not encountering in my debugging.

For comparison, I am also using an A100, but it's 40G, and the MIG instance is only 20G in size, so possibly what is being uncovered is memory size-related. Since 80G is rare, I wonder if some part of the code makes an assumption and allocates something too small.

Maybe we should merge this but immediately turn to your issue?

And/or can you see what happens if you define a 20G or 40G MIG instance?

waTeim avatar Mar 05 '24 04:03 waTeim

Hi Jeff,

It took a little longer; here are the debug messages with a 40 GB MIG GPU instance. It halts again at the same point.

many thanks

@.***:/workspace# nvidia-smi
Thu Mar 7 09:46:46 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-80GB          On  | 00000000:13:00.0 Off |                  Off |
| N/A   30C    P0             57W / 400W  |                  N/A |      N/A     Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|        Shared         |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  0    1   0   0  |             37MiB / 40192MiB   | 42     N/A|  3   0   2   0   0    |
|                  |              0MiB / 65535MiB   |           |                       |
+------------------+--------------------------------+-----------+-----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@.***:/workspace# OLLAMA_DEBUG=1 ollama serve
time=2024-03-07T09:48:08.102Z level=DEBUG source=/workspace/tewei/ollama/server/routes.go:946 msg="Debug logging enabled"
time=2024-03-07T09:48:08.103Z level=INFO source=/workspace/tewei/ollama/server/images.go:853 msg="total blobs: 0"
time=2024-03-07T09:48:08.103Z level=INFO source=/workspace/tewei/ollama/server/images.go:860 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.

  • using env: export GIN_MODE=release
  • using code: gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers) [GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers) [GIN-debug] POST /api/chat --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers) [GIN-debug] POST /api/embeddings --> github.com/jmorganca/ollama/server.EmbeddingHandler (5 handlers) [GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers) [GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers) [GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers) [GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers) [GIN-debug] POST /api/show --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers) [GIN-debug] POST /api/blobs/:digest --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers) [GIN-debug] HEAD /api/blobs/:digest --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers) [GIN-debug] GET / --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers) [GIN-debug] GET /api/version --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func3 (5 handlers) [GIN-debug] HEAD / --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func2 (5 handlers) [GIN-debug] HEAD /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers) [GIN-debug] HEAD /api/version --> github.com/jmorganca/ollama/server.(Server).GenerateRoutes.func3 (5 handlers) time=2024-03-07T09:48:08.104Z level=INFO source=/workspace/tewei/ollama/server/routes.go:970 msg="Listening on 127.0.0.1:11434 (version 0.0.0)" time=2024-03-07T09:48:08.104Z level=INFO source=/workspace/tewei/ollama/llm/payload_common.go:109 msg="Extracting dynamic libraries..." 
time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/llm/payload_common.go:148 msg="Dynamic LLM libraries [cuda_v12 cpu_avx2 cpu cpu_avx]" time=2024-03-07T09:48:13.944Z level=DEBUG source=/workspace/tewei/ollama/llm/payload_common.go:149 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:236 msg="CheckVRAM started" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:120 msg="GetGPUInfo started" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:65 msg="initGPUHandles started" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:95 msg="Detecting GPU type" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:253 msg="FindGPULibs started" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:254 msg="Searching for GPU management library libnvidia-ml.so" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:258 msg="Searching for GPU management library libnvidia-ml.so" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:265 msg="linux_ldPaths: " !BADKEY="[/usr/local/lib/python3.10/dist-packages/torch/lib /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib /usr/local/cuda/compat/lib /usr/local/nvidia/lib /usr/local/nvidia/lib64]" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:271 msg="Library search paths after environment processing: [/usr/local/lib/python3.10/dist-packages/torch/lib /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib /usr/local/cuda/compat/lib /usr/local/nvidia/lib /usr/local/nvidia/lib64]" time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:279 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so 
/usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /usr/local/lib/python3.10/dist-packages/torch/lib/libnvidia-ml.so* /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib/libnvidia-ml.so* /usr/local/cuda/compat/lib/libnvidia-ml.so* /usr/local/nvidia/lib/libnvidia-ml.so* /usr/local/nvidia/lib64/libnvidia-ml.so*]"
time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/cuda/lib64/libnvidia-ml.so*"
time=2024-03-07T09:48:13.944Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so*"
time=2024-03-07T09:48:13.945Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*"
time=2024-03-07T09:48:13.946Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/wsl/lib/libnvidia-ml.so*"
time=2024-03-07T09:48:13.946Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/wsl/drivers/*/libnvidia-ml.so*"
time=2024-03-07T09:48:13.946Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /opt/cuda/lib64/libnvidia-ml.so*"
time=2024-03-07T09:48:13.946Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib*/libnvidia-ml.so*"
time=2024-03-07T09:48:13.946Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:287 msg="Match found: /usr/lib64/libnvidia-ml.so.1"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:287 msg="Match found: /usr/lib64/libnvidia-ml.so.535.154.05"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/lib*/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/lib/aarch64-linux-gnu/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/lib/python3.10/dist-packages/torch/lib/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/lib/python3.10/dist-packages/torch_tensorrt/lib/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/cuda/compat/lib/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/nvidia/lib/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:282 msg="Searching with pattern: /usr/local/nvidia/lib64/libnvidia-ml.so*"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:309 msg="Discovered GPU libraries: [/usr/lib64/libnvidia-ml.so.535.154.05]"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:314 msg="LoadCUDAMgmt started"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:351 msg="getVerboseState started"
time=2024-03-07T09:48:13.947Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:318 msg="Attempting to load CUDA library: /usr/lib64/libnvidia-ml.so.535.154.05"
wiring nvidia management library functions in /usr/lib64/libnvidia-ml.so.535.154.05
dlsym: nvmlInit_v2
dlsym: nvmlShutdown
dlsym: nvmlDeviceGetHandleByIndex
dlsym: nvmlDeviceGetMemoryInfo
dlsym: nvmlDeviceGetCount_v2
dlsym: nvmlDeviceGetCudaComputeCapability
dlsym: nvmlSystemGetDriverVersion
dlsym: nvmlDeviceGetName
dlsym: nvmlDeviceGetSerial
dlsym: nvmlDeviceGetVbiosVersion
dlsym: nvmlDeviceGetBoardPartNumber
dlsym: nvmlDeviceGetBrand
dlsym: nvmlDeviceGetMigMode
dlsym: nvmlDeviceGetMigDeviceHandleByIndex
dlsym: nvmlDeviceGetDeviceHandleFromMigDeviceHandle
CUDA driver version: 535.154.05
time=2024-03-07T09:48:13.955Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:326 msg="CUDA management library loaded successfully."
time=2024-03-07T09:48:13.955Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:101 msg="Nvidia GPU detected"
time=2024-03-07T09:48:13.955Z level=INFO source=/workspace/tewei/ollama/gpu/cpu_common.go:11 msg="CPU has AVX2"
MIG Mode is 1
MIG Device Intance 0:0 found
[0] CUDA device name: NVIDIA A100-SXM4-80GB MIG 3g.40gb
[0] CUDA part number: 692-2G506-0210-002
[0] CUDA S/N: 1652722008440
[0] CUDA vbios version: 92.00.45.00.05
[0] CUDA brand: 14
[0] CUDA totalMem 42144366592
[0] CUDA freeMem 42105110528
time=2024-03-07T09:48:13.991Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:147 msg="CUDA GPU VRAM Total: 42144366592 bytes"
time=2024-03-07T09:48:13.991Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:148 msg="CUDA GPU VRAM Free: 42105110528 bytes"
time=2024-03-07T09:48:13.991Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:158 msg="CUDA Compute Capability detected: 8.0 (Minimum required: 5)"
time=2024-03-07T09:48:13.991Z level=INFO source=/workspace/tewei/ollama/gpu/gpu.go:217 msg="GetGPUInfo completed with Library: cuda, Variant: "
time=2024-03-07T09:48:13.991Z level=DEBUG source=/workspace/tewei/ollama/gpu/gpu.go:246 msg="cuda detected 1 devices with 36139M available memory"
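For reference, the deviceMap layout described at the top of this PR (row i holds host GPU i at index 0, its MIG instances at indexes 1..7, and a NULL sentinel terminating the list) can be sketched in plain C without NVML. Handles are mocked here as opaque pointers; `buildMap`, `mockHost`, and `mockMig` are illustrative names, not the PR's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A host GPU can carry at most 7 MIG instances, so each row is
 * 9 pointers wide: 1 host handle + up to 7 MIG handles + 1 NULL sentinel. */
#define MAX_ROW 9

typedef void *handle_t; /* stand-in for nvmlDevice_t* */

typedef struct {
  unsigned numDevices; /* total usable devices (MIG instances, or hosts when MIG is off) */
  handle_t **layout;
} deviceMap_t;

/* Build a map for nHosts GPUs; migCount[i] is the number of MIG
 * instances on host i (0 means MIG disabled, so the host itself is used). */
deviceMap_t buildMap(unsigned nHosts, const unsigned *migCount,
                     handle_t (*hostHandle)(unsigned),
                     handle_t (*migHandle)(unsigned, unsigned)) {
  deviceMap_t m = {0, malloc(nHosts * sizeof(handle_t *))};
  for (unsigned i = 0; i < nHosts; i++) {
    m.layout[i] = calloc(MAX_ROW, sizeof(handle_t)); /* zero-filled: sentinel comes free */
    m.layout[i][0] = hostHandle(i);
    for (unsigned j = 0; j < migCount[i] && j < 7; j++)
      m.layout[i][j + 1] = migHandle(i, j);
    m.numDevices += migCount[i] ? migCount[i] : 1;
  }
  return m;
}

/* Mock handle factories so the layout can be exercised without a GPU. */
static handle_t mockHost(unsigned i) { return (handle_t)(uintptr_t)(0x100 + i); }
static handle_t mockMig(unsigned i, unsigned j) {
  return (handle_t)(uintptr_t)(0x1000 + i * 8 + j);
}
```

A caller then walks `layout[i]` from index 1 until the NULL sentinel, falling back to `layout[i][0]` when no MIG instances exist, which mirrors how `cuda_check_vram` and `cuda_compute_capability` consume the map.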

On Tue, Mar 5, 2024 at 05:24, Jeff Waller < @.***> wrote:

ok yea, this PR covers this:

MIG Mode is 1
MIG Device Intance 0:0 found
[0] CUDA device name: NVIDIA A100-SXM4-80GB MIG 7g.80gb
[0] CUDA part number: 692-2G506-0210-002
[0] CUDA S/N: 1652722008440
[0] CUDA vbios version: 92.00.45.00.05
[0] CUDA brand: 14
[0] CUDA totalMem 84987740160
[0] CUDA freeMem 83071860736

and later

source=/workspace/tewei/ollama/gpu/gpu.go:246 msg="cuda detected 1 devices with 71301M available memory"

Anything that happens after that isn't something this PR affects directly, so I don't doubt you're having problems, but they come from another part of the ollama code.

Possibly what is going on is this PR makes it possible to reach that code now which has a bug that I am not encountering with my debugging.

For comparison, I am also using an A100, but it's 40G, and the MIG instance is only 20G in size, so possibly what is being uncovered is memory size-related. I wonder whether, since 80G is rare, some part of the code makes an assumption and allocates something too small?

Maybe we should merge this but immediately turn to your issue?

And/or can you see what happens if you define a 20G or 40G MIG instance?


calimero1337 avatar Mar 07 '24 09:03 calimero1337

+1 to include that feature. We are also using an A100 and starcoder2 15B 4-bit. It would be a complete waste of resources to dedicate a full 40GB A100 to this.

renepeinl avatar Mar 07 '24 11:03 renepeinl

+1

Francesco-Sch avatar Mar 10 '24 09:03 Francesco-Sch

+1 Same here, currently wasting an A100 80G

dasantonym avatar Mar 13 '24 22:03 dasantonym

Same issue here: I have an A100 split into 3 MIG instances (40, 20, 20), with the 40GB one allocated to ollama (time-sliced).

OLLAMA_DEBUG=1 ollama serve
time=2024-03-18T11:34:54.687Z level=INFO source=images.go:806 msg="total blobs: 6"
time=2024-03-18T11:34:54.687Z level=INFO source=images.go:813 msg="total unused blobs removed: 0"
time=2024-03-18T11:34:54.688Z level=INFO source=routes.go:1110 msg="Listening on 127.0.0.1:11434 (version 0.1.29)"
time=2024-03-18T11:34:54.688Z level=INFO source=payload_common.go:112 msg="Extracting dynamic libraries to /tmp/ollama648879559/runners ..."
time=2024-03-18T11:34:59.265Z level=INFO source=payload_common.go:139 msg="Dynamic LLM libraries [rocm_v60000 cuda_v11 cpu_avx cpu_avx2 cpu]"
time=2024-03-18T11:34:59.265Z level=DEBUG source=payload_common.go:140 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-03-18T11:34:59.265Z level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-18T11:34:59.265Z level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-18T11:34:59.265Z level=DEBUG source=gpu.go:209 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /home/jovyan/course/hands_on/libnvidia-ml.so*]"
time=2024-03-18T11:34:59.266Z level=INFO source=gpu.go:237 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.147.05]"
wiring nvidia management library functions in /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.525.147.05
dlsym: nvmlInit_v2
dlsym: nvmlShutdown
dlsym: nvmlDeviceGetHandleByIndex
dlsym: nvmlDeviceGetMemoryInfo
dlsym: nvmlDeviceGetCount_v2
dlsym: nvmlDeviceGetCudaComputeCapability
dlsym: nvmlSystemGetDriverVersion
dlsym: nvmlDeviceGetName
dlsym: nvmlDeviceGetSerial
dlsym: nvmlDeviceGetVbiosVersion
dlsym: nvmlDeviceGetBoardPartNumber
dlsym: nvmlDeviceGetBrand
CUDA driver version: 525.147.05
time=2024-03-18T11:34:59.273Z level=INFO source=gpu.go:82 msg="Nvidia GPU detected"
time=2024-03-18T11:34:59.273Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-18T11:34:59.279Z level=INFO source=gpu.go:109 msg="error looking up CUDA GPU memory: device memory info lookup failure 0: 4"
time=2024-03-18T11:34:59.279Z level=INFO source=routes.go:1133 msg="no GPU detected"
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:05:00.0 Off |                   On |
| N/A   30C    P0    43W / 300W |                  N/A |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    2   0   0  |     19MiB / 40192MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 65535MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

maltegrosse avatar Mar 18 '24 11:03 maltegrosse
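The "device memory info lookup failure" in the log above is consistent with querying the MIG-enabled parent GPU handle directly, which is exactly what this PR changes: check each host's MIG mode first and pull memory from the per-instance handles instead. A minimal sketch of that selection logic, with the NVML state mocked as a plain struct (all names and fields here are illustrative, not NVML's real API):

```c
#include <assert.h>

/* Mocked per-GPU record standing in for what NVML would report. */
typedef struct {
  int migEnabled;               /* current MIG mode, as nvmlDeviceGetMigMode would report */
  int migCount;                 /* number of MIG instances (at most 7) */
  unsigned long long hostFree;  /* free VRAM on the whole GPU */
  unsigned long long migFree[7];/* free VRAM per MIG instance */
} gpu_t;

/* Sum usable VRAM the way the PR does: when MIG is enabled, only the
 * MIG instance handles are queried (querying the parent would fail,
 * as in the log above); otherwise the whole host GPU is used. */
unsigned long long usableVram(const gpu_t *gpus, int n) {
  unsigned long long total = 0;
  for (int i = 0; i < n; i++) {
    if (gpus[i].migEnabled) {
      for (int j = 0; j < gpus[i].migCount; j++)
        total += gpus[i].migFree[j];
    } else {
      total += gpus[i].hostFree;
    }
  }
  return total;
}
```

With one MIG-enabled GPU exposing two 20GiB instances and one plain 40GiB GPU, this reports 80GiB usable, while the pre-PR behavior of reading the parent handle would have errored out on the first GPU.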

@waTeim PR #3418 touches some of the relevant code from this PR so this will need a rebase after that merges.

dhiltgen avatar Apr 15 '24 22:04 dhiltgen

@waTeim seems PR #3418 is merged. would be great to see your patches rebased

maltegrosse avatar Apr 24 '24 00:04 maltegrosse

Now that we've transitioned over to leveraging the Driver API, this PR is no longer necessary, and MIG should be working.

dhiltgen avatar May 25 '24 15:05 dhiltgen