cool9203

Results: 10 comments by cool9203

Reference material: https://openhome.cc/Gossip/CppGossip/

OK, this bug is solved. Environment and versions used: - OS: Ubuntu 20.04.1 - k8s version: v1.23.1 - docker-client version: 19.03.13 - docker-server version: 20.10.12 -...

@pokerfaceSad Happy Spring Festival! Thanks for your reply. - https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/util/cgroup/cgroup.go#L33-L40 You are right. I finished testing today and it runs, so this edit is not necessary. This edit was my test...

I ran into another problem. - https://github.com/pokerfaceSad/GPUMounter/blob/7036133177eabe2e32e03b33392df17dd8945dd1/pkg/server/gpu-mount/server.go#L124-L135 Calling RemoveGPU sometimes returns the error `Invalid UUIDs`. I traced this error and found that the slave pod's status is Terminating, then the pod will...
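
A minimal sketch of how terminating pods could be recognized before acting on them. It relies only on the fact that Kubernetes sets `metadata.deletionTimestamp` on a pod that is being deleted. The helper names and the plain-dict pod shape are hypothetical and for illustration only; this is not GPUMounter's code, and whether this check would remove the `Invalid UUIDs` error is an assumption.

```python
# Hypothetical helpers: pods are plain dicts shaped like Kubernetes
# Pod objects (for example, parsed from `kubectl get pods -o json`).

def is_terminating(pod):
    """Return True if the pod is being deleted.

    Kubernetes sets metadata.deletionTimestamp once deletion has been
    requested; kubectl displays such pods as "Terminating".
    """
    return pod.get("metadata", {}).get("deletionTimestamp") is not None


def active_pods(pods):
    """Keep only the pods that are not in the process of being deleted."""
    return [pod for pod in pods if not is_terminating(pod)]


if __name__ == "__main__":
    pods = [
        {"metadata": {"name": "slave-a"}},
        {"metadata": {"name": "slave-b",
                      "deletionTimestamp": "2022-02-09T08:00:00Z"}},
    ]
    print([p["metadata"]["name"] for p in active_pods(pods)])
```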

https://github.com/pokerfaceSad/GPUMounter/issues/19#issuecomment-1033637663 Maybe I solved this. https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/ It seems that from k8s v1.20+, the owner pod and the slave pod need to be in the same namespace. If the owner pod and the slave pod are not in the same...
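
The namespace rule from the linked Kubernetes page can be sketched as a small check. The function name and the plain-dict object shape are hypothetical; the rule itself (a namespaced owner must live in the same namespace as its dependent, while a cluster-scoped owner may own namespaced dependents) is the one described in the Kubernetes owners and dependents documentation.

```python
# Hypothetical helper: objects are plain dicts shaped like Kubernetes
# manifests. A missing metadata.namespace is treated as cluster-scoped.

def owner_reference_allowed(owner, dependent):
    """Return True if `owner` may be referenced as the owner of `dependent`.

    - A cluster-scoped owner may own namespaced or cluster-scoped objects.
    - A namespaced owner must be in the same namespace as the dependent.
    """
    owner_ns = owner.get("metadata", {}).get("namespace")
    dependent_ns = dependent.get("metadata", {}).get("namespace")
    if owner_ns is None:
        return True  # cluster-scoped owner
    return owner_ns == dependent_ns


if __name__ == "__main__":
    owner_pod = {"metadata": {"name": "owner", "namespace": "default"}}
    slave_pod = {"metadata": {"name": "slave", "namespace": "gpu-pool"}}
    # Different namespaces, so this ownership would be invalid.
    print(owner_reference_allowed(owner_pod, slave_pod))
```

Under this rule, creating the slave pod in the owner pod's namespace is one way to keep the owner reference valid.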

@pokerfaceSad Sorry for the late reply. > @cool9203 The bug of constant `cgroup driver` has been fixed in [163ef7b](https://github.com/pokerfaceSad/GPUMounter/commit/163ef7b10e7b53180033d1585c9e637c72b3b105). `cgroup driver` can be set in [/deploy/gpu-mounter-workers.yaml](https://github.com/pokerfaceSad/GPUMounter/blob/163ef7b10e7b53180033d1585c9e637c72b3b105/deploy/gpu-mounter-workers.yaml) by environment variable `CGROUP_DRIVER`. Thanks...

You can refer to this [solution](https://stackoverflow.com/a/34033230). This is my test code, and it works. It won't raise `KeyError` when formatting ollama modelfile data. ```python # coding: utf-8 from __future__ import print_function import string llama31_ollama...
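
One common way to avoid `KeyError` with partially filled templates is sketched below. It is not necessarily the approach from the linked answer or from the truncated test code above: `SafeDict`, `partial_format`, and the sample template are hypothetical names chosen for illustration. The idea is to leave unknown `{placeholders}` in place instead of failing.

```python
import string


class SafeDict(dict):
    """dict that returns the placeholder itself for unknown keys."""

    def __missing__(self, key):
        # Re-emit the placeholder so it survives formatting unchanged.
        # Note: format specs and conversions on a missing field
        # (such as "{x:>5}" or "{x!r}") are not preserved by this sketch.
        return "{" + key + "}"


def partial_format(template, **values):
    """Fill in the known fields of `template` and keep the rest as-is."""
    return string.Formatter().vformat(template, (), SafeDict(values))


if __name__ == "__main__":
    # Hypothetical template; only `model` is supplied.
    template = "FROM {model}\nSYSTEM {system}"
    print(partial_format(template, model="llama3.1"))
```

Literal braces in the template still have to be doubled (`{{` and `}}`), as with `str.format`, so templates that already use doubled braces for another purpose need extra care.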

I have a similar problem, though not identical; I just get the same error message. I'm using a vGPU VM with an A6000 and 24 GB of VRAM (the full card has 48 GB). Full script: https://github.com/cool9203/unsloth-train/blob/d1c1ab702707ae5bdf69c0d303006c5726a61b23/unsloth_train/train_vision.py ```bash 🦥 Unsloth: Will...

I feel this is not a torch problem; maybe it is a vGPU, newest CUDA driver, or flash attention problem? But this is only my guess. I can run my script on another computer, `windows...

Hello, my coworker @treeaaa tested it; this error comes from https://github.com/unslothai/unsloth/blob/bc5f726a3cba3dbacda604a288dbc352c0baa737/unsloth/__init__.py#L58 Deleting this line makes it work. We guess it is a low-level error: vGPU or CUDA does not support this, so got...