Building cog without GPU fails
One of the reasons I want to use Replicate is that I don't have an nvidia GPU. But for some reason I'm unable to push a model without actually having an nvidia GPU: the "validating" step seems to run the generated Docker image at least once, I assume?
Can't I just skip this step?
The error message:
Validating model schema...
Traceback (most recent call last):
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 74, in _load_shared_library
return ctypes.CDLL(str(_lib_path), **cdll_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.7/lib/python3.11/ctypes/__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcuda.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/cog/command/openapi_schema.py", line 18, in <module>
app = create_app(config, shutdown_event=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/cog/server/http.py", line 72, in create_app
predictor = load_predictor_from_ref(predictor_ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/cog/predictor.py", line 172, in load_predictor_from_ref
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/src/predict.py", line 2, in <module>
from llama_cpp import Llama
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/llama_cpp/__init__.py", line 1, in <module>
from .llama_cpp import *
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 87, in <module>
_lib = _load_shared_library(_lib_base_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/llama_cpp/llama_cpp.py", line 76, in _load_shared_library
raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
RuntimeError: Failed to load shared library '/root/.pyenv/versions/3.11.7/lib/python3.11/site-packages/llama_cpp/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
ⅹ Failed to get type signature: exit status 1
+1 on this. I've been trying to run on my M1 mac and it just doesn't want to play.
@skerit move the import to the setup step to circumvent the issue. Hacky, but at least you'll be able to deploy the llama_cpp Python pipeline. For example:
class Predictor(BasePredictor):
    def setup(self):
        from llama_cpp import LlamaGrammar, Llama
        self.LlamaGrammar = LlamaGrammar
        model_path = "/models/MODEL_NAME"
        self.llm = Llama(
            model_path,
            n_ctx=2048,
            n_gpu_layers=-1,
            main_gpu=0,
            n_threads=1
        )
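The same deferred-import idea in a self-contained sketch. Here `json` stands in for `llama_cpp` purely for illustration, and the `BasePredictor` stub exists only so the snippet runs on its own; the point is that nothing GPU-linked is loaded when the module is merely imported, which is all cog's schema validation does.

```python
class BasePredictor:
    # Stand-in for cog.BasePredictor so this sketch is runnable on its own.
    pass


class Predictor(BasePredictor):
    def setup(self):
        # Heavy / GPU-linked imports happen here, at container start,
        # not at module import time. Schema extraction imports predict.py
        # without calling setup(), so libcuda is never touched.
        import json  # illustrative stand-in for `from llama_cpp import Llama`
        self.backend = json

    def predict(self, prompt: str) -> str:
        # Trivial placeholder prediction using the deferred module.
        return self.backend.dumps({"prompt": prompt})
```

Anything that triggers `ctypes.CDLL` at import time (as `llama_cpp` does at module level) must be pushed into `setup()` for this to work.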
Same problem on my Ubuntu 22 machine. No GPU, and I want to deploy without the model validation step.