BentoML
bug: openllm build creates 3 copies of the model weights in different places
Describe the bug
When running openllm build with BENTOML_HOME=/foobar (for example):
- First, the model weights are downloaded to a directory under $HOME (in my case, under /root because this is running in a Docker container in a Kubernetes pod).
- Second, the weights are copied to a directory under /tmp
- Finally, the weights are copied again to a directory under BENTOML_HOME (which is where we wanted them)
I'm guessing at least one of these copies is unnecessary. Ideally, the files would end up under BENTOML_HOME directly without any intermediate copies, but I'm not sure if that's feasible.
In any case, it would be helpful to document that the build process requires enough storage for the full model at all three locations. When building inside a Kubernetes pod, for example, one must mount volumes at both /root and /tmp that are big enough to hold the model, else there will be an error saying the pod has exhausted its ephemeral-storage.
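Until this is documented, a pre-flight check along the following lines could catch the problem before the build starts. This is only a sketch: the helper names `existing_ancestor` and `locations_short_on_space` are made up for illustration, the 20 GiB figure is a placeholder rather than a measured model size, and it assumes the three locations are the ones observed above (the home directory, the system temp directory, and BENTOML_HOME).

```python
import os
import shutil
import tempfile
from pathlib import Path


def existing_ancestor(path: Path) -> Path:
    """Return `path` itself if it exists, otherwise its closest existing ancestor."""
    path = Path(path).resolve()
    while not path.exists():
        path = path.parent
    return path


def locations_short_on_space(paths, required_bytes):
    """Return {path: free_bytes} for every path with less than `required_bytes` free."""
    short = {}
    for p in paths:
        # disk_usage reports on the filesystem that contains the given path.
        free = shutil.disk_usage(existing_ancestor(Path(p))).free
        if free < required_bytes:
            short[str(p)] = free
    return short


if __name__ == "__main__":
    # Hypothetical figure; substitute the real size of the model being built.
    required = 20 * 1024**3
    candidates = [str(Path.home()), tempfile.gettempdir()]
    if os.environ.get("BENTOML_HOME"):
        candidates.append(os.environ["BENTOML_HOME"])
    for path, free in locations_short_on_space(candidates, required).items():
        print(f"{path}: only {free / 1024**3:.1f} GiB free")
```

Note that the check treats each location independently. If two of the directories live on the same filesystem, that filesystem needs room for both copies at once, so the requirement there is correspondingly higher.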
To reproduce
Example Python code:
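import os
import subprocess

# Build the Falcon 7B model with OpenLLM.
cmd = ["openllm", "build", "falcon", "--model-id", "tiiuae/falcon-7b"]
# Copy the current environment and point BENTOML_HOME at the desired location.
env = os.environ.copy()
env.update({"BENTOML_HOME": "/somewhere"})
# check=True raises CalledProcessError if the build exits with a non-zero status.
subprocess.run(cmd, env=env, check=True)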
I monitored disk usage with a background process that ran the following shell command every second:
for d in /*; do du -sh "$d"; done
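For anyone who wants to reproduce the measurement without a separate shell loop, a rough Python equivalent might look like the sketch below. `dir_size` and `DiskUsageSampler` are illustrative names, not part of any library, and the sizes are apparent file sizes from `os.lstat`, which can differ from the block usage that `du` reports.

```python
import os
import threading
import time


def dir_size(path):
    """Total apparent size in bytes of files under `path`, skipping unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # the file vanished or cannot be read
    return total


class DiskUsageSampler:
    """Record the size of each watched directory at a fixed interval."""

    def __init__(self, paths, interval=1.0):
        self.paths = list(paths)
        self.interval = interval
        self.samples = []  # list of (timestamp, {path: size_in_bytes})
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            sizes = {p: dir_size(p) for p in self.paths}
            self.samples.append((time.time(), sizes))
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()

    def peak(self):
        """Largest observed size per watched directory."""
        return {
            p: max((sizes[p] for _, sizes in self.samples), default=0)
            for p in self.paths
        }
```

Wrapping the `subprocess.run(...)` call from the reproduction in `with DiskUsageSampler(["/root", "/tmp", "/somewhere"]) as sampler:` and printing `sampler.peak()` afterwards should show how much each of the three locations grew during the build. Walking large trees every second is slow, so a longer interval may be more practical.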
Logs
No response
Environment
bentoml: 1.1.6
System information (Optional)
Running inside Docker container in Kubernetes pod
This is intended, as we want the build from bentoml to be atomic. I will probably transfer this to BentoML and we can track it there.
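To illustrate what "atomic" usually means here, the sketch below shows the generic stage-then-publish pattern. It is not BentoML's actual implementation; `atomic_build`, `populate`, and `staging_root` are hypothetical names chosen for the example. The artifact is assembled in a temporary directory and only moved to its final location once it is complete, so a failed build leaves nothing half-written at the destination.

```python
import shutil
import tempfile
from pathlib import Path


def atomic_build(dest, populate, staging_root=None):
    """Run `populate(staging_dir)` and publish the result at `dest` only on success."""
    dest = Path(dest)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    # With staging_root=None this uses the system temp directory (e.g. /tmp).
    staging = Path(tempfile.mkdtemp(dir=staging_root))
    try:
        populate(staging)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # shutil.move renames when both paths share a filesystem and falls
        # back to copy-then-delete when they do not.
        shutil.move(str(staging), str(dest))
    except BaseException:
        # Discard the partial staging directory and re-raise the error.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return dest
```

With a pattern like this, staging under /tmp and publishing to a different volume costs a full extra copy, which would be consistent with the behaviour described in the report. Staging on the same filesystem as the destination (the `staging_root` argument in this sketch) lets the final step be a rename instead; whether BentoML could do that is a question for the maintainers.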