Deploy Cog Python library to models using PEX files
PEX files are self-contained executable Python virtual environments. More specifically, they are carefully constructed zip files with a `#!/usr/bin/env python` shebang and a special `__main__.py` that lets you interact with the PEX runtime. For more information about zip applications, see PEP 441.
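The zip-application mechanism that PEX builds on can be demonstrated with the standard library's `zipapp` module: a directory containing a `__main__.py` is packed into a zip with a shebang line, yielding a single executable file. This is a minimal sketch of the mechanism, not of PEX itself (which additionally bundles dependencies and a runtime):

```python
# Minimal demonstration of the PEP 441 zip-application mechanism that PEX
# builds on: pack a directory with a __main__.py into a single runnable file.
import os
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "app")
    os.mkdir(src)
    with open(os.path.join(src, "__main__.py"), "w") as f:
        f.write('print("hello from a zip app")\n')

    target = os.path.join(tmp, "app.pyz")
    # The interpreter argument writes the shebang line into the archive.
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")

    # Run the archive directly; Python executes its __main__.py.
    out = subprocess.run([sys.executable, target], capture_output=True, text=True)
    result = out.stdout.strip()
    print(result)
```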
This could be a nice solution to the slow process of `pip install`-ing Cog on model containers at runtime, and to the frustration of dealing with dependency conflicts between Cog and model code.
The PEX website has a recipe for packaging a Uvicorn web app that seems very close to our use case: https://pex.readthedocs.io/en/v2.1.137/recipes.html
Hi @mattt , this idea looks really intriguing. However, I have a question regarding its feasibility. It appears that the cog server loads the predictor.py module from a specified path as a module:
```python
def load_predictor_from_ref(ref: str) -> BasePredictor:
    module_path, class_name = ref.split(":", 1)
    module_name = os.path.basename(module_path).split(".py", 1)[0]
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    assert spec is not None
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    spec.loader.exec_module(module)
    predictor = getattr(module, class_name)
    # It could be a class or a function
    if inspect.isclass(predictor):
        return predictor()
    return predictor
```
I'm wondering how the server manages to execute it within a PEX environment without including its dependencies. Could you kindly shed some light on this?
More thoughts: if the cog server can execute it, that means it imports all of the predictor's dependencies into its own process and introduces the same potential conflicts :(
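The concern can be reproduced in isolation: `exec_module` runs the predictor module's top-level code, so every `import` in `predict.py` is resolved inside the importing interpreter's own environment. This sketch uses `json` as a stand-in for a heavy model dependency; the file contents are hypothetical:

```python
# Repro of the dependency concern: loading a predictor by file path executes
# its top-level imports in-process, in the server's own environment.
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predict.py")
    with open(path, "w") as f:
        f.write(
            "import json  # stands in for a heavy model dependency\n"
            "class Predictor:\n"
            "    def predict(self, x):\n"
            "        return json.dumps({'y': x * 2})\n"
        )

    spec = importlib.util.spec_from_file_location("predict", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level imports run here, in-process

    result = module.Predictor().predict(21)
    print(result)
```

If `predict.py` instead imported a library pinned to a version that conflicts with the server's own pins, `exec_module` is exactly where that conflict would surface.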
@hongchaodeng Thanks for giving this some thought. I agree that we probably couldn't use PEX directly without some changes to how the HTTP server interacts with user code.
Here's a rough sketch of how this could work:
- Cog compiles a model into a PEX file
- During that process, Cog generates an OpenAPI file that describes the model's capabilities (predict / train) and the shapes of its inputs and outputs
- Cog takes that file and generates a bespoke HTTP server to run the model, which is then compiled into its own PEX file
- The HTTP server PEX app opens the model PEX app as a subprocess and communicates with it via some inter-process communication (IPC) method
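One minimal way to sketch the last step is newline-delimited JSON over the child's stdin/stdout, the simplest IPC option. Everything here is illustrative: a real setup would launch the model PEX file instead of `sys.executable -c`, and the message shapes are made up:

```python
# Hypothetical sketch: a "server" process drives a "model" child process
# over pipes, exchanging one JSON message per line.
import json
import subprocess
import sys

# Stand-in for the model PEX: reads requests on stdin, writes responses.
MODEL_WORKER = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"output": req["input"].upper()}  # stand-in for model.predict()
    print(json.dumps(resp), flush=True)
"""

proc = subprocess.Popen(
    [sys.executable, "-c", MODEL_WORKER],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
proc.stdin.write(json.dumps({"input": "hello"}) + "\n")
proc.stdin.flush()
resp_line = proc.stdout.readline().strip()
print(resp_line)

proc.stdin.close()
proc.wait()
```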
Does that make sense?
Yes. It makes sense. Thanks for taking the time to provide such a detailed explanation!
My 2c: we could use gRPC for it. gRPC supports local IPC (e.g. over Unix domain sockets) when the client and server sit on the same host.
Totally agree about using gRPC for IPC. Edit: I've come around to thinking that HTTP would be the better transport.
One advantage of splitting things out this way is that it would allow Cog models to be written in any language. For example, a Cog model written in Rust or Mojo could compile to WASM or some other self-contained executable and run just the same as a model written in Python. (And I suppose that means Cog's HTTP server could be rewritten in a different language, too.)