Deploy Cog Python library to models using PEX files
PEX files are self-contained executable Python virtual environments. More specifically, they are carefully constructed zip files with a `#!/usr/bin/env python` shebang and a special `__main__.py` that lets you interact with the PEX runtime. For more information about zip applications, see PEP 441.
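The zip-application mechanism that PEX builds on can be demonstrated with the standard library's `zipapp` module: a directory containing a `__main__.py` is packed into a zip with a shebang line, yielding a single executable file. This is a minimal sketch of the mechanism, not of PEX itself (which additionally bundles dependencies and a runtime):

```python
# Minimal demonstration of the PEP 441 zip-application mechanism that PEX
# builds on: pack a directory with a __main__.py into a single runnable file.
import os
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "app")
    os.mkdir(src)
    with open(os.path.join(src, "__main__.py"), "w") as f:
        f.write('print("hello from a zip app")\n')

    target = os.path.join(tmp, "app.pyz")
    # The interpreter argument writes the shebang line into the archive.
    zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")

    # Run the archive directly; Python executes its __main__.py.
    out = subprocess.run([sys.executable, target], capture_output=True, text=True)
    result = out.stdout.strip()
    print(result)
```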
This could be a nice solution to the slow process of `pip install`-ing Cog on model containers at runtime, and to the frustration of dealing with dependency conflicts between Cog and model code.
The PEX website has a recipe for packaging a Uvicorn web app that seems very close to our use case: https://pex.readthedocs.io/en/v2.1.137/recipes.html
Hi @mattt , this idea looks really intriguing. However, I have a question regarding its feasibility. It appears that the cog server loads the predictor.py module from a specified path as a module:
```python
def load_predictor_from_ref(ref: str) -> BasePredictor:
    module_path, class_name = ref.split(":", 1)
    module_name = os.path.basename(module_path).split(".py", 1)[0]
    spec = importlib.util.spec_from_file_location(module_name, module_path)
    assert spec is not None
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    spec.loader.exec_module(module)
    predictor = getattr(module, class_name)
    # It could be a class or a function
    if inspect.isclass(predictor):
        return predictor()
    return predictor
```
I'm wondering how the server manages to execute it within a PEX environment without including its dependencies. Could you kindly shed some light on this?
More thoughts: if the cog server can execute it, that means it imports all of the predictor's dependencies into its own process and introduces the same potential conflicts :(
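The concern can be reproduced in isolation: `exec_module` runs the predictor module's top-level code, so every `import` in `predict.py` is resolved inside the importing interpreter's own environment. This sketch uses `json` as a stand-in for a heavy model dependency; the file contents are hypothetical:

```python
# Repro of the dependency concern: loading a predictor by file path executes
# its top-level imports in-process, in the server's own environment.
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predict.py")
    with open(path, "w") as f:
        f.write(
            "import json  # stands in for a heavy model dependency\n"
            "class Predictor:\n"
            "    def predict(self, x):\n"
            "        return json.dumps({'y': x * 2})\n"
        )

    spec = importlib.util.spec_from_file_location("predict", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # top-level imports run here, in-process

    result = module.Predictor().predict(21)
    print(result)
```

If `predict.py` instead imported a library pinned to a version that conflicts with the server's own pins, `exec_module` is exactly where that conflict would surface.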
@hongchaodeng Thanks for giving this some thought. I agree that we probably couldn't use PEX directly without some changes to how the HTTP server interacts with user code.
Here's a rough sketch of how this could work:
- Cog compiles a model into a PEX file
- During that process, Cog generates an OpenAPI file that describes the model's capabilities (predict / train) and the shapes of its inputs and outputs
- Cog takes that file and generates a bespoke HTTP server to run the model, which is then compiled into its own PEX file
- The HTTP server PEX app opens the model PEX app as a subprocess and communicates with it via some inter-process communication (IPC) method
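One minimal way to sketch the last step is newline-delimited JSON over the child's stdin/stdout, the simplest IPC option. Everything here is illustrative: a real setup would launch the model PEX file instead of `sys.executable -c`, and the message shapes are made up:

```python
# Hypothetical sketch: a "server" process drives a "model" child process
# over pipes, exchanging one JSON message per line.
import json
import subprocess
import sys

# Stand-in for the model PEX: reads requests on stdin, writes responses.
MODEL_WORKER = r"""
import json, sys
for line in sys.stdin:
    req = json.loads(line)
    resp = {"output": req["input"].upper()}  # stand-in for model.predict()
    print(json.dumps(resp), flush=True)
"""

proc = subprocess.Popen(
    [sys.executable, "-c", MODEL_WORKER],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
proc.stdin.write(json.dumps({"input": "hello"}) + "\n")
proc.stdin.flush()
resp_line = proc.stdout.readline().strip()
print(resp_line)

proc.stdin.close()
proc.wait()
```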
Does that make sense?
Yes. It makes sense. Thanks for taking the time to provide such a detailed explanation!
My 2c: we could use gRPC for it. gRPC supports local IPC (e.g. over Unix domain sockets) when the client and server sit on the same host.
Totally agree about using gRPC for IPC. Edit: I've come around to thinking that HTTP would be the better transport.
One advantage of splitting things out this way is that it would allow Cog models to be written in any language. For example, a Cog model written in Rust or Mojo could compile to WASM or some other self-contained executable and run just the same as a model written in Python. (And I suppose that means Cog's HTTP server could be rewritten in a different language, too.)