nos issues

Reduce bloat by moving `init`, `ready`, `id`, etc. into the subclass. Right now we only have an inference runtime, but future releases might include runtimes for benchmarking, compilation, etc.

outtanames

Investigate model weights diffs for faster `hub.load(...)`

If we're able to build checksums for layer-wise weights, we should be able to only download the diffs and speed up model downloads significantly. This is particularly helpful if you're...
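A minimal sketch of the layer-wise checksum idea, assuming each layer's weights are available as serialized bytes. The helper names `layer_checksums` and `layers_to_download` and the example layer names are hypothetical and not part of `nos`; a real implementation would hash the actual tensor buffers and publish the remote manifest alongside the model.

```python
import hashlib
from typing import Dict, List


def layer_checksums(weights: Dict[str, bytes]) -> Dict[str, str]:
    """Map each layer name to the SHA-256 hex digest of its serialized weights."""
    return {name: hashlib.sha256(blob).hexdigest() for name, blob in weights.items()}


def layers_to_download(local: Dict[str, str], remote: Dict[str, str]) -> List[str]:
    """Return remote layer names that are missing locally or whose checksum differs."""
    return sorted(name for name, digest in remote.items() if local.get(name) != digest)


# Hypothetical example: a fine-tuned model that only changes the head layer.
base = {"backbone.conv1": b"\x00" * 16, "head.fc": b"\x01" * 16}
finetuned = {"backbone.conv1": b"\x00" * 16, "head.fc": b"\x02" * 16}

changed = layers_to_download(layer_checksums(base), layer_checksums(finetuned))
print(changed)  # ['head.fc']
```

Only the layers listed in `changed` would need to be fetched; unchanged layers can be reused from the local cache.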

spillai

optimizations

Interrupting the gRPC server during model download results in a corrupted `.pth` file. Maybe add a checksum as well?
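One possible way to avoid leaving a corrupted file behind is to stream the download into a temporary file, verify a checksum, and only then move it into place. The sketch below assumes the expected SHA-256 digest is known in advance; `atomic_write_verified` is a hypothetical helper, not an existing `nos` function.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def atomic_write_verified(chunks: Iterable[bytes], dest: str, expected_sha256: str) -> None:
    """Write chunks to a temp file, verify SHA-256, then atomically move it to dest."""
    digest = hashlib.sha256()
    dest_dir = os.path.dirname(os.path.abspath(dest))
    # Create the temp file next to dest so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                digest.update(chunk)
        if digest.hexdigest() != expected_sha256:
            raise ValueError(f"checksum mismatch for {dest}")
        os.replace(tmp_path, dest)  # atomic rename on the same filesystem
    except BaseException:
        # BaseException also covers KeyboardInterrupt, so an interrupted
        # download never leaves a partial file at dest.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, a file at `dest` either does not exist or has passed the checksum, so a later `hub.load(...)` never reads a half-written file.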

outtanames

MacOS support (with tests for M1 Mac, Intel Mac)

spillai

distribution

Docker image optimization with multi-stage builds and a CUDA base image

spillai

Investigate `accelerate` dependencies with several `nvidia-*` packages

Currently, the base GPU Docker image is large (11 GB).

spillai

optimizations

[ci] Platform support (Windows, MacOS Intel, MacOS M1)

- GitHub workflow CI: support Windows and MacOS for basic models (SD v2, CLIP)

spillai

ci

`nos serve`: Serve optimized `nos` model by name

- `nos serve -m stability-ai/stable-diffusion-v2`: Serve optimized `nos` model (blocking)
- `nos serve -d stability-ai/stable-diffusion-v2`: Serve optimized `nos` model (daemon/detached)
- `nos serve -c deployment.yml`: Serve collection of models (blocking)...
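The three invocations above could be parsed roughly as follows. This is only a sketch using `argparse`: the flags `-m`, `-d`, and `-c` come from the proposal, while the destination names (`model`, `daemon_model`, `config`) and the choice to make the flags mutually exclusive are assumptions, not the actual `nos` CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed `nos serve` flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="nos serve", description="Serve optimized nos models.")
    # Assumption: exactly one serving mode is chosen per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-m", dest="model", metavar="MODEL",
                      help="serve a model by name (blocking)")
    mode.add_argument("-d", dest="daemon_model", metavar="MODEL",
                      help="serve a model by name (daemon/detached)")
    mode.add_argument("-c", dest="config", metavar="FILE",
                      help="serve a collection of models from a YAML file (blocking)")
    return parser


args = build_parser().parse_args(["-m", "stability-ai/stable-diffusion-v2"])
print(args.model)  # stability-ai/stable-diffusion-v2
```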

spillai

`hub.register` decorator with a full model deployment spec (compile, run)

Register models as part of the nos hub registry, with full build-time and runtime spec.

```python
@hub.register(
    name="/detection2d-detr-resnet-50",
    build_spec=DevelopmentConfig(
        conda="autonomi-ai/nos-base-dev",
        resources=ResourceConfig(cpu=8, memory="8Gi", gpu=0.25, gpu_memory="4Gi"),  # runtime resource
    ),
    runtime_spec=RuntimeConfig(
        conda="autonomi-ai/nos-base-runtime",
        ...
```

spillai

feature


Metadata

Complete test coverage for docker runtime and CLI.

Subclass InferenceServiceRuntime under DockerRuntime
