Funtowicz Morgan issues

Results: 9 issues



Fix an issue where `child_name` can be `None`, making the overall ASP process fail

When attempting to sparsify a transformers model, `child_name` can, for some reason, be `None`; `fx_graph.get(None)` then returns `None`, which makes the overall process crash. This PR attempts...
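The failure mode above is a lookup whose key may be missing. The sketch below shows one defensive pattern: skip the lookup when the name is `None` and drop unresolved entries instead of passing `None` further down. It is not the PR's actual patch. `lookup_child` and `collect_children` are hypothetical helpers, and `fx_graph` is modeled as a plain dict.

```python
from typing import Any, Dict, Iterable, List, Optional


def lookup_child(fx_graph: Dict[str, Any], child_name: Optional[str]) -> Optional[Any]:
    """Return the graph entry for child_name, or None if it cannot be resolved.

    Hypothetical helper: fx_graph is assumed to behave like a dict.
    """
    # Guard against a missing name instead of calling fx_graph.get(None)
    if child_name is None:
        return None
    return fx_graph.get(child_name)


def collect_children(fx_graph: Dict[str, Any], child_names: Iterable[Optional[str]]) -> List[Any]:
    """Resolve each name and keep only the entries that actually exist."""
    resolved = []
    for name in child_names:
        node = lookup_child(fx_graph, name)
        # Skip unresolved children rather than letting None crash later stages
        if node is not None:
            resolved.append(node)
    return resolved
```

For example, `collect_children({"fc1": "node_fc1"}, ["fc1", None, "missing"])` keeps only `"node_fc1"`, so a `None` name no longer propagates into the rest of the pipeline.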

Enable testing TGI on XPU

Adds TensorRT-LLM backend to TGI

This PR aims to add a new custom backend to TGI, namely NVIDIA TensorRT-LLM. The underlying implementation uses an automatically generated Rust/C++ binding living...

Add infra to push performance metric to remote performance tracker

This PR introduces a new subpackage, `optimum.tools.records`, which aims to provide the bare-minimum infrastructure required to push performance metrics to our internal tracking system:

- [x] Pythonic API RFC
- ...

[TENSORRT-LLM] - Implement a new looper-thread-based backend

The current backend implementation relies on a locking mechanism to access the executor on the C++ side from within each Tokio request's context thread. This locking results in heavy contention for all...
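A looper thread removes that contention by giving a single thread exclusive ownership of the executor; request handlers enqueue work and wait for a reply instead of competing for a lock. The Python sketch below illustrates only this general pattern and is not the PR's Rust/C++ implementation. `LooperBackend` and `executor_fn` are hypothetical stand-ins for the backend and the C++ executor call.

```python
import queue
import threading
from typing import Any, Callable


class LooperBackend:
    """One dedicated thread owns the executor; callers communicate via queues."""

    def __init__(self, executor_fn: Callable[[Any], Any]) -> None:
        # Hypothetical stand-in for the call into the C++ executor
        self._executor_fn = executor_fn
        self._requests: "queue.Queue" = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self) -> None:
        # Only this thread ever touches the executor, so no lock is needed around it
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            payload, reply = item
            try:
                reply.put(("ok", self._executor_fn(payload)))
            except Exception as exc:  # forward failures to the caller
                reply.put(("err", exc))

    def submit(self, payload: Any) -> "queue.Queue":
        """Enqueue a request and return a one-slot queue that will hold the reply."""
        reply: "queue.Queue" = queue.Queue(maxsize=1)
        self._requests.put((payload, reply))
        return reply

    def call(self, payload: Any) -> Any:
        """Submit a request and block until the looper thread answers."""
        status, value = self.submit(payload).get()
        if status == "err":
            raise value
        return value

    def shutdown(self) -> None:
        self._requests.put(None)
        self._thread.join()
```

Usage: `LooperBackend(lambda x: x * 2).call(21)` returns `42`. In Rust, the same shape is commonly built with a channel such as `std::sync::mpsc` or `tokio::sync::mpsc`, with a dedicated thread draining it.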

Add missing headers for mpiUtils.h to compile with gcc13

This PR attempts to fix a build issue with GCC 13, which now ships in all nvidia/cuda container images based on ubuntu-24.04. With GCC 13, some additional headers now need to be included explicitly compared...

TensorRT-LLM backend bump to latest version + misc fixes

This PR bumps some TensorRT-LLM-related dependencies and rebases the Docker container on ubuntu24.04 instead of ubuntu22.04. To support this, we need to use the latest TensorRT-LLM main due to a...

Exposes the TensorRT-LLM finish reason to the server

Add llama.cpp backend

This PR is an initial implementation of llama.cpp as a potential backend for TGI. It mostly targets CPU inference with single-/multi-stream scheduling, potentially spawning multiple instances of the...