Funtowicz Morgan
Hey @alexeyr 👋🏻, thanks so much for bringing this C++ wrapper. I'm especially interested in this for some projects we have at huggingface and wanted to leverage what you id...
@ChongyuNVIDIA sorry for the delay, please find below an example repro. Please note you need the transformers main branch in order to have FX support for BLOOM.

```python
from apex.contrib.sparsity import ...
```
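Since the snippet above is cut off, here is a minimal sketch of the kind of repro it points at; the checkpoint, optimizer, and pruning entry point are assumptions on my end, not the original code:

```python
# Sketch only: the original repro is truncated, so the model choice,
# optimizer and pruning flow below are assumptions, not the original code.
import torch
from apex.contrib.sparsity import ASP
from transformers import AutoModelForCausalLM
from transformers.utils.fx import symbolic_trace

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m").cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# FX support for BLOOM (transformers main branch) is what makes this trace work.
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask"])

# Apply 2:4 structured sparsity masks to the trained weights.
ASP.prune_trained_model(model, optimizer)
```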
@JingyaHuang is this PR still needed? I can take a deeper look to merge and close it.
ORT folks contributed an example for this in the transformers repository, in case it's useful: https://github.com/huggingface/transformers/tree/main/examples/research_projects/onnx/summarization
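For a quicker path than the research example, a rough sketch of the same idea through `optimum.onnxruntime`; the model id and prompt here are illustrative, not taken from the linked example:

```python
# Rough sketch: exports a seq2seq checkpoint to ONNX and runs summarization.
# Model id and prompt are illustrative, not from the linked example.
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSeq2SeqLM.from_pretrained(model_id, export=True)

inputs = tokenizer("ONNX Runtime speeds up transformer inference.", return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```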
@mandubian @n1t0 We might expose the special tokens vocab on the Tokenizer class so that users can easily access such information. The concern is that different tokenizers...
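To make the idea concrete, a purely hypothetical sketch of what such an accessor could look like; `get_special_tokens` below does not exist in `tokenizers`, it only illustrates the proposal:

```python
# Hypothetical API sketch: get_special_tokens() does not exist in the
# tokenizers library today; it only illustrates the accessor discussed above.
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-uncased")

# Imagined accessor returning {token: id} for special tokens only.
special_vocab = tokenizer.get_special_tokens()  # hypothetical method
for token, token_id in special_vocab.items():
    print(token, token_id)
```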
For Windows, I can help with having a `winget` install path: https://github.com/microsoft/winget-cli
Tentatively yes - this or `HuggingFace.Cli`; I think there are both short and long references.
@Narsil yes, the original PR #2791 has significant file movements to simplify the overall structure for TRTLLM. This specific PR depends on #2791, and the changes are pretty much concentrated...
Hi @geraldstanje I don't think this is the right place aha, would you mind opening this issue in the [huggingface/optimum](https://github.com/huggingface/optimum/issues) repository? Closing this one here as it's not related to the...
Example usage:

```python
from optimum.tools.records import AutoPerformanceTracker, PerformanceRecord

if __name__ == '__main__':
    tracker = AutoPerformanceTracker.from_uri(
        "es+aws://benchmarks-kb3me[..]q7deny.us-east-1.es.amazonaws.com"
    )
    record = PerformanceRecord.latency(
        metric="TIME_TO_FIRST_TOKEN",
        value=123.4,
        meta={
            "commit_id": "saflsfkja3115",
            "model_id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
            "dtype": "float16",
            "tgi_version": ...
```