Results 4 issues of charger

### Describe your question I want to create some public API services in Determined, but I don't know how to do it. 1. How to map the port to the host...

feature
question

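**Preface** Thank you to the developers for building an acceleration library that is so easy to understand, easy to deploy, and well supported by tooling 🎉🎉🎉. It is really great and has helped me a lot 😊😊😊!

**Problem description** Many business scenarios only need the model to generate a single token, for example news classification, logical inference, sentiment analysis, relation extraction, and language detection. In these scenarios, the llama model implementation in the fastllm library (other models may be affected too) has a serious problem: as the batch size increases, latency grows linearly 😰. Other users have reproduced this as well; see [ISSUE 337](https://github.com/ztxz16/fastllm/issues/337).

**Reproduction details**
- Hardware: 4090 GPU; CPU and RAM are ample.
- Model: LlamaModel
- Interface: batch_response

```
import pyfastllm
model_path = "tokenizer path"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code = True)
flm_model =...
```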
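The batch-size scaling complaint above can be checked with a small timing harness. The sketch below is not taken from the issue and does not call fastllm. `time_batches`, `per_item_cost`, and `fake_sequential_generate` are hypothetical names, and the stand-in generator simply sleeps once per prompt to imitate a backend that handles a batch one item at a time. To test a real model, pass its batch call as `generate_fn`.

```python
import time
from typing import Callable, Dict, List, Sequence


def time_batches(
    generate_fn: Callable[[List[str]], object],
    prompt: str,
    batch_sizes: Sequence[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the best wall-clock time (seconds) observed for each batch size."""
    timings: Dict[int, float] = {}
    for size in batch_sizes:
        batch = [prompt] * size
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            generate_fn(batch)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


def per_item_cost(timings: Dict[int, float]) -> Dict[int, float]:
    """Seconds per prompt; a roughly flat value means total time grows linearly."""
    return {size: elapsed / size for size, elapsed in timings.items()}


def fake_sequential_generate(batch: List[str]) -> List[str]:
    # Hypothetical stand-in: processes prompts one by one, so cost scales with batch size.
    outputs = []
    for _ in batch:
        time.sleep(0.002)
        outputs.append("x")
    return outputs


if __name__ == "__main__":
    timings = time_batches(fake_sequential_generate, "classify this headline", [1, 4, 16])
    for size, cost in per_item_cost(timings).items():
        print(f"batch={size:3d} total={timings[size]:.4f}s per_item={cost:.4f}s")
```

If the per-item cost stays roughly constant as the batch grows, batching is giving no speedup, which is the behavior the issue describes. With a GPU backend, make sure the timed call only returns once generation has actually finished, otherwise the measurements will be too optimistic.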

An exceptional project! 🎉 Official GPTs from ChatGPT could be incorporated. Is it possible to categorize them and split the list into multiple Markdown files? As a large number of GPTs will be...