# torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
For task-specific domain adaptation, support for [LoRA](https://arxiv.org/abs/2106.09685) weights is needed for a variety of LLM and diffusion-model use cases: 1. On mobile, where the base foundation model will...
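For context, LoRA keeps the base weight W frozen and learns a low-rank update BA, so only the small A and B matrices need to be stored and swapped per task. A minimal PyTorch sketch of the idea (class and parameter names are illustrative, not torchchat's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update:
    y = base(x) + (alpha / r) * x @ A^T @ B^T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the foundation model stays frozen
        # A is small random, B starts at zero so the adapter is a no-op initially.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Per-task adapters are then just the (A, B) pairs, which is what makes shipping many adaptations on top of one on-device base model attractive.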
Maybe we build a table, something like:

| Model | Target tested | Platform tested (*) | Submitter | Test date | Link to test transcript |
|--|--|--|--|--|--|
| ...
We should bring over Gemma and Mixtral support from gpt-fast. @iseeyuan can you have a look at this, and identify somebody who might drive this? Thanks! cc: @metascroy
The graph is basically spam that can't be analyzed by users, and it is typically so long that even if you want to analyze it, you can't scroll back to a...
Today we support parsing for F16, F32, Q4_0, and Q6_K GGUF tensors (see gguf_util.py). We'd like to add support for more GGUF quantization formats in https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.c. Adding support for a...
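For anyone picking this up, each format amounts to a fixed block layout plus a dequantization rule. As a reference point, a sketch of Q4_0 dequantization for one 18-byte block (an fp16 scale followed by 16 bytes of packed 4-bit quants; the function name is illustrative, not the existing gguf_util.py API):

```python
import numpy as np

def dequantize_q4_0(block: bytes) -> np.ndarray:
    """Dequantize one Q4_0 block of 32 values: x[i] = d * (q[i] - 8)."""
    d = np.frombuffer(block[:2], dtype=np.float16)[0].astype(np.float32)
    packed = np.frombuffer(block[2:18], dtype=np.uint8)
    lo = (packed & 0x0F).astype(np.int8) - 8   # low nibbles: elements 0..15
    hi = (packed >> 4).astype(np.int8) - 8     # high nibbles: elements 16..31
    return d * np.concatenate([lo, hi]).astype(np.float32)
```

The K-quant formats (Q4_K, Q5_K, ...) follow the same pattern but with super-blocks and per-sub-block scales, so each one is a similarly self-contained addition.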
[Feature request] Delete executorch_portable_utils.py - this should be installed by executorch setup
x-ref: ET usability issue https://github.com/pytorch/executorch/issues/2909 @byjlw @metascroy
https://github.com/pytorch/torchchat/blob/main/docs/quantization.md Does a8w4dq also work with eager-mode generate, so it can be tested before exporting to ExecuTorch? cc: @digantdesai @kimishpatel
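If eager mode does honor the scheme, a quick check along these lines could answer that before any export (hedged sketch: the model name and groupsize are placeholders, and the --quantize JSON follows the convention in docs/quantization.md):

```
python3 torchchat.py generate stories15M \
  --quantize '{"linear:a8w4dq": {"groupsize": 256}}' \
  --prompt "Once upon a time"
```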
When calling generate with a pte or dso, a gguf-path is passed to initialize the model, and it is only used to get the weights. For checkpoints, this is OK...
Implementation of the /models endpoint https://platform.openai.com/docs/api-reference/models

Start the server:

```
python3 torchchat.py server stories15M
```

In another terminal:

```
curl http://127.0.0.1:5000/models
{"data": [{"id": "stories15M", "created": 1722531822, "owner": "puri", "object": "model"}],...
```
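A minimal sketch of what such a handler could look like, assuming a Flask-style server and mirroring the response shape above (the `loaded_models` registry and the `owner` value are placeholders, not torchchat internals):

```python
import time
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical registry of the models this server instance was started with.
loaded_models = ["stories15M"]

@app.route("/models", methods=["GET"])
def list_models():
    """Return loaded models in the OpenAI /models response shape."""
    return jsonify({
        "object": "list",
        "data": [
            {"id": name, "created": int(time.time()), "owner": "owner", "object": "model"}
            for name in loaded_models
        ],
    })
```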
### 🐛 Describe the bug

For generate on llama3.1, I got 9.1 tok/s, but chat is much slower: I got around 1.4 tok/s. Test laptop: MacBook Pro with M1 Max,...