
Run PyTorch LLMs locally on servers, desktop and mobile

Results: 143 torchchat issues, sorted by recently updated

For task-specific domain adaptation, support for [LoRA](https://arxiv.org/abs/2106.09685) weights is needed for a variety of LLM and diffusion-model use cases: 1. On mobile, where the base foundation model will... (a minimal LoRA sketch follows this entry)

enhancement
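For context, here is a minimal sketch of the LoRA mechanism this issue asks for: a frozen base `nn.Linear` plus two small trainable low-rank projections. The class name `LoRALinear` and the `rank`/`alpha` hyperparameters are illustrative, not torchchat API:

```
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus trainable low-rank adapters."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the foundation weights stay frozen
        # Low-rank down- and up-projections; B starts at zero so the
        # adapter is a no-op before finetuning.
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))
```

Because only `lora_a` and `lora_b` are trainable, swapping tasks on device means shipping just those small tensors rather than a second copy of the base model.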

Maybe we build a table, with something like:

| Model | Target tested | Platform tested (*) | Submitter | Test date | Link to test transcript |
|--|--|--|--|--|--|
| ...

enhancement

We should bring over Gemma and Mixtral support from gpt-fast. @iseeyuan can you have a look at this, and identify somebody who might drive this? Thanks! cc: @metascroy

enhancement

The graph is basically spam that can't be analyzed by users, and it is typically so long that even if you want to analyze it, you can't scroll back to a...

enhancement

Today we support parsing of F16, F32, Q4_0, and Q6_K GGUF tensors (see gguf_util.py). We'd like to add support for more of the GGUF quantization formats in https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.c. Adding support for a... (a Q4_0 dequantization sketch follows this entry)

enhancement
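For reference, a minimal NumPy sketch of Q4_0 dequantization following the block layout in ggml-quants.c (one fp16 scale plus 16 packed bytes per 32-element block, values offset by 8); the function name and buffer handling are illustrative:

```
import numpy as np

QK4_0 = 32                      # elements per Q4_0 block
BLOCK_BYTES = 2 + QK4_0 // 2    # fp16 scale + 16 packed nibble bytes

def dequantize_q4_0(raw: bytes) -> np.ndarray:
    """Dequantize a buffer of Q4_0 blocks to float32.

    Per ggml-quants.c, byte j of a block holds element j in its low
    nibble and element j+16 in its high nibble, and x = d * (q - 8).
    """
    blocks = np.frombuffer(raw, dtype=np.uint8).reshape(-1, BLOCK_BYTES)
    d = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # (n, 1)
    qs = blocks[:, 2:]
    lo = (qs & 0x0F).astype(np.int8) - 8   # elements 0..15 of each block
    hi = (qs >> 4).astype(np.int8) - 8     # elements 16..31 of each block
    return (np.concatenate([lo, hi], axis=1).astype(np.float32) * d).reshape(-1)
```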

x-ref: ET usability issue https://github.com/pytorch/executorch/issues/2909 @byjlw @metascroy

enhancement

https://github.com/pytorch/torchchat/blob/main/docs/quantization.md Does a8w4dq also work for eager-mode generate, for testing before export, without ExecuTorch? (an assumed invocation follows this entry) cc: @digantdesai @kimishpatel

enhancement
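If a8w4dq does work in eager mode, testing it before an ExecuTorch export would presumably look something like the following; the JSON schema is the one in docs/quantization.md, but the flag combination and the groupsize value here are assumptions to be verified:

```
# Assumed invocation: apply a8w4dq in eager mode, then generate without
# exporting a .pte first. Verify the exact flags against docs/quantization.md.
python3 torchchat.py generate llama3 \
    --quantize '{"linear:a8w4dq": {"groupsize": 256}}' \
    --prompt "Hello, my name is"
```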

When calling generate with a pte or dso, a gguf-path is passed to initialize the model, but it is only used to get the weights. For checkpoints this is OK... (illustrated below)

enhancement
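To illustrate the redundancy (flag names assumed from the torchchat CLI; the paths are placeholders): the exported artifact already contains the weights, yet a GGUF file must still be supplied to construct the model:

```
# Export bakes the GGUF weights into the compiled artifact...
python3 torchchat.py export --gguf-path model.gguf --output-dso-path model.so

# ...but generate still requires --gguf-path just to set up the model.
python3 torchchat.py generate --dso-path model.so --gguf-path model.gguf \
    --prompt "Hello"
```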

Implementation of the /models endpoint https://platform.openai.com/docs/api-reference/models

Start the server:
```
python3 torchchat.py server stories15M
```
In another terminal:
```
curl http://127.0.0.1:5000/models
{"data": [{"id": "stories15M", "created": 1722531822, "owner": "puri", "object": "model"}],...
```

CLA Signed
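A minimal sketch of what such a /models handler could look like, using Flask since the transcript above shows port 5000 (Flask's default); the model registry, the "owner" field value, and the module layout are illustrative, mirroring the response above rather than torchchat's actual server code:

```
import time
from flask import Flask, jsonify

app = Flask(__name__)

# Illustrative registry; torchchat would populate this from the loaded model.
LOADED_MODELS = ["stories15M"]

@app.route("/models", methods=["GET"])
def list_models():
    """Minimal OpenAI-style model listing, shaped like the transcript above."""
    return jsonify({
        "data": [
            {
                "id": name,
                "created": int(time.time()),
                "owner": "puri",
                "object": "model",
            }
            for name in LOADED_MODELS
        ],
        "object": "list",
    })

if __name__ == "__main__":
    app.run(port=5000)
```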

### 🐛 Describe the bug For generate on llama3.1 I got 9.1 tok/s, but chat is much slower, at around 1.4 tok/s. Test laptop: MacBook Pro with M1 Max,...

bug
Known Gaps
actionable
MPS/Metal