# torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
For task-specific domain adaptation, support for [LoRA](https://arxiv.org/abs/2106.09685) weights is needed for a variety of LLM and diffusion-model use cases: 1. On mobile, where the base foundation model will...
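For context, LoRA keeps the base weight W frozen and learns a low-rank update BA, so only the small A and B matrices need to be stored and swapped per task. A minimal PyTorch sketch of the idea (class and parameter names are illustrative, not torchchat's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update:
    y = base(x) + (alpha / r) * x @ A^T @ B^T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the foundation model stays frozen
        # A is small random, B starts at zero so the adapter is a no-op initially.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Per-task adapters are then just the (A, B) pairs, which is what makes shipping many adaptations on top of one on-device base model attractive.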
Maybe we build a table, something like:

| Model | Target tested | Platform tested (*) | Submitter | Test date | Link to test transcript |
|--|--|--|--|--|--|
| ...
We should bring over Gemma and Mixtral support from gpt-fast. @iseeyuan can you have a look at this, and identify somebody who might drive this? Thanks! cc: @metascroy
The graph is basically spam that can't be analyzed by users, and it is typically so long that even if you want to analyze it, you can't scroll back to a...
Today we support parsing for F16, F32, Q4_0, and Q6_K GGUF tensors (see gguf_util.py). We'd like to add support for more GGUF quantization formats in https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.c. Adding support for a...
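For anyone picking this up, each format amounts to a fixed block layout plus a dequantization rule. As a reference point, a sketch of Q4_0 dequantization for one 18-byte block (an fp16 scale followed by 16 bytes of packed 4-bit quants; the function name is illustrative, not the existing gguf_util.py API):

```python
import numpy as np

def dequantize_q4_0(block: bytes) -> np.ndarray:
    """Dequantize one Q4_0 block of 32 values: x[i] = d * (q[i] - 8)."""
    d = np.frombuffer(block[:2], dtype=np.float16)[0].astype(np.float32)
    packed = np.frombuffer(block[2:18], dtype=np.uint8)
    lo = (packed & 0x0F).astype(np.int8) - 8   # low nibbles: elements 0..15
    hi = (packed >> 4).astype(np.int8) - 8     # high nibbles: elements 16..31
    return d * np.concatenate([lo, hi]).astype(np.float32)
```

The K-quant formats (Q4_K, Q5_K, ...) follow the same pattern but with super-blocks and per-sub-block scales, so each one is a similarly self-contained addition.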
[Feature request] Delete executorch_portable_utils.py - this should be installed by executorch setup
x-ref: ET usability issue https://github.com/pytorch/executorch/issues/2909 @byjlw @metascroy
https://github.com/pytorch/torchchat/blob/main/docs/quantization.md Does a8w4dq also work with eager-mode generate, so it can be tested before exporting to ExecuTorch? cc: @digantdesai @kimishpatel
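If eager mode does honor the scheme, a quick check along these lines could answer that before any export (hedged sketch: the model name and groupsize are placeholders, and the --quantize JSON follows the convention in docs/quantization.md):

```
python3 torchchat.py generate stories15M \
  --quantize '{"linear:a8w4dq": {"groupsize": 256}}' \
  --prompt "Once upon a time"
```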
When calling generate with a pte or dso, a gguf-path is passed to initialize the model, and it is only used to get the weights. For checkpoints, this is OK...
Implementation of the /models endpoint https://platform.openai.com/docs/api-reference/models

Start the server:

```
python3 torchchat.py server stories15M
```

In another terminal:

```
curl http://127.0.0.1:5000/models
{"data": [{"id": "stories15M", "created": 1722531822, "owner": "puri", "object": "model"}],...
```
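A minimal sketch of what such a handler could look like, assuming a Flask-style server and mirroring the response shape above (the `loaded_models` registry and the `owner` value are placeholders, not torchchat internals):

```python
import time
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical registry of the models this server instance was started with.
loaded_models = ["stories15M"]

@app.route("/models", methods=["GET"])
def list_models():
    """Return loaded models in the OpenAI /models response shape."""
    return jsonify({
        "object": "list",
        "data": [
            {"id": name, "created": int(time.time()), "owner": "owner", "object": "model"}
            for name in loaded_models
        ],
    })
```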
### 🐛 Describe the bug

For generate on llama3.1, I got 9.1 tok/s, but chat is much slower: I got around 1.4 tok/s. Test laptop: MacBook Pro with M1 Max,...