
Run PyTorch LLMs locally on servers, desktop and mobile

Results: 143 torchchat issues

### 🚀 The feature, motivation and pitch As you might expect given that decoding is memory-bandwidth-bound, bf16 is roughly twice as fast as fp32 on my M1 Mac: (`python torchchat.py...

enhancement
performance
actionable

### 🚀 The feature, motivation and pitch `is_torchtune_model` is a misnomer and can result in buggy code. It gates logic for models that have [`tune` suffix](https://github.com/pytorch/torchchat/blob/d0993b3508f802e81a6917b8959907a9abff827a/torchchat/generate.py#L143), but not all torchtune...
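A hypothetical sketch (not the actual torchchat code) of why suffix-based gating is fragile: a naming-convention check only catches models whose names happen to follow the convention, so a torchtune-format checkpoint with a different name would silently take the wrong code path.

```python
# Assumed gating logic for illustration: classify by name suffix only.
def is_torchtune_model(model_name: str) -> bool:
    return model_name.endswith("tune")

print(is_torchtune_model("llama3-tune"))              # True: caught by the suffix
print(is_torchtune_model("llama3-tuned-checkpoint"))  # False: misses a non-conforming name
```

A more robust check would inspect the checkpoint contents or a recorded model-format field rather than the display name.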

### 🚀 The feature, motivation and pitch The `torchchat` framework provides an excellent platform for embedding models into many different edge-centric platforms. The [Granite Code models](https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330), specifically the [3B-128k](https://huggingface.co/ibm-granite/granite-3b-code-instruct-128k) and...

Removes the need for the `-l` flag in the cmake call by storing the tokenizer type in the PT2 at export time. Stacked on top of https://github.com/pytorch/torchchat/pull/896

CLA Signed

### 🐛 Describe the bug Kick off a server (tested on CPU): `python3 torchchat.py server llama3.2-11B`. In a separate terminal, open the browser: `streamlit run torchchat/usages/browser.py`. First send a...

bug
Browser
Llama 3.2- Multimodal

### 🐛 Describe the bug From a clean install using the current main branch, llama-3.2-11b-vision seems to need some love. The download of the model files from Hugging Face succeeded using...

bug
Llama 3.2- Multimodal

### 🐛 Describe the bug When trying to run distributed/run_dist_inference.sh, it fails with the error below: [rank0]:[rank0]: model = _load_model(builder_args) [rank0]:[rank0]: File "/scratch/grace/torchchat/torchchat/cli/builder.py", line 473, in _load_model [rank0]:[rank0]: model = _maybe_parellelize_model(model,...

Distributed

### 🐛 Describe the bug Chat mode does not work with the 11B model, either on the CLI or in the browser. ### Versions N/A

Known Gaps
Llama 3.2- Multimodal

### 🐛 Describe the bug The API duplicates logic from generate.py; the duplicated code should be consolidated into generate.py and shared utility functions. ### Versions N/A

### 🐛 Describe the bug When generating multiple samples from a compiled int4 model on CUDA, a runtime error occurs relating to Linear.weight swapping: ``` Traceback (most recent call last):...

bug
Compile / AOTI
Quantization