torchchat
Run PyTorch LLMs locally on servers, desktop and mobile
### 🚀 The feature, motivation and pitch As you might expect given that decoding is memory-bandwidth-bound, bf16 is roughly twice as fast as fp32 on my M1 Mac: (`python torchchat.py...
### 🚀 The feature, motivation and pitch `is_torchtune_model` is a misnomer and can result in buggy code. It gates logic for models that have [`tune` suffix](https://github.com/pytorch/torchchat/blob/d0993b3508f802e81a6917b8959907a9abff827a/torchchat/generate.py#L143), but not all torchtune...
### 🚀 The feature, motivation and pitch The `torchchat` framework provides an excellent platform for embedding models into many different edge-centric platforms. The [Granite Code models](https://huggingface.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330), specifically the [3B-128k](https://huggingface.co/ibm-granite/granite-3b-code-instruct-128k) and...
Removes the need for the `-l` in the cmake call by storing the tokenizer type during export time in the PT2. Stacked on top of https://github.com/pytorch/torchchat/pull/896
### 🐛 Describe the bug Kick off a server (tested on CPU): `python3 torchchat.py server llama3.2-11B`. In a separate terminal, open the browser: `streamlit run torchchat/usages/browser.py`. First send a...
### 🐛 Describe the bug From a clean install using the current main branch, llama-3.2-11b-vision seems to need some love. The download of the model files from Hugging Face succeeded using...
### 🐛 Describe the bug Running `distributed/run_dist_inference.sh` fails with the following error: [rank0]:[rank0]: model = _load_model(builder_args) [rank0]:[rank0]: File "/scratch/grace/torchchat/torchchat/cli/builder.py", line 473, in _load_model [rank0]:[rank0]: model = _maybe_parellelize_model(model,...
### 🐛 Describe the bug Chat mode does not work with the 11B model, either on the CLI or in the browser. ### Versions N/A
### 🐛 Describe the bug The API uses generate.py. The duplicate code should be consolidated in generate.py and utility functions. ### Versions N/A
### 🐛 Describe the bug When generating multiple samples from a compiled int4 model on CUDA, a runtime error occurs relating to Linear.weight swapping: ``` Traceback (most recent call last):...