Scott Roy
Summary: We have optimizations that remove view_copy operations, so we convert view_copy-like operations (squeeze_copy, unsqueeze_copy, select_copy) to view_copy operations where possible. Differential Revision: D54866539
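The core of such a rewrite is computing the explicit target shape that a view_copy needs. This is a toy sketch of that shape computation, not the actual pass from the diff; the function names are illustrative. squeeze_copy and unsqueeze_copy never change element count or order, so they are always expressible as a view; select_copy is only convertible in restricted cases (it changes the storage offset), which is why the summary says "where possible".

```python
def squeeze_copy_as_view(shape: list[int], dim: int) -> list[int]:
    """Target shape for replacing squeeze_copy(dim) with view_copy:
    drop `dim` when its size is 1, otherwise the shape is unchanged."""
    if shape[dim] != 1:
        return list(shape)
    return [s for i, s in enumerate(shape) if i != dim]

def unsqueeze_copy_as_view(shape: list[int], dim: int) -> list[int]:
    """Target shape for replacing unsqueeze_copy(dim) with view_copy:
    insert a size-1 axis at `dim`."""
    out = list(shape)
    out.insert(dim, 1)
    return out
```

With these shapes in hand, a graph pass can swap the op name to view_copy and attach the computed shape as its argument, letting the later view_copy-removal optimization fire.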
Summary: Test if CI passes with no change Differential Revision: D55995628
Summary: Improve perf of ET on M1 by labeling cpuinfo_uarch_icestorm as a non-performant core. With this change, we use 6 cores on M1 instead of 10. Differential Revision: D57416160
Remove the `-l 2` and `-l 3` flags and auto-detect the model architecture from the tokenizer class. Issue a warning if a user-supplied `-v` does not match the vocab size inferred from tokenizer.model.
Blobfile is not installing correctly with pip.
```
(cchat) scroy@scroy-mbp torchchat % which python
/opt/miniconda3/envs/cchat/bin/python
(cchat) scroy@scroy-mbp torchchat % pip install blobfile
Requirement already satisfied: blobfile in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (2.1.1)
Requirement already...
```
In https://github.com/pytorch/torchchat/blob/main/build/gguf_loader.py, we directly convert Q4_0-quantized linear weights with _convert_weight_to_int4pack (our native 4-bit quantization in PyTorch). All other tensors are converted to float. We should be able to directly...
Today we support parsing F16, F32, Q4_0, and Q6_K GGUF tensors (see gguf_util.py). We'd like to add support for more of the GGUF quantization formats in https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.c. Adding support for a...
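For reference, the Q4_0 layout already supported is defined in ggml-quants.c: each 18-byte block stores one fp16 scale followed by 32 4-bit quants packed two per byte (low nibbles are elements 0..15, high nibbles 16..31), with dequantized values d * (q - 8). A minimal NumPy sketch of the dequantization, not the gguf_util.py code itself (the function name is illustrative):

```python
import numpy as np

def dequantize_q4_0(data: bytes, n_blocks: int) -> np.ndarray:
    """Dequantize GGUF Q4_0 blocks. Each 18-byte block: 2-byte fp16
    scale `d`, then 16 bytes packing 32 4-bit quants; value = d * (q - 8)."""
    BLOCK_BYTES = 18
    out = np.empty((n_blocks, 32), dtype=np.float32)
    for i in range(n_blocks):
        blk = data[i * BLOCK_BYTES:(i + 1) * BLOCK_BYTES]
        d = np.frombuffer(blk[:2], dtype=np.float16)[0].astype(np.float32)
        qs = np.frombuffer(blk[2:], dtype=np.uint8)
        lo = (qs & 0x0F).astype(np.int8) - 8   # elements 0..15
        hi = (qs >> 4).astype(np.int8) - 8     # elements 16..31
        out[i, :16] = d * lo
        out[i, 16:] = d * hi
    return out.reshape(-1)
```

New formats like Q4_1 or Q8_0 follow the same pattern with different block sizes and scale/offset layouts, so each one is a similarly small addition to the parser.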
When calling generate with a pte or dso, a gguf-path is passed to initialize the model, but it is only used to get the weights. For checkpoints, this is OK...