Scott Roy
Summary: We have optimizations that remove view_copy operations, so we convert view_copy-like operations (squeeze_copy, unsqueeze_copy, select_copy) to view_copy operations where possible. Differential Revision: D54866539
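The core of such a rewrite is computing the explicit target shape that a view_copy needs. This is a toy sketch of that shape computation, not the actual pass from the diff; the function names are illustrative. squeeze_copy and unsqueeze_copy never change element count or order, so they are always expressible as a view; select_copy is only convertible in restricted cases (it changes the storage offset), which is why the summary says "where possible".

```python
def squeeze_copy_as_view(shape: list[int], dim: int) -> list[int]:
    """Target shape for replacing squeeze_copy(dim) with view_copy:
    drop `dim` when its size is 1, otherwise the shape is unchanged."""
    if shape[dim] != 1:
        return list(shape)
    return [s for i, s in enumerate(shape) if i != dim]

def unsqueeze_copy_as_view(shape: list[int], dim: int) -> list[int]:
    """Target shape for replacing unsqueeze_copy(dim) with view_copy:
    insert a size-1 axis at `dim`."""
    out = list(shape)
    out.insert(dim, 1)
    return out
```

With these shapes in hand, a graph pass can swap the op name to view_copy and attach the computed shape as its argument, letting the later view_copy-removal optimization fire.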
Summary: Test if CI passes with no change Differential Revision: D55995628
Summary: Improve perf of ET on M1 by labeling cpuinfo_uarch_icestorm as a non-performant core. With this change, we use 6 cores on M1 instead of 10. Differential Revision: D57416160
Remove the `-l 2` and `-l 3` flags and auto-detect the model architecture from the tokenizer class. Issue a warning if a user-supplied `-v` does not match the vocab size inferred from tokenizer.model.
Blobfile is not installing correctly with pip.
```
(cchat) scroy@scroy-mbp torchchat % which python
/opt/miniconda3/envs/cchat/bin/python
(cchat) scroy@scroy-mbp torchchat % pip install blobfile
Requirement already satisfied: blobfile in /opt/miniconda3/envs/cchat/lib/python3.10/site-packages (2.1.1)
Requirement already...
```
In https://github.com/pytorch/torchchat/blob/main/build/gguf_loader.py, we directly convert Q4_0-quantized linear weights with _convert_weight_to_int4pack (our native 4-bit quantization in PyTorch). All other tensors are converted to float. We should be able to directly...
Today we support parsing F16, F32, Q4_0, and Q6_K GGUF tensors (see gguf_util.py). We'd like to add support for more of the GGUF quantization formats in https://github.com/ggerganov/llama.cpp/blob/master/ggml-quants.c. Adding support for a...
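For reference, the Q4_0 layout already supported is defined in ggml-quants.c: each 18-byte block stores one fp16 scale followed by 32 4-bit quants packed two per byte (low nibbles are elements 0..15, high nibbles 16..31), with dequantized values d * (q - 8). A minimal NumPy sketch of the dequantization, not the gguf_util.py code itself (the function name is illustrative):

```python
import numpy as np

def dequantize_q4_0(data: bytes, n_blocks: int) -> np.ndarray:
    """Dequantize GGUF Q4_0 blocks. Each 18-byte block: 2-byte fp16
    scale `d`, then 16 bytes packing 32 4-bit quants; value = d * (q - 8)."""
    BLOCK_BYTES = 18
    out = np.empty((n_blocks, 32), dtype=np.float32)
    for i in range(n_blocks):
        blk = data[i * BLOCK_BYTES:(i + 1) * BLOCK_BYTES]
        d = np.frombuffer(blk[:2], dtype=np.float16)[0].astype(np.float32)
        qs = np.frombuffer(blk[2:], dtype=np.uint8)
        lo = (qs & 0x0F).astype(np.int8) - 8   # elements 0..15
        hi = (qs >> 4).astype(np.int8) - 8     # elements 16..31
        out[i, :16] = d * lo
        out[i, 16:] = d * hi
    return out.reshape(-1)
```

New formats like Q4_1 or Q8_0 follow the same pattern with different block sizes and scale/offset layouts, so each one is a similarly small addition to the parser.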
When calling generate with a pte or dso, a gguf-path is passed to initialize the model, but it is only used to get the weights. For checkpoints, this is OK...