jon-chuang issues

Results: 134 issues of jon-chuang

Homomorphic Encryption of delegated DPC computations (based on enclaves)

As a wild idea, I would like to suggest the possibility of a privacy-preserving version of delegated DPC. Although most would be willing to trust a cloud server for DPC,...

Potential upgrade via plookups

Plookups provide efficient bit decomposition. For instance, one can use lookup tables of size 2^13. Furthermore, one can use RNS decomposition (using the modulus F and a power of 2...
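As a rough illustration of the limb decomposition mentioned above, here is a minimal plain-Python sketch. It only mimics the arithmetic: in an actual plookup-based circuit the membership check would be enforced by the lookup argument rather than by a Python set. The names `LIMB_BITS`, `TABLE`, `decompose`, and `check_decomposition` are hypothetical and do not come from the issue.

```python
# Sketch only: mimics a 2^13-entry lookup-table range check in plain Python.
LIMB_BITS = 13
TABLE = frozenset(range(1 << LIMB_BITS))  # stands in for the lookup table


def decompose(x, num_limbs):
    """Split x into num_limbs little-endian limbs of LIMB_BITS bits each."""
    if x < 0 or x >= 1 << (LIMB_BITS * num_limbs):
        raise ValueError("x does not fit in the requested number of limbs")
    mask = (1 << LIMB_BITS) - 1
    return [(x >> (LIMB_BITS * i)) & mask for i in range(num_limbs)]


def check_decomposition(x, limbs):
    """Accept only if every limb is in the table and the limbs recombine to x."""
    in_table = all(limb in TABLE for limb in limbs)
    recombined = sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))
    return in_table and recombined == x
```

Both conditions matter: the limbs `[8197, 0]` recombine to 8197, but the first limb is outside the table, so the check rejects them.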

Perf regression on A100 in v1.0.0+torch212+cu121+xformers0.23.post1 vs. 0.0.13+torch2.0.0+cu121+xformers0.22patch7

lcm: 18.5ms -> 25.0ms. Same story with v1.0.0+torch2.1.1+cu121+xformers0.23. The nightly release is even worse (30ms). When I use v1.0.0 with torch 2.1.2 and xformers0.23.post1, I do not observe this issue. So...

Update torch.compile benchmark on A100 40GB SDv1.5 for torch nightly

``` 100%|██████████| 50/50 [00:00

componentwise config

enable vae.encode compile: A100 speedup - original VAE: 1 step LCM sdv1.5 - 24ms speedup! (0.062s -> 0.038s) (I think it doesn't actually work though) - taesd: (4step LCM sdv1.5)...

Questions: Clarifying the use of FP8 for Training

@tocean @wkcn In line with the investigation in https://github.com/NVIDIA/TransformerEngine/issues/424, it would be great to get insights from the team at Microsoft on using FP8 in aspects of training besides...

`jax.profiler.trace` repeatedly fails to display entire trace

### Description On various platforms, versions and backends `jax.profiler.trace` emits a trace that is truncated. ### System info (python version, jaxlib version, accelerator, etc.) Here is one such example ```...

bug

[Misc/Testing] Use `torch.testing.assert_close`

Easier to debug and diagnose issues with accuracy. For example, during correctness testing, a small error may mean one just needs to change rtol/atol due to lower precision / quantization. Right...

ready
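To make the rtol/atol point concrete, here is a minimal pure-Python sketch of the closeness rule that `torch.testing.assert_close` documents, namely `|actual - expected| <= atol + rtol * |expected|`. The helpers `is_close` and `assert_all_close` are hypothetical stand-ins written for illustration; they are not part of PyTorch or of the PR, and they ignore NaN handling, dtypes, and devices.

```python
# Sketch only: the elementwise tolerance rule, without any torch dependency.
def is_close(actual, expected, rtol, atol):
    """True if actual is within atol + rtol * |expected| of expected."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def assert_all_close(actual, expected, rtol, atol):
    """Raise AssertionError listing every mismatching index and its error."""
    if len(actual) != len(expected):
        raise AssertionError("length mismatch")
    mismatches = [
        (i, a, e, abs(a - e))
        for i, (a, e) in enumerate(zip(actual, expected))
        if not is_close(a, e, rtol, atol)
    ]
    if mismatches:
        details = ", ".join(
            f"index {i}: got {a}, expected {e}, abs err {err:.3g}"
            for i, a, e, err in mismatches
        )
        raise AssertionError(f"{len(mismatches)} mismatched element(s): {details}")
```

A value that is off by about 1e-3 passes with `rtol=1e-2` and fails with `rtol=1e-5`, which is the kind of adjustment a lower-precision or quantized path may call for.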

[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel

Fix the FP8 Triton kernel issue. Should enable FP8 KV Cache to be used with: 1. chunked prefill 2. prefix caching FIX https://github.com/vllm-project/vllm/issues/4381 https://github.com/vllm-project/vllm/issues/3880 https://github.com/vllm-project/vllm/issues/3156 TODO: - [x] Undo...

ready
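For readers unfamiliar with what "scale and dtype conversion" involves, the following is a minimal pure-Python sketch, not the Triton kernel from the PR. It assumes the cache stores bytes in the E4M3 format (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities) and that dequantization multiplies the decoded value by a per-tensor scale. Both are assumptions, since the snippet does not state the format or the scale convention. `decode_e4m3` and `dequantize` are hypothetical names.

```python
# Sketch only: decode one FP8 (assumed E4M3) byte and apply an assumed scale.
def decode_e4m3(byte):
    """Convert one E4M3-encoded byte to a Python float."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return float("nan")  # the only NaN pattern; this format has no inf
    if exponent == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (mantissa / 8.0) * 2.0 ** -6
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def dequantize(byte, scale):
    """Dtype conversion followed by the (assumed) multiplicative scale."""
    return decode_e4m3(byte) * scale
```

Skipping either step gives wrong attention inputs: reading the raw byte as an integer ignores the format, and omitting the scale leaves values in the quantized range.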

[Bug]: prefill/prefix FP8 triton kernel for opt-125m - an illegal memory access was encountered

### Your current environment As of merging https://github.com/vllm-project/vllm/pull/7208 ### 🐛 Describe the bug Illegal memory access for facebook/opt-125m. Specifically one of these errors: ``` RuntimeError: Triton Error [CUDA]: an illegal...

bug