jon-chuang
As a wild idea, I would like to suggest the possibility of a privacy-preserving version of delegated DPC. Although most would be willing to trust a cloud server for DPC,...
Plookups provide efficient bit decomposition. For instance, one can use lookup tables of size 2^13. Furthermore, one can use RNS decomposition (using the modulus F and a power of 2...
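To make the lookup-based decomposition concrete, here is a minimal sketch in plain Python, outside any proof system. It assumes 13-bit limbs to match the 2^13 table size mentioned above; the names `LIMB_BITS`, `TABLE`, `decompose` and `check_decomposition` are hypothetical, and table membership only stands in for what a real lookup argument would enforce inside a circuit.

```python
# Plain-Python model of limb decomposition against a lookup table.
# Assumption: 13-bit limbs, matching the 2^13 table size in the comment above.
LIMB_BITS = 13
TABLE = frozenset(range(1 << LIMB_BITS))  # every valid limb value


def decompose(value, num_limbs):
    """Split a non-negative integer into little-endian 13-bit limbs."""
    if value < 0 or value >= 1 << (LIMB_BITS * num_limbs):
        raise ValueError("value does not fit in the requested number of limbs")
    mask = (1 << LIMB_BITS) - 1
    return [(value >> (LIMB_BITS * i)) & mask for i in range(num_limbs)]


def check_decomposition(value, limbs):
    """Accept only if every limb is in the table and the limbs recompose to value."""
    in_table = all(limb in TABLE for limb in limbs)
    recomposed = sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))
    return in_table and recomposed == value


limbs = decompose(12345678, 2)
print(limbs, check_decomposition(12345678, limbs))
```

The membership check is what rules out a limb that recomposes to the right value but lies outside the 13-bit range.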
lcm: 18.5ms -> 25.0ms. Same story with v1.0.0+torch2.1.1+cu121+xformers0.23. The nightly release is even worse (30ms). When I use v1.0.0 with torch 2.1.2 and xformers0.23.post1, I do not observe this issue. So...
`100%|██████████| 50/50 [00:00`
enable vae.encode compile: A100 speedup - original VAE: 1 step LCM sdv1.5 - 24ms speedup! (0.062s -> 0.038s) (I think it doesn't actually work though) - taesd: (4step LCM sdv1.5)...
@tocean @wkcn In line with the investigation in https://github.com/NVIDIA/TransformerEngine/issues/424, it would be great to get insights from the team at Microsoft on using FP8 in aspects of training besides...
### Description On various platforms, versions and backends `jax.profiler.trace` emits a trace that is truncated. ### System info (python version, jaxlib version, accelerator, etc.) Here is one such example ```...
Easier to debug and diagnose issues with accuracy. For example, during correctness testing, a small error may mean one just needs to change rtol/atol due to lower precision / quantization. Right...
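As an illustration of the rtol/atol point, the sketch below uses the common elementwise criterion |a - b| <= atol + rtol * |b| in plain Python. The helper names `within_tolerance` and `max_errors` are hypothetical and are not taken from the issue; reporting the largest absolute and relative error is just one way to tell a tolerance problem from a real bug.

```python
# Hypothetical helpers for tolerance-based correctness checks.
def within_tolerance(actual, expected, rtol, atol):
    """Elementwise check: |a - b| <= atol + rtol * |b| for every pair."""
    return all(abs(a - b) <= atol + rtol * abs(b)
               for a, b in zip(actual, expected))


def max_errors(actual, expected):
    """Return (max absolute error, max relative error) over all pairs."""
    abs_errs = [abs(a - b) for a, b in zip(actual, expected)]
    rel_errs = [abs(a - b) / abs(b)
                for a, b in zip(actual, expected) if b != 0]
    return max(abs_errs), max(rel_errs, default=0.0)


actual = [1.0, 2.0]
expected = [1.001, 2.0]
print(within_tolerance(actual, expected, rtol=1e-5, atol=0.0))
print(within_tolerance(actual, expected, rtol=1e-2, atol=0.0))
print(max_errors(actual, expected))
```

If the reported errors are small and roughly uniform, loosening rtol/atol for the lower-precision path is probably enough; a few large outliers point to a real discrepancy.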
Fix the FP8 Triton kernel issue. Should enable FP8 KV Cache to be used with: 1. chunked prefill 2. prefix caching FIX https://github.com/vllm-project/vllm/issues/4381 https://github.com/vllm-project/vllm/issues/3880 https://github.com/vllm-project/vllm/issues/3156 TODO: - [x] Undo...
### Your current environment As of merging https://github.com/vllm-project/vllm/pull/7208 ### 🐛 Describe the bug Illegal memory access for facebook/opt-125m Specifically one of these errors: ``` RuntimeError: Triton Error [CUDA]: an illegal...