Vikash

106 comments of Vikash

Well, as I mentioned before, we don't actually use llama.cpp at work on our A100s, so my benchmark numbers are comparing PyTorch implementations. It is possible that at this point...

The discussion here might be relevant: https://github.com/ggerganov/llama.cpp/issues/1955, although it seems many people are misunderstanding how the paging works. It should be hugely beneficial for any batched inference workload, even on...
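To illustrate the core idea behind paged KV caches, here is a minimal sketch (illustrative only, not vLLM's or llama.cpp's actual implementation): each sequence gets a block table mapping logical token positions to fixed-size physical blocks, so memory is allocated on demand instead of reserved up front per sequence.

```python
# Minimal sketch of a paged KV-cache block table (hypothetical, simplified;
# real implementations also store the actual key/value tensors per block).
BLOCK_SIZE = 16  # tokens per physical block

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        """Allocate a physical block lazily when position `pos` needs one."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):
            table.append(self.free_blocks.pop())  # allocate on demand

    def physical_slot(self, seq_id, pos):
        """Translate a logical position into (physical block, offset)."""
        block = self.block_tables[seq_id][pos // BLOCK_SIZE]
        return block, pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for pos in range(20):
    cache.append_token(seq_id=0, pos=pos)
# 20 tokens at block size 16 -> only 2 physical blocks actually allocated
print(len(cache.block_tables[0]))
```

Because blocks are allocated only as a sequence grows, a batch of many short sequences no longer pays for the worst-case context length, which is why the gains show up mainly in batched inference.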

@charliermarsh upon further testing, I have learned that ruff does not actually support being called as a module. For example, `python -m flake8` works but `python -m ruff`...

Actually, never mind. I just realized that ruff is installed as just an executable and not a Python module, thus there is no way to support `python -m`. Still, the...
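For context, `python -m pkg` only works when an importable package ships a `__main__.py` entry point; a bare binary on PATH gives the interpreter nothing to import. A minimal sketch of the mechanism, using a hypothetical package name `mytool` built on the fly (`runpy.run_module` is what the `-m` flag uses under the hood):

```python
# Sketch of why `python -m <pkg>` requires a __main__.py inside the package.
# The package name `mytool` is hypothetical and created here for illustration.
import os
import runpy
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mytool")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "__main__.py"), "w") as f:
    f.write("result = 'ran as module'\n")

sys.path.insert(0, tmp)
# Equivalent to running `python -m mytool` from the command line:
module_globals = runpy.run_module("mytool", run_name="__main__")
print(module_globals["result"])
```

flake8 is a pure-Python package with such an entry point, whereas ruff (at the time) shipped only a compiled executable, hence the difference.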

The biggest reason why some of the current lock screen implementations have security flaws is that if they crash, you are automatically given access. So an attacker can just...
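The fix is the fail-closed principle: whatever component grants access must treat a crash of the lock UI as "still locked" rather than unlocking by default. A hedged sketch of the two designs (hypothetical, not any specific compositor's code):

```python
# Illustrative fail-open vs fail-closed lock logic (hypothetical, simplified).

def fail_open_lock(run_ui):
    """Flawed design: a crash in the lock UI falls through to 'unlocked'."""
    try:
        return run_ui()          # returns True only on a correct password
    except Exception:
        return True              # crash -> access granted (the bug)

def fail_closed_lock(run_ui):
    """Safer design: any crash leaves the session locked."""
    try:
        return run_ui()
    except Exception:
        return False             # crash -> still locked

def crashing_ui():
    raise RuntimeError("attacker-triggered crash")

print(fail_open_lock(crashing_ui))    # access granted despite the crash
print(fail_closed_lock(crashing_ui))  # session stays locked
```

In the fail-open design, an attacker only needs to find any input that crashes the lock screen; in the fail-closed design, a crash costs them nothing.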

Has there been any progress on this? Apparently there is now a FlashAttention-2 as well, in the same repo. Here is the technical report: https://tridao.me/publications/flash2/flash2.pdf It reports significantly increased speed over the original...

Also, I did follow up on the Triton thread https://github.com/openai/triton/issues/153, and it seems that even though https://github.com/openai/triton/pull/1056 got closed, https://github.com/openai/triton/pull/1805 did get merged. I am not sure how much more...

Yep, I was also looking into this. It would be very nice to have, but I am not sure we can make it even remotely approach the efficiency of...

This would be a very useful feature; FastAPI supports this in the form of class-based dependencies: https://fastapi.tiangolo.com/tutorial/dependencies/classes-as-dependencies/
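The pattern FastAPI uses there is that any callable, including a class, can serve as a dependency: the framework inspects the callable's signature, fills its parameters from the request, and injects the result. A framework-agnostic sketch of that idea (the `resolve` helper and names are hypothetical, not FastAPI's API):

```python
# Framework-agnostic sketch of classes-as-dependencies (hypothetical
# `resolve` helper; FastAPI does something similar via signature inspection).
import inspect

class CommonQueryParams:
    # A plain class used as a dependency: the "framework" reads __init__'s
    # signature and fills its parameters from the request's query dict.
    def __init__(self, q: str = "", limit: int = 10):
        self.q = q
        self.limit = limit

def resolve(dependency, request_params):
    """Instantiate `dependency` using matching keys from the request."""
    sig = inspect.signature(dependency)
    kwargs = {name: request_params[name]
              for name in sig.parameters if name in request_params}
    return dependency(**kwargs)

params = resolve(CommonQueryParams, {"q": "ruff", "limit": 5})
print(params.q, params.limit)
```

The appeal over plain function dependencies is that the resolved instance carries state (`params.q`, `params.limit`) that the endpoint can use directly.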

Hi @cpitclaudel, thanks for the clarification; I have a better understanding of how that works now. So currently the issue is that ruff does not seem to work when `python -m...