Vikash

106 comments of Vikash

Well, as I mentioned before, we don't actually use llama.cpp at work on our A100s, so my benchmark numbers are comparing PyTorch implementations. It is possible that at this point...

The discussion here might be relevant: https://github.com/ggerganov/llama.cpp/issues/1955, although it seems many people are misunderstanding how the paging works. It should be hugely beneficial for any batched inference workload, even on...
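To illustrate the core idea behind paged KV caches, here is a minimal sketch (illustrative only, not vLLM's or llama.cpp's actual implementation): each sequence gets a block table mapping logical token positions to fixed-size physical blocks, so memory is allocated on demand instead of reserved up front per sequence.

```python
# Minimal sketch of a paged KV-cache block table (hypothetical, simplified;
# real implementations also store the actual key/value tensors per block).
BLOCK_SIZE = 16  # tokens per physical block

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, pos):
        """Allocate a physical block lazily when position `pos` needs one."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):
            table.append(self.free_blocks.pop())  # allocate on demand

    def physical_slot(self, seq_id, pos):
        """Translate a logical position into (physical block, offset)."""
        block = self.block_tables[seq_id][pos // BLOCK_SIZE]
        return block, pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
for pos in range(20):
    cache.append_token(seq_id=0, pos=pos)
# 20 tokens at block size 16 -> only 2 physical blocks actually allocated
print(len(cache.block_tables[0]))
```

Because blocks are allocated only as a sequence grows, a batch of many short sequences no longer pays for the worst-case context length, which is why the gains show up mainly in batched inference.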

@charliermarsh upon further testing, I have learned that ruff does not actually support being called as a module. For example, `python -m flake8` works but `python -m ruff`...

Actually, never mind. I just realized that ruff is installed as just an executable and not a Python module, thus there is no way to support `python -m`. Still, the...
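For context, `python -m pkg` only works when an importable package ships a `__main__.py` entry point; a bare binary on PATH gives the interpreter nothing to import. A minimal sketch of the mechanism, using a hypothetical package name `mytool` built on the fly (`runpy.run_module` is what the `-m` flag uses under the hood):

```python
# Sketch of why `python -m <pkg>` requires a __main__.py inside the package.
# The package name `mytool` is hypothetical and created here for illustration.
import os
import runpy
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "mytool")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "__main__.py"), "w") as f:
    f.write("result = 'ran as module'\n")

sys.path.insert(0, tmp)
# Equivalent to running `python -m mytool` from the command line:
module_globals = runpy.run_module("mytool", run_name="__main__")
print(module_globals["result"])
```

flake8 is a pure-Python package with such an entry point, whereas ruff (at the time) shipped only a compiled executable, hence the difference.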

The biggest reason why some of the current lock screen implementations have security flaws is that if they crash, you are automatically given access. So an attacker can just...
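The fix is the fail-closed principle: whatever component grants access must treat a crash of the lock UI as "still locked" rather than unlocking by default. A hedged sketch of the two designs (hypothetical, not any specific compositor's code):

```python
# Illustrative fail-open vs fail-closed lock logic (hypothetical, simplified).

def fail_open_lock(run_ui):
    """Flawed design: a crash in the lock UI falls through to 'unlocked'."""
    try:
        return run_ui()          # returns True only on a correct password
    except Exception:
        return True              # crash -> access granted (the bug)

def fail_closed_lock(run_ui):
    """Safer design: any crash leaves the session locked."""
    try:
        return run_ui()
    except Exception:
        return False             # crash -> still locked

def crashing_ui():
    raise RuntimeError("attacker-triggered crash")

print(fail_open_lock(crashing_ui))    # access granted despite the crash
print(fail_closed_lock(crashing_ui))  # session stays locked
```

In the fail-open design, an attacker only needs to find any input that crashes the lock screen; in the fail-closed design, a crash costs them nothing.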

Has there been any progress on this? Apparently there is now a FlashAttention-2 as well, in the same repo. Here is the technical report: https://tridao.me/publications/flash2/flash2.pdf It reports significantly increased speed over the original...

Also, I did follow up on the Triton thread https://github.com/openai/triton/issues/153, and it seems that even though https://github.com/openai/triton/pull/1056 got closed, https://github.com/openai/triton/pull/1805 did get merged. I am not sure how much more...

Yep, I was also looking into this. It would be very nice to have, but I am not sure we can make it even remotely approach the efficiency of...

This would be a very useful feature; FastAPI supports this in the form of class-based dependencies: https://fastapi.tiangolo.com/tutorial/dependencies/classes-as-dependencies/
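The pattern FastAPI uses there is that any callable, including a class, can serve as a dependency: the framework inspects the callable's signature, fills its parameters from the request, and injects the result. A framework-agnostic sketch of that idea (the `resolve` helper and names are hypothetical, not FastAPI's API):

```python
# Framework-agnostic sketch of classes-as-dependencies (hypothetical
# `resolve` helper; FastAPI does something similar via signature inspection).
import inspect

class CommonQueryParams:
    # A plain class used as a dependency: the "framework" reads __init__'s
    # signature and fills its parameters from the request's query dict.
    def __init__(self, q: str = "", limit: int = 10):
        self.q = q
        self.limit = limit

def resolve(dependency, request_params):
    """Instantiate `dependency` using matching keys from the request."""
    sig = inspect.signature(dependency)
    kwargs = {name: request_params[name]
              for name in sig.parameters if name in request_params}
    return dependency(**kwargs)

params = resolve(CommonQueryParams, {"q": "ruff", "limit": 5})
print(params.q, params.limit)
```

The appeal over plain function dependencies is that the resolved instance carries state (`params.q`, `params.limit`) that the endpoint can use directly.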

Hi @cpitclaudel, thanks for the clarification; I have a better understanding of how that works now. So currently the issue is that ruff does not seem to work when `python -m...