q yao
Hi, I am a little bit confused by the ARF CUDA kernel. https://github.com/ZhouYanzhao/ORN/blob/d6b38aa5e5c3ca7c6e3d0ed5770e581ee1daadcd/src/orn/lib/active_rotating_filters.cu#L19-L33 Let's say thread 0 and thread 1 have: i0 == i1, j0 == j1, k0 == k1...
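For concreteness, here is a NumPy analogue of the collision I am asking about (illustration only, not the ORN code): with duplicated output indices, a plain scatter-accumulate loses updates, just as non-atomic stores from two CUDA threads with equal i/j/k would; `np.add.at` plays the role of `atomicAdd`.

```python
import numpy as np

src = np.ones(4, dtype=np.float32)
idx = np.array([0, 0, 1, 1])  # "threads" 0/1 and 2/3 collide on purpose

# Racy analogue: with duplicate indices, only one update per index survives,
# like two CUDA threads doing a plain read-modify-write on the same element.
dst = np.zeros(2, dtype=np.float32)
dst[idx] += src
print(dst)  # [1. 1.] -- one contribution per index was lost

# Atomic analogue: np.add.at applies every contribution, like atomicAdd.
dst = np.zeros(2, dtype=np.float32)
np.add.at(dst, idx, src)
print(dst)  # [2. 2.]
```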
https://github.com/open-mmlab/mmcv/issues/2933#issuecomment-1758931803
- Triton 2.1.0 has the best performance.
- Signature parsing (in 2.2.0 and 2.3.0) costs a lot.
- 2.3.0 does not accept `device` and `stream` (see the launch sketch below).
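A minimal launch sketch for the last point: a generic vector-add kernel (not lmdeploy code), launched under a chosen stream via PyTorch's stream context, which the Triton launcher is assumed here to pick up as the current stream instead of an explicit `stream=` kwarg.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)


x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)

stream = torch.cuda.Stream()
with torch.cuda.stream(stream):  # launch lands on this stream
    grid = (triton.cdiv(x.numel(), 256),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK=256)
torch.cuda.synchronize()
```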
Implementation of https://github.com/InternLM/lmdeploy/issues/1407#issuecomment-2044203407. I plan to refactor the S-LoRA implementation so we do not need to change the block size when enabling adapters. @zhyncs @ispobock
A similar optimization to https://github.com/InternLM/lmdeploy/pull/1515, for deepseek-moe, qwen2-moe, and dbrx.
```bash
python3 \
    benchmark/profile_throughput.py \
    ShareGPT_V3_unfiltered_cleaned_split.json \
    Mixtral-8x22B-v0.1 \
    --backend pytorch \
    --cache-max-entry-count 0.65 \
    --num-prompts 3000 \
    --concurrency 256 \
    --tp 4
```

```
--------------------------------------------------
concurrency: 256
elapsed_time: 736.060s...
```
Enable it by setting `shared_cache=True`.
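A hypothetical usage sketch — where `shared_cache` is exposed is an assumption here; I show it as a `PytorchEngineConfig` field, which may not match the final surface:

```python
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "internlm/internlm2-chat-7b",  # placeholder model path
    backend_config=PytorchEngineConfig(
        shared_cache=True,  # assumed location of this PR's flag
    ),
)
```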
The kernel will not recompile when M changes. Performance at small batch sizes and short context lengths is still not fast enough, since the Triton kernel launch takes too much time.
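A hedged sketch (generic kernel, not this PR's code) of the no-recompile behavior: M is passed as a plain runtime integer and handled with a mask, rather than as a `tl.constexpr` that would specialize and recompile per value. Note Triton may still keep a few variants for integer specializations such as divisibility by 16.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def scale_kernel(x_ptr, out_ptr, scale, M, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < M  # runtime bound: no per-M specialization
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x * scale, mask=mask)


for M in (3, 100, 257):  # varying M reuses the compiled binary
    x = torch.randn(M, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(M, 128),)
    scale_kernel[grid](x, out, 2.0, M, BLOCK=128)
```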