Sungjae Lee
## 🐛 Bug Report Thanks to the great [help](https://github.com/grpc-ecosystem/grpc-gateway/issues/837#issuecomment-1080699455) and [guide](https://grpc-ecosystem.github.io/grpc-gateway/docs/mapping/customizing_openapi_output/#merging-output), I was able to merge the swagger outputs of different services. However, the problem is that the merged output only...
## 🐛 Bug Report When I split a single monolithic service into multiple services and generate a single swagger file from them, it seems that the numbering logic of...
I found that the unfused attention kernels (softmax, transpose, etc.) can support a sequence length of 32k and are largely resilient to overflow issues. However, the `addRelativeAttentionBiasUnaligned` kernel employs an integer data type...
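The shapes and numbers below are hypothetical and only illustrate the failure mode in question, not the kernel's actual code: with 32-bit signed index arithmetic, the flattened offsets of a 32k-length relative-bias tensor exceed `INT32_MAX` and wrap.

```python
import numpy as np

INT32_MAX = np.iinfo(np.int32).max                 # 2_147_483_647

# Hypothetical shapes, for illustration only.
num_heads, seq_len = 32, 32 * 1024                 # 32k sequence length

# Linear offset of the last element of a [num_heads, seq_len, seq_len] bias tensor.
last_offset = num_heads * seq_len * seq_len - 1    # 34_359_738_367

print(last_offset > INT32_MAX)                     # True: a 32-bit index cannot reach it

# Two's-complement wrap that a signed 32-bit index would produce for that offset.
wrapped = ((last_offset + 2**31) % 2**32) - 2**31
print(wrapped)                                     # -1, i.e. a negative / out-of-bounds index
```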
## Issues https://github.com/ray-project/llmperf/issues/43 https://github.com/ray-project/llmperf/issues/56 ## Summary - Subsequent requests cannot be sent until all in-flight requests have finished, even in non-blocking mode. - Fixing the request launcher was challenging due...
Hello, I've encountered an issue where the request launcher does not allow the next request to be sent until all of the requests specified by `num_concurrent_requests` have finished. This behavior seems counterintuitive...
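A minimal sketch of the behavior I would expect instead, assuming an asyncio-based launcher and a hypothetical `send_request` coroutine (this is not llmperf's actual code): a semaphore keeps exactly `num_concurrent_requests` requests in flight and frees a slot as soon as any single request completes, rather than waiting for the whole batch.

```python
import asyncio

async def send_request(i: int) -> None:
    # Hypothetical request coroutine; stands in for one LLM API call.
    await asyncio.sleep(0.1 + (i % 5) * 0.05)

async def launch(total_requests: int, num_concurrent_requests: int) -> None:
    # A slot frees up as soon as *one* request finishes, so the next request
    # starts immediately instead of waiting for the whole batch to complete.
    sem = asyncio.Semaphore(num_concurrent_requests)

    async def run_one(i: int) -> None:
        async with sem:
            await send_request(i)

    await asyncio.gather(*(run_one(i) for i in range(total_requests)))

if __name__ == "__main__":
    asyncio.run(launch(total_requests=20, num_concurrent_requests=4))
```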
Drafts with RFC: https://github.com/vllm-project/vllm/issues/8333
### Motivation - When using automatic prefix caching, which manages blocks in an LRU (Least Recently Used) manner, it would be useful to add a pinned caching feature, where blocks...
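A minimal sketch of what pinning would mean, using a toy LRU block cache (names such as `BlockCache`, `pin`, and `unpin` are illustrative, not vLLM's API): pinned blocks are skipped by eviction regardless of how recently they were used.

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU cache of KV-cache blocks with a pin/unpin escape hatch."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()  # block_id -> data
        self.pinned: set[int] = set()

    def touch(self, block_id: int, data: bytes) -> None:
        # Insert or refresh a block; evict the LRU *unpinned* block if over capacity.
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            victim = next((b for b in self.blocks if b not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; nothing can be evicted
            del self.blocks[victim]

    def pin(self, block_id: int) -> None:
        self.pinned.add(block_id)      # never evicted until unpinned

    def unpin(self, block_id: int) -> None:
        self.pinned.discard(block_id)  # becomes a normal LRU candidate again
```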
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching
## Summary Unlike v1, Block Manager v2 did not account for the LoRA and prompt adapter in the block hash in prefix caching mode. I added logic to inject the LoRA ID...
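A minimal sketch of the idea, not vLLM's actual implementation: the content-based block hash is extended so that identical token blocks served under different LoRA adapters (or prompt adapters) no longer collide in the prefix cache. Names such as `compute_block_hash`, `lora_id`, and `prompt_adapter_id` are illustrative.

```python
from typing import Optional, Sequence

def compute_block_hash(
    prev_block_hash: Optional[int],
    token_ids: Sequence[int],
    lora_id: Optional[int] = None,
    prompt_adapter_id: Optional[int] = None,
) -> int:
    # Chain the previous block's hash with this block's tokens, plus the
    # adapter identifiers, so that the same token content under a different
    # LoRA / prompt adapter maps to a different cached block.
    return hash((prev_block_hash, tuple(token_ids), lora_id, prompt_adapter_id))

# Same tokens, different LoRA: must not reuse each other's cached KV block.
h_base = compute_block_hash(None, [1, 2, 3])
h_lora = compute_block_hash(None, [1, 2, 3], lora_id=7)
assert h_base != h_lora
```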