Peter Pan
Links below:
- Slides for the intro in the TAG meeting: https://docs.google.com/presentation/d/1Gh-Y8t4QGrq4s2nYI5YR0rycVHjQcmEOtxB1Xqotqr8/edit?usp=sharing
- Recording of the meeting: https://www.youtube.com/watch?v=CrPr7TqEIrc&list=PL6wYrb-bYwC_iStEfVmBpLKDBqOy23nCA&index=44
- More discussion on the sandbox: https://github.com/cncf/sandbox/issues/49
Is this project still under maintenance?
An attempt to make life easier: https://github.com/vllm-project/vllm/pull/24177
@ShadowCrafter011 has already updated his PR (https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/16688), so I will refactor this one once it is merged.
> There is already a Docker image that has been maintained for over 1½ years. If there is anything in the main repo that causes these not to work, then...
> Thanks for the fix, but I am seeing another error:
>
> ```
> Traceback (most recent call last):
>   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code,...
> ```
Woo, thank you @zhyncs! I just tried the new image `lmsysorg/sglang:v0.4.3.post2-cu125`, and the performance seems similar to 0.4.2 (on 16 x H20). When running-req = 1, the `gen throughput (token/s)` is no more...
@merrymercy, could you please take a look?
> I'm confused about the CI UT failures; they all seem irrelevant ...
>
> * `test_video_chat_completion` failure
> * performance threshold `test_mmlu` `assert metrics["score"] >= 0.5`...
I have two GPUs with 24 GB HBM each. At the beginning, I modified the code as below:

```diff
# inference/cli_demo.py
 elif generate_type == "t2v":
-    pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype)
+    pipe = CogVideoXPipeline.from_pretrained(model_path, torch_dtype=dtype, device_map="balanced")
```
...