InkdyeHuang issues

Results: 4 issues by InkdyeHuang

Does it need a large GPU with more than 8GB of memory?

Is the input image size fixed?

Prefix Prompt Cache

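Workflow

Prompt cache generation phase: provide an interface that lets the user add several prefixes, def add_prefix_template({prefix_name, prefix_content}). vllm generates the kv cache for the given prefix_content and stores it.

Request phase: def add_request gains a new parameter, prefix_name. If it is set, the request uses the cache of that prefix_content. If the prefix_name passed in has never been generated, raise an error (to be adjusted later based on usage).

Implementation

Prompt cache generation phase: add_prefix_template lives in llm_engine. The worker gets an execute_prefix that generates the prefix's kv cache and stores it. The model produces non-contiguous kv during computation; after the computation, gather it into a contiguous one. A global dict maps each prefix_name to a seq_group, and the seq_group additionally holds the non-contiguous GPU memory of the kv cache. If prefix_token_id %...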
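The two-phase flow described above can be sketched as a toy registry. This is only an illustration under stated assumptions, not vllm code: `ToyPrefixEngine` and `PrefixEntry` are hypothetical names, whitespace splitting stands in for tokenization, and a list of token strings stands in for the real kv cache tensors. Only `add_prefix_template`, `add_request`, `prefix_name`, and `prefix_content` come from the proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PrefixEntry:
    """Hypothetical record kept per prefix_name."""
    content: str
    kv_cache: List[str]  # stand-in for real kv cache tensors


class ToyPrefixEngine:
    """Hypothetical toy engine; it does not call into vllm."""

    def __init__(self) -> None:
        # Global dict: one prefix_name maps to one cached entry.
        self._prefixes: Dict[str, PrefixEntry] = {}

    def add_prefix_template(self, prefix_name: str, prefix_content: str) -> None:
        # Generation phase: a real engine would run the model over the
        # prefix and store its kv cache. Here the "cache" is faked as the
        # whitespace-split tokens of the prefix.
        fake_kv = prefix_content.split()
        self._prefixes[prefix_name] = PrefixEntry(prefix_content, fake_kv)

    def add_request(self, prompt: str, prefix_name: Optional[str] = None) -> dict:
        # Request phase: without prefix_name, nothing is reused.
        if prefix_name is None:
            return {"prompt": prompt, "cached_tokens": 0}
        # An unknown prefix_name is an error, as the proposal states.
        if prefix_name not in self._prefixes:
            raise KeyError(f"prefix {prefix_name!r} has not been generated")
        entry = self._prefixes[prefix_name]
        return {
            "prompt": f"{entry.content} {prompt}",
            "cached_tokens": len(entry.kv_cache),
        }


engine = ToyPrefixEngine()
engine.add_prefix_template("sys", "You are a helpful assistant")
request = engine.add_request("Summarize this text", prefix_name="sys")
print(request)
```

Raising on an unknown `prefix_name` follows the proposal's "report an error" rule. Falling back to an uncached request would be an alternative, but the proposal defers that decision.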

Which is faster, SmoothQuant or AutoGPTQ?