flynn

Results 11 comments of flynn

DRA cannot be used in production environments. Can we modify the GPU allocation strategy code to implement it? Can anyone give me some help?

> DRA cannot be used in production environments. Can we modify the GPU allocation strategy code to implement it? Can anyone give me some help?

For example: using time-slicing, replicas...

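> There are two files. What exactly is the problem?

How do I further convert them to onnx or pmx format? When I launch with ppl.llm.serving, it reports that the pmx or onnx file does not exist.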

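> > How do I further convert them to onnx or pmx format? When I launch with ppl.llm.serving, it reports that the pmx or onnx file does not exist.
>
> Keep going and export the model with Export.py; that gives you a file in onnx format.

I tried that. Exporting the model with Export produces a large number of warnings: "Warning: The shape interface of opmx::XX type is missing" (where XX is, for example, ParallelEmbedding, ColumnParallelLinear, or Reshape). Using the converted onnx...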

Version 0.6.4.post1 has the same issue; setting --enable-chunked-prefill=False did not have any effect.

> Version 0.6.4.post1 has the same issue; setting --enable-chunked-prefill=False did not have any effect.

After removing the parameter '--enable-prefix-caching', this issue no longer occurs.

https://github.com/opendatalab/MinerU/pull/3967 This PR optimizes memory usage. Could someone review and merge it? @myhloli @skyler9901

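> [#3967](https://github.com/opendatalab/MinerU/pull/3967)
>
> This PR optimizes memory usage. Could someone review and merge it?
>
> [@myhloli](https://github.com/myhloli) [@skyler9901](https://github.com/skyler9901)

In a sustained stress test on an 80 MB (286-page) PDF, memory before the optimization first reached 20 GB, kept growing, and eventually hit OOM. After the optimization, the test ran for two days and memory stayed under 2 GB.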

> I'm having the same problem with Qwen2.5-32B-Instruct-GPTQ-Int4, --quantization gptq

Try --quantization gptq_marlin; there is no error.