
154 comments of Huiqiang Jiang

Hi @LYH-YF, the GSM8K experiment is based on the **GPT-3.5-Turbo-0301 completion** model. Due to recent changes in OpenAI's API, the GPT-3.5-Turbo-0301 completion model is no longer available, but it can...

Hi @kofuya, thanks for your support of our project. Could you give me more context, such as the original prompt?

Hi @deltawi, thank you for your interest in and support of LLMLingua. Currently, since API models do not provide log probabilities for the prompt end, it's challenging to directly support...

> Same need here. I love the concepts of `LLMLingua` and they are super useful for users, however, I do not have the ability to self-host inference for any model...

Thank you @samvanity for the clarification; that's correct. Hi @Avkashhirpara, you can switch the kernel environment using different `device_map` settings by following @samvanity's suggestion. Hi @JiHa-Kim, I think this error might...

Hi @pathquester, thank you for your support of LLMLingua. In the current implementation, the latency of quantization models is not significantly different from that of full-precision models; it might even...

Hi @pathquester, Thanks to the efforts of the community, `phi-2` is now available for use in LLMLingua. Before using it, please update your transformers to the GitHub version by running...

Yeah, you can also try using a GPTQ version such as `TheBloke/phi-2-dpo-GPTQ`.

Hi @pathquester, based on our experience, even GPT2-small can achieve satisfactory results with moderate compression rates.
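To give a rough intuition for why a small model can suffice: LLMLingua-style compression only needs *relative* token informativeness, not generation quality. The sketch below is a hypothetical, dependency-free mock, not the actual LLMLingua implementation — the `toy_surprisal` function stands in for per-token log-probabilities from a small LM such as GPT2-small. It keeps the highest-surprisal tokens until a target compression rate is met.

```python
import math

def toy_surprisal(tokens):
    """Stand-in for a small LM's per-token surprisal (-log p).

    Faked here for illustration: infrequent (and longer) tokens get
    higher scores. A real setup would score tokens with GPT2-small
    log-probabilities instead.
    """
    freq = {}
    for t in tokens:
        freq[t] = freq.get(t, 0) + 1
    n = len(tokens)
    return [-math.log(freq[t] / n) + 0.1 * len(t) for t in tokens]

def compress(tokens, rate=0.5):
    """Keep the ceil(rate * n) most 'informative' tokens, in order."""
    scores = toy_surprisal(tokens)
    keep = max(1, math.ceil(rate * len(tokens)))
    # indices of the top-`keep` scores, restored to original order
    top = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:keep])
    return [tokens[i] for i in top]

prompt = ("the answer to the question is that the model keeps "
          "informative tokens").split()
print(compress(prompt, rate=0.5))
```

At a moderate rate (e.g. 0.5), frequent low-information tokens like "the" are dropped first, which is the behavior even a small scoring model can rank reliably.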

Hi @XiaoFengbing, thank you for your interest in LLMLingua. I'll briefly answer your question: 1. You can consider the control coefficient parameter 'k' defined in the paper as equivalent to...