[Question] Compressor fine-tune

Open · alexandreteles opened this issue 1 year ago • 1 comment

Describe the issue

Greetings,

Are there any plans to release instructions, or at least the dataset format, so that we can fine-tune llmlingua-2-xlm-roberta-large-meetingbank or the base xlm-roberta-large into a custom compressor? If not, could you at least give some general guidance on how we could approach this?

Of course, a ready-made pipeline where we simply plug in the data and fine-tune the models would be ideal for simplicity's sake, but more general, practical information on the process would already be helpful.

Thank you!

alexandreteles, Mar 22 '24 13:03

Hi @alexandreteles, thank you for your interest in our project.

In fact, we have already released the entire data collection pipeline and scripts at https://github.com/microsoft/LLMLingua/tree/main/experiments/llmlingua2/data_collection. You can build your own compressor based on them. The open-sourcing of the dataset itself has been delayed only because of the review process. Once it's approved, we will release it at this HF link.

iofu728, Mar 22 '24 15:03
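
For readers wondering what training data for such a compressor might look like: LLMLingua-2 treats compression as per-word keep/drop classification, so one plausible starting point is to derive binary labels by aligning an original text with its compressed version. The sketch below is only an illustration under that assumption. The function name `label_words`, the whitespace tokenization, and the greedy in-order matching are all hypothetical and are not taken from the repository's scripts, which should be consulted for the actual format.

```python
def label_words(original: str, compressed: str) -> list[tuple[str, int]]:
    """Label each word of `original` with 1 (kept) or 0 (dropped).

    Hypothetical helper: assumes the compressed text is an in-order
    subsequence of the original words and uses naive whitespace splitting.
    """
    orig_words = original.split()
    comp_words = compressed.split()

    labels = []
    j = 0  # index of the next compressed word still waiting for a match
    for word in orig_words:
        if j < len(comp_words) and word.lower() == comp_words[j].lower():
            labels.append((word, 1))  # word survives compression
            j += 1
        else:
            labels.append((word, 0))  # word was dropped
    return labels


if __name__ == "__main__":
    pairs = label_words(
        "The committee will vote on item five tonight",
        "committee vote item five tonight",
    )
    for word, keep in pairs:
        print(f"{word}\t{keep}")
```

Word-level labels like these would still have to be mapped onto subword tokens before fine-tuning a token-classification model such as xlm-roberta-large, and a real pipeline would need more robust alignment than this greedy match.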