[Question] Compressor fine-tune
Describe the issue
Greetings,
Are there any plans to release instructions, or at least the dataset format, so we can fine-tune llmlingua-2-xlm-roberta-large-meetingbank or the base xlm-roberta-large into a custom compressor? If not, could you give some general guidance on how we could approach this?
Of course, having a ready-made pipeline where we simply plug in the data and fine-tune the models would be amazing for simplicity's sake, but even more general, practical information on the process would be welcome.
Thank you!
Hi @alexandreteles, thank you for your interest in our project.
In fact, we have released the entire data collection pipeline and scripts at https://github.com/microsoft/LLMLingua/tree/main/experiments/llmlingua2/data_collection, so you can build your own compressor on top of them. The open-sourcing of the dataset itself has been delayed by the review process; once it is approved, we will release it on Hugging Face.
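In the meantime, here is a minimal sketch of the general approach: LLMLingua-2 treats compression as token classification, so you fine-tune an encoder (e.g. xlm-roberta-large) to predict a keep/drop label per token. This is not the official training script; the JSONL field names (`words`, `labels`), file name `train.jsonl`, and hyperparameters below are illustrative assumptions, and you should adapt them to the format produced by the data_collection scripts.

```python
# Hypothetical sketch: fine-tune xlm-roberta-large as a keep/drop token classifier.
# Assumes a JSONL file where each record holds a word list and a parallel 0/1
# label list (field names are illustrative, not the official dataset format).
import json

import torch
from torch.utils.data import Dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)


class CompressionDataset(Dataset):
    """Words plus binary keep/drop labels, aligned to subword tokens."""

    def __init__(self, path, max_length=512):
        self.records = [json.loads(line) for line in open(path, encoding="utf-8")]
        self.max_length = max_length

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]  # assumed fields: "words", "labels"
        enc = tokenizer(
            rec["words"],
            is_split_into_words=True,
            truncation=True,
            max_length=self.max_length,
            padding="max_length",
            return_tensors="pt",
        )
        # Propagate each word's label to its first subword; ignore the rest (-100).
        word_ids = enc.word_ids(batch_index=0)
        labels, prev = [], None
        for wid in word_ids:
            if wid is None or wid == prev:
                labels.append(-100)
            else:
                labels.append(int(rec["labels"][wid]))
            prev = wid
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(labels)
        return item


model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="compressor-ckpt",
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=CompressionDataset("train.jsonl"),
)
trainer.train()
```

At inference time you would score each token with the fine-tuned classifier and keep the highest-probability "keep" tokens up to your target compression ratio, which is the same general scheme the released pipeline is built around.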