LibreChat

Enhancement: Adding embedding and fine-tuning for training

Open onigetoc opened this issue 2 years ago • 4 comments

Contact Details

No response

What features would you like to see added?

Implementing embedding and fine-tuning for training.

  • https://platform.openai.com/docs/guides/embeddings
  • https://platform.openai.com/docs/guides/fine-tuning

This also means uploading files to OpenAI for training, configured from a backend setting. It could also include letting users upload files from the front end, via drag and drop or a conventional file input.
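A backend that accepts training files would probably want to check them before forwarding anything to OpenAI. The following is only a minimal sketch, assuming chat-style fine-tuning data where each line is a JSON object with a `messages` list of `role`/`content` entries; the function name `validate_training_jsonl` and the role set are hypothetical and not part of LibreChat.

```python
import json

# Assumption: chat-style training records use these three roles.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_training_jsonl(text):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((number, "missing non-empty 'messages' list"))
            continue
        for message in messages:
            role = message.get("role") if isinstance(message, dict) else None
            content = message.get("content") if isinstance(message, dict) else None
            if not isinstance(role, str) or role not in ALLOWED_ROLES or not isinstance(content, str):
                problems.append((number, "each message needs a known 'role' and a string 'content'"))
                break  # report at most one problem per line
    return problems
```

An upload handler could reject the file and show the returned line numbers to the user whenever the list is non-empty.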

More details

  • https://platform.openai.com/docs/guides/embeddings
  • https://platform.openai.com/docs/guides/fine-tuning

Which components are impacted by your request?

No response

Pictures

No response

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

onigetoc, Sep 08 '23 16:09

Thanks for the request. I agree; this would be a really welcome feature. I'll keep it in mind as I integrate file support (retrieval-augmented generation).

danny-avila, Sep 09 '23 19:09

Maybe with a LangChain plugin, or maybe not. I think embedding support already exists there: https://js.langchain.com/docs/modules/data_connection/text_embedding/ but I didn't find anything for fine-tuning.

The input could be text, files, and JSONL (JSON Lines). I don't know whether OpenAI only accepts text and JSONL. I thought about creating something to convert any file to text and any JSON to JSONL, but I'm not really sure.
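For the JSON-to-JSONL part, the conversion itself is small. Here is a rough sketch using only Python's standard `json` module; it assumes the input is a top-level JSON array, and the helper name `json_array_to_jsonl` is made up for illustration.

```python
import json


def json_array_to_jsonl(json_text):
    """Convert a JSON array into JSON Lines text: one JSON value per line."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps without indent escapes newlines inside strings,
    # so every record is guaranteed to stay on a single line.
    return "".join(json.dumps(record) + "\n" for record in records)
```

Each output line can then be parsed independently with `json.loads`, which is what makes the format convenient for streaming large training files.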

onigetoc, Sep 10 '23 01:09

Could the backend use a pip package to prepare the embeddings? I would vote for a local embedding model to keep the documents private and reduce costs. It might be a reliable and consistent alternative to asking the current model for the conversation title.

INSTRUCTOR (Instruction-based Omnifarious Representations) 👨‍🏫 "Embeddings tailored to any task"

Paper: One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Also, this relates to the "File support: vector indexing & retrieval" project item.

UPD: I just finished listening to the publication. Looking through the available models, I found that the "-large" model had ~8x more downloads last month (194,913) while being only 3x larger, at 1.34 GB.
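Whichever embedding model ends up producing the vectors, the retrieval step mentioned above comes down to ranking stored vectors by similarity to a query vector. Below is a small sketch of that idea in plain Python; the three-dimensional vectors and file names are toy values invented for illustration (real embeddings have hundreds of dimensions), and `top_k` is a hypothetical helper, not an existing LibreChat or INSTRUCTOR function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the ids of the k documents whose vectors are most similar to the query.

    index maps a document id to its embedding vector.
    """
    scored = [(cosine_similarity(query_vector, vector), doc_id)
              for doc_id, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy index: made-up vectors standing in for real document embeddings.
index = {
    "billing.md": [0.9, 0.1, 0.0],
    "setup.md": [0.1, 0.9, 0.2],
    "faq.md": [0.5, 0.5, 0.1],
}
print(top_k([1.0, 0.0, 0.0], index))
```

A linear scan like this is fine for a few thousand documents; a dedicated vector index would only be needed at larger scale.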

rgresock, Jan 10 '24 20:01