data-prep-kit
[Feature] Create vector embeddings
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Transforms/Other
Feature
The goal is to add a new module that creates embeddings for documents. The module takes parquet files as input, where each row may contain either a chunk or the full document. For every row, it converts the existing text to an embedding and adds the embeddings as a new column in the parquet files.
This should be added as a new module alongside the other language transforms: https://github.com/IBM/data-prep-kit/tree/dev/transforms/language
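To make the request concrete, here is a minimal sketch of the row-wise flow described above, not the implementation from the PR. The column names `contents` and `embeddings`, the function names, and the batch size are all assumptions for illustration. `toy_hash_embedder` is a deterministic stand-in so the sketch runs without a model; a real transform would plug in an actual embedding model in its place.

```python
import hashlib
import math
from typing import Callable, List, Optional, Sequence

# An embedder maps a batch of strings to one fixed-length vector per string.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def toy_hash_embedder(texts: Sequence[str], dim: int = 8) -> List[List[float]]:
    """Stand-in embedder (NOT a real model): hashes tokens into `dim` buckets."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in text.lower().split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            vec[digest[0] % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalize non-empty vectors; empty text stays all zeros.
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors


def embed_column(
    texts: Sequence[Optional[str]], embedder: Embedder, batch_size: int = 32
) -> List[List[float]]:
    """Embed every row's text in batches, treating missing text as empty."""
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = [t or "" for t in texts[start : start + batch_size]]
        embeddings.extend(embedder(batch))
    return embeddings


def add_embeddings_to_parquet(
    in_path: str,
    out_path: str,
    embedder: Embedder,
    text_column: str = "contents",        # assumed column name
    embedding_column: str = "embeddings",  # assumed column name
) -> None:
    """Read a parquet file, append an embeddings column, and write it back out."""
    # Imported lazily so the pure-Python helpers above work without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pq.read_table(in_path)
    texts = table.column(text_column).to_pylist()
    vectors = embed_column(texts, embedder)
    column = pa.array(vectors, type=pa.list_(pa.float32()))
    table = table.append_column(embedding_column, column)
    pq.write_table(table, out_path)
```

Passing the embedder in as a callable keeps the parquet handling independent of the model choice, so the same code path works whether a row holds a chunk or a full document.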
Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
Done in https://github.com/IBM/data-prep-kit/pull/461