data-prep-kit
[Feature] Enhance code2parquet to support instruction tuning pairs as an input for data prep
Search before asking
- [X] I searched the issues and found no similar issues.
Component
Tools/ingest2parquet
Feature
Ability to read instruction pairs with the assumption that they are in JSON format.
Are you willing to submit a PR?
- [X] Yes I am willing to submit a PR!
We need to enhance the transform such that every instruction pair becomes one row in the parquet files.
This would now be applied to code2parquet, which superseded ingest2parquet; ingest2parquet is now deprecated.
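To make the requested behavior concrete, here is a minimal sketch of the "one instruction pair per row" conversion. It is not the code2parquet implementation. The field names `instruction` and `response`, the two function names, and the accepted input shapes (a JSON array or JSON Lines) are all assumptions for illustration; the real input schema still needs to be agreed on.

```python
import json
from typing import Any


def instruction_pairs_to_rows(raw_json: str) -> list[dict[str, Any]]:
    """Turn JSON text into one dict per instruction pair.

    Accepts either a JSON array of objects or JSON Lines (one object per line).
    The "instruction" and "response" keys are hypothetical field names.
    """
    text = raw_json.strip()
    if not text:
        return []
    if text.startswith("["):
        records = json.loads(text)  # whole document is a JSON array
    else:
        # JSON Lines: parse each non-empty line as its own object
        records = [json.loads(line) for line in text.splitlines() if line.strip()]

    rows = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = [key for key in ("instruction", "response") if key not in record]
        if missing:
            raise ValueError(f"record {index} is missing keys: {missing}")
        # Each instruction pair becomes exactly one row
        rows.append({"instruction": record["instruction"], "response": record["response"]})
    return rows


def write_rows_to_parquet(rows: list[dict[str, Any]], path: str) -> None:
    """Write the rows to a parquet file, one row per instruction pair."""
    # Imported lazily so the parsing step above has no third-party dependency
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_pylist(rows)
    pq.write_table(table, path)
```

For example, `write_rows_to_parquet(instruction_pairs_to_rows(open("pairs.json").read()), "pairs.parquet")` would produce a file with as many rows as there are pairs in the (hypothetical) `pairs.json`.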