On-boarding Multimodal transforms to DPK

Open shahrokhDaijavad opened this issue 10 months ago • 3 comments

Search before asking

  • [x] I searched the issues and found no similar issues.

Component

Transforms/Other

Feature

Work has been done at IBM on multimodal and multilingual transforms. We need to coordinate with the owners (Dhiraj, Pengyuan, Rogerio, Juergen) to bring these transforms to DPK.

Are you willing to submit a PR?

  • [ ] Yes I am willing to submit a PR!

shahrokhDaijavad • Feb 06 '25 01:02

Some initial thoughts on priority transforms:

  • People Detect + Face Blur
  • NSFW (Not Safe for Work)
  • p2j (allows conversion from parquet to json with custom fields including blurred image write-out)

Question: Should the current j2p become llava2parquet and p2j become parquet2llava, assuming we want to stick with llava? cc: @daw3rd

shahrokhDaijavad • Mar 03 '25 19:03

Summary of discussion with Dhiraj and Pengyuan on 4/22

The key to bringing the multimodal transforms from inner to outer (that is, from the internal code base to the open DPK) is to rewrite the abstract layer code here that the individual transforms use. This code was developed by @daw3rd, and we would need help from David or someone on the DPK team to rewrite it. Once that is done, it should be relatively simple to bring the NSFW transform (done by Michele) or p2j (done by Pengyuan) into the open DPK.

cc: @touma-I

shahrokhDaijavad • Apr 22 '25 18:04

PR #1278

swith005 • Jun 24 '25 19:06