data-prep-kit On-boarding Multimodal transforms to DPK

Search before asking

[x] I searched the issues and found no similar issues.

Component

Transforms/Other

Feature

There was work done in IBM on Multi-modal and multi-lingual transforms. Need to coordinate with the owners (Dhiraj, Pengyuan, Rogerio, Juergen) to bring these transforms to DPK.

Are you willing to submit a PR?

[ ] Yes I am willing to submit a PR!

Feb 06 '25 01:02 shahrokhDaijavad

Some initial thoughts on priority transforms:

People Detect + Face Blur
NSFW (Not Safe for Working)
p2j (allows conversion from parquet to json with custom fields including blurred image write-out)

Question: Should current j2p become llava2parquet and p2j become parquet2llava, assuming we want to stick with llava? cc: @daw3rd

Mar 03 '25 19:03 shahrokhDaijavad

Summary of discussion with Dhiraj and Pengyuan on 4/22

The key to bringing the multimodal transforms from inner to outer is to rewrite the abstract layer code here that the individual transforms use. This code was developed by @daw3rd and we would need David's help or someone in the DPK team to rewrite this code. Once that is done, it is relatively simple to get the in the nfsw (done by Michele) or p2j (done by Pengyuan) to come to the open DPK.

cc: @touma-I

Apr 22 '25 18:04 shahrokhDaijavad

PR #1278

Jun 24 '25 19:06 swith005