Shahrokh Daijavad
@klwuibm I don't know if you are aware of the tokenization2arrow transform that is already in the outer as PR #1033 and is about to be merged. So, what...
Thank you, @klwuibm. Your assumptions above look good to me. Also, thank you for finding the bug above and fixing it with your upcoming PR. I assigned #1139 to you.
cc: @swith005 I am just creating this issue for discussion before we implement anything.
@swith005 It is still needed, but with low priority.
As a first step, we need to understand the work that Cezar and Juergen are doing, so I propose a discussion that we can have here. @cpendus @jbross-ibm-research
Some initial thoughts on priority transforms:
- People Detect + Face Blur
- NSFW (Not Safe for Work)
- p2j (allows conversion from parquet to json with custom fields including...
Summary of discussion with Dhiraj and Pengyuan on 4/22: The key to bringing the multimodal transforms from inner to outer is to rewrite the abstract layer code [here](https://github.ibm.com/ai-models-data/data-prep-kit-inner/blob/multimodal/transforms/multimodal/proto/python/src/abstract_mm_transform.py) that the...
Yes, @ShiroYasha18, but I am asking @dolfim-ibm for his help in doing this, so we know exactly what version to use and we can test locally, before adding to the...
@touma-I This PR is only about the output cells of the pdfprocessing in the examples folder.
@sujee As you can see from the comments by Maroun, he is asking for more than just adding the output cells that I have done.