Will tfx.transform support NLP text processing directly?

Open yynil opened this issue 5 years ago • 7 comments

NLP text processing includes tokenization, vocabulary generation, string2ids encoding, and ids2string decoding. For now, I have to convert strings to ids manually before sending the data to the tfx pipeline. As you might expect, this can cause training/serving skew. I'm wondering whether this will be possible in a near-future release, or whether it is already possible in the current release and I'm just not aware of it. Thanks a lot in advance!
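For readers unfamiliar with the four steps named above, here is a minimal, library-free sketch of what the manual conversion looks like. Every name in it (`tokenize`, `build_vocab`, `encode`, `decode`, the `<unk>` token) is hypothetical and chosen only for illustration; it is not tfx or TFT code.

```python
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization; real pipelines often use subword tokenizers.
    return text.lower().split()


def build_vocab(corpus, oov_token="<unk>"):
    # Vocabulary generation: count tokens over the whole training corpus.
    counts = Counter(tok for line in corpus for tok in tokenize(line))
    # Most frequent first, ties broken alphabetically so ids are deterministic.
    tokens = sorted(counts, key=lambda t: (-counts[t], t))
    id_to_token = [oov_token] + tokens  # id 0 is reserved for unknown tokens
    token_to_id = {tok: i for i, tok in enumerate(id_to_token)}
    return token_to_id, id_to_token


def encode(text, token_to_id):
    # string2ids: tokens missing from the vocabulary map to id 0.
    return [token_to_id.get(tok, 0) for tok in tokenize(text)]


def decode(ids, id_to_token):
    # ids2string: inverse lookup, joined with single spaces.
    return " ".join(id_to_token[i] for i in ids)


corpus = ["the cat sat", "the dog sat down"]
token_to_id, id_to_token = build_vocab(corpus)
ids = encode("The cat ran", token_to_id)
text = decode(ids, id_to_token)
```

The skew risk comes from the fact that `tokenize` and the vocabulary live outside the exported model: if the serving side lowercases differently or loads a different vocabulary file, the same string maps to different ids than it did during training.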

yynil avatar Feb 13 '20 10:02 yynil

@zoyahav Can you PTAL? Thanks!

gowthamkpr avatar Feb 13 '20 21:02 gowthamkpr

Hi, we have an example for vocabulary in TFT

1025KB avatar Feb 13 '20 23:02 1025KB

Hi, we have an example for vocabulary in TFT

Yep, this is a vocabulary for String2ID. However, it's not enough for NLP's String2IDs encoding. I checked TensorFlow Transform and TensorFlow Text, and it seems they are still separate libraries.

yynil avatar Feb 14 '20 01:02 yynil

I have the same question, for example about variable-length sequence inputs, index2label and index2vocab lookups at prediction time, and filtering out useless characters.

yongzhuo avatar Feb 20 '20 07:02 yongzhuo

@zoyahav Can you PTAL? Thanks!

gowthamkpr avatar Mar 03 '20 19:03 gowthamkpr

tf.text can be used within the tf.transform preprocessing_fn, as well as tf.transform's vocabulary functionality (tft.compute_and_apply_vocabulary, tft.vocabulary, etc.). Is there a particular functionality that you're looking for with tf.transform that you haven't been able to use?

zoyahav avatar Aug 16 '21 15:08 zoyahav

@yynil Here's an example where you can preprocess text using TensorFlow Transform. Can you please respond to the above comment so we can take the discussion forward? Thanks.

gowthamkpr avatar Aug 05 '22 17:08 gowthamkpr

Closing as stale. Please reopen if you'd like to work on this further.

gowthamkpr avatar Aug 19 '22 17:08 gowthamkpr