Would it be possible for tfx.transform to support NLP text processing directly?
NLP text processing includes tokenization, vocabulary generation, string2ids encoding, and ids2string decoding. For now, I have to convert strings to ids manually before I send the data to the tfx pipeline. As you may expect, this process causes a training/serving skew problem. I'm wondering whether this will be possible in a near-future release, or whether it's already possible in the current release and I'm just not aware of it. Thanks a lot in advance!
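To make the request concrete, here is a rough plain-Python sketch of the four steps listed above. It is only an illustration, not anything tfx provides: the names `tokenize`, `build_vocab`, `encode`, `decode` and the `<unk>` token are all made up for this example, and the tokenizer is deliberately naive.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical out-of-vocabulary token


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocab(texts, top_k=None):
    """Map each token to an integer id; id 0 is reserved for UNK."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    # Sort by descending frequency, then alphabetically so ids are deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: i for i, tok in enumerate([UNK] + ordered[:top_k])}


def encode(text, vocab):
    """string2ids: unknown tokens fall back to the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)]


def decode(ids, vocab):
    """ids2string: invert the vocabulary and join tokens with spaces."""
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
ids = encode("The cat barked", vocab)
print(ids, decode(ids, vocab))
```

The skew risk comes from having to keep `tokenize` and `vocab` identical between the offline conversion and whatever runs at serving time, which is why I would like this logic to live inside the transform graph.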
@zoyahav Can you PTAL? Thanks!
Hi, we have an example for vocabulary in TFT.
Yep, this is a vocabulary for String2ID. However, it's not enough for NLP's String2IDs encoding. I checked TensorFlow Transform and TensorFlow Text, and it seems they are still separate libraries.
I have the same question. I also need things like variable-length sequence inputs, index2label and index2vocab lookups at prediction time, and filtering out useless characters.
@zoyahav Can you PTAL? Thanks!
tf.text can be used within the tf.transform preprocessing_fn, as well as tf.transform's vocabulary functionality (tft.compute_and_apply_vocabulary, tft.vocabulary, etc.).
Is there a particular functionality that you're looking for with tf.transform that you haven't been able to use?
@yynil Here's the example where you can preprocess text using TensorFlow Transform. Could you please respond to the above comment so we can take the discussion forward? Thanks.
Closing as stale. Please reopen if you'd like to work on this further.