High-level generation workflows (e.g. an LM class) for T5 (and Flan T5) are missing.

Open debrupf2946 opened this issue 1 year ago • 4 comments

In the KerasNLP T5 model, the architecture and weights are present, but high-level workflows are missing.

I would like to contribute a high-level masked language modeling workflow for the T5 model.

This idea was suggested by @fchollet in the Keras-user group.

debrupf2946 avatar Jan 25 '24 13:01 debrupf2946

Dear maintainers, please assign this issue to me. I am working on it.

debrupf2946 avatar Jan 25 '24 13:01 debrupf2946

Thanks @debrupf2946, you are assigned!

mattdangerw avatar Jan 29 '24 22:01 mattdangerw

@mattdangerw, can you please help me with setting up the project?

**Environment:** Linux. I installed following the instructions in CONTRIBUTING.md (CPU only).

I got the following message in the terminal:

Successfully built keras-nlp-0.7.0.tar.gz and keras_nlp-0.7.0-py3-none-any.whl
Build successful. Wheel file available at /home/debrup/keras-nlp/dist/keras_nlp-0.7.0-py3-none-any.whl
Installing wheel file.
Processing ./dist/keras_nlp-0.7.0-py3-none-any.whl
Installing collected packages: keras-nlp
Successfully installed keras-nlp-0.7.0

But in VS Code, when I run a particular file, I get this error:

(keras-nlp-cpu) (base) debrup@debrup-2946:~/keras-nlp$ python keras_nlp/models/t5/t5_tokenizer.py
2024-02-11 08:57:56.931437: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/debrup/keras-nlp/keras_nlp/models/t5/t5_tokenizer.py", line 16, in <module>
    from keras_nlp.api_export import keras_nlp_export
ModuleNotFoundError: No module named 'keras_nlp.api_export'
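
One way to narrow down an error like this is to check which copy of the package Python actually imports. When a file is run as `python path/to/file.py`, the script's own directory (not the repository root) is placed at `sys.path[0]`, so `keras_nlp` may resolve to the installed wheel instead of the source checkout. Whether that is the cause here is an assumption, not something confirmed in this thread. The helper names below (`locate_package`, `has_submodule`, `report`) are hypothetical and use only the standard library.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the file Python would import top-level package `name` from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def has_submodule(package, submodule):
    """Report whether `package.submodule` can be found, without raising."""
    try:
        # Note: find_spec imports the parent package to search its __path__.
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself cannot be found.
        return False


def report(package, submodule):
    """Print where `package` resolves from and whether `submodule` exists in it."""
    print("sys.path[0]:", sys.path[0])
    print(f"{package} resolves to:", locate_package(package))
    print(f"{package}.{submodule} present:", has_submodule(package, submodule))
```

Calling `report("keras_nlp", "api_export")` from the same environment shows whether the resolved path points into `site-packages` or into the checkout. If it points into `site-packages`, running the file as a module from the repository root (`python -m keras_nlp.models.t5.t5_tokenizer`) is one thing to try, since `-m` puts the current directory on `sys.path`.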

Can you please help me out? It's my first contribution and I really want to make it.

debrupf2946 avatar Feb 10 '24 23:02 debrupf2946

@mattdangerw @fchollet I have written a t5_preprocessor file that pads and packs the sentence (input ids and padding). I have tested it and it works fine.

I am now working on the t5_MaskedLMMaskGenerator and have run into a problem: what should the mask token id be? There is no mask token id in the T5 tokenizer file. Can you help me with some ideas on how to handle the mask token in T5?

Should I create a pull request for t5_preprocessor for review? I am done with it.
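
For context on the mask token question: T5 was pre-trained with span corruption rather than a single `[MASK]` token. Each dropped span is replaced by a distinct sentinel token (`<extra_id_0>`, `<extra_id_1>`, ...), and the target lists each sentinel followed by the tokens it replaced. The sketch below illustrates that format on plain token strings. The function names are hypothetical, and whether the KerasNLP T5 vocabulary exposes these sentinel pieces (and under which ids) is an assumption that would need checking against the tokenizer.

```python
def sentinel(i):
    """Name of the i-th T5 sentinel token, e.g. <extra_id_0>."""
    return f"<extra_id_{i}>"


def corrupt_spans(tokens, spans):
    """Replace each (start, end) span with a sentinel and build the target.

    `spans` must be sorted, non-overlapping, half-open index ranges.
    Returns (encoder_input, decoder_target) as lists of token strings.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel(i))           # one sentinel stands in for the span
        targets.append(sentinel(i))          # the target repeats the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the dropped tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep the text after the last span
    targets.append(sentinel(len(spans)))     # a final sentinel closes the target
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
enc, dec = corrupt_spans(tokens, [(2, 4), (8, 9)])
print(" ".join(enc))
print(" ".join(dec))
```

With this format the mask generator would pick spans and assign sentinels in order, so no single mask token id is needed.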

debrupf2946 avatar Feb 18 '24 09:02 debrupf2946