
Which text encoding model are you using in this code?

Open crazySyaoran opened this issue 1 year ago • 5 comments

In your paper, it says

At first, we encode TB into an embedded vector vB either by many-hot encoding or using a pre-trained NLP model such as BERT, FastText, or Word2Vec

Can you tell me exactly which text encoding model you are using in your released code? Could you release the encoding model for custom images?

crazySyaoran · Dec 06 '22 06:12

We use many-hot encoding in our code, as it provides a straightforward way to encode the text for our application. However, BERT-encoded text also gives comparable results. We have tested with the pre-trained BERT model uncased_L-24_H-1024_A-16 from the following repository: https://github.com/google-research/bert#pre-trained-models.
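
For illustration, here is a minimal sketch of how a many-hot text encoding can be built, assuming a hypothetical attribute vocabulary; the actual terms and their ordering behind encodings.csv are defined by our annotation scheme and are not shown here:

```python
import numpy as np

# Hypothetical attribute vocabulary -- the real terms and their order are
# fixed by the dataset annotations used to produce encodings.csv.
ATTRIBUTE_VOCAB = ["red", "blue", "striped", "floral", "t-shirt", "dress"]

def many_hot_encode(attributes, vocab=ATTRIBUTE_VOCAB):
    """Set 1 at every vocabulary position whose term appears in `attributes`."""
    vec = np.zeros(len(vocab), dtype=np.float32)
    for term in attributes:
        if term in vocab:
            vec[vocab.index(term)] = 1.0
    return vec

print(many_hot_encode(["red", "dress"]))  # [1. 0. 0. 0. 0. 1.]
```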

prasunroy · Dec 06 '22 09:12

Thanks a lot, it was a great help. I will try it soon.

crazySyaoran · Dec 06 '22 09:12

Hi, I noticed that the encoding length in encodings.csv is 84, while the output of BERT from the URL you provided has shape (61, 1024). I urgently need to reproduce your results from custom input text. Could you release the many-hot encoding model mentioned above, or release code that matches BERT's encoding shape?

crazySyaoran · Dec 07 '22 07:12

The many-hot encoding was collected manually during data annotation, so we do not have a model to infer this encoding directly from the image; it needs to be supplied manually as interactive user input. In the case of a frozen text encoder such as BERT, you need to take the output of the last hidden layer. This last hidden layer output can be projected to the target shape through an additional linear layer if required. In our experiments, we tested with a BERT encoding of length 384. Also, note that for any specific encoding type and/or length, the stage-1 network needs to be retrained.

Check the following resources on text encoding with BERT:
[1] https://medium.com/future-vision/real-time-natural-language-understanding-with-bert-315aff964bfa
[2] https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT

However, a more recent and currently recommended approach is to use Hugging Face Transformers: https://github.com/huggingface/transformers
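
As an example, here is a minimal sketch of obtaining a fixed-length text encoding with Hugging Face Transformers, assuming mean pooling over the last hidden layer and a linear projection to length 384; the exact BERT variant, pooling, and projection used for stage-1 training may differ, so treat this as illustrative only:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()  # treat BERT as a frozen text encoder

# Project the 768-dim BERT output to the target encoding length (384 here).
projection = torch.nn.Linear(encoder.config.hidden_size, 384)

text = "a person wearing a red floral dress"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# Last hidden layer output has shape (batch, seq_len, hidden_size).
# Mean-pool over tokens, then project to the target encoding length.
pooled = outputs.last_hidden_state.mean(dim=1)  # (1, 768)
text_encoding = projection(pooled)              # (1, 384)
print(text_encoding.shape)  # torch.Size([1, 384])
```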

A demo of TIPS with BERT encoding is (temporarily) available at https://drive.google.com/file/d/1Jsms6hPKg6ESrJyRTdwkgKScIKow1RnU

prasunroy · Dec 08 '22 21:12

Thanks for the reply, but I didn't find the BERT encoding of length 384 you mentioned above on Hugging Face. The BERT models I found on Hugging Face are:

|      | H=128             | H=256             | H=512              | H=768              |
|------|-------------------|-------------------|--------------------|--------------------|
| L=2  | 2/128 (BERT-Tiny) | 2/256             | 2/512              | 2/768              |
| L=4  | 4/128             | 4/256 (BERT-Mini) | 4/512 (BERT-Small) | 4/768              |
| L=6  | 6/128             | 6/256             | 6/512              | 6/768              |
| L=8  | 8/128             | 8/256             | 8/512 (BERT-Medium)| 8/768              |
| L=10 | 10/128            | 10/256            | 10/512             | 10/768             |
| L=12 | 12/128            | 12/256            | 12/512             | 12/768 (BERT-Base) |

from https://huggingface.co/google/bert_uncased_L-2_H-768_A-12

Could you please tell me where I can get your pretrained BERT model that produces encodings of length 384?

crazySyaoran · Dec 09 '22 09:12