
Which text encoding model are you using in this code?

Open crazySyaoran opened this issue 1 year ago • 5 comments

In your paper, it says

At first, we encode TB into an embedded vector vB either by many-hot encoding or using a pre-trained NLP model such as BERT, FastText, or Word2Vec

Can you tell me exactly which text encoding model you are using in your released code? Could you release the encoding model for custom images?

crazySyaoran · Dec 06 '22 06:12

We use many-hot encoding in our code, as it provides a straightforward way to encode the text for our application. However, BERT-encoded text also gives comparable results. We have tested with the pre-trained BERT model uncased_L-24_H-1024_A-16 from the following repository: https://github.com/google-research/bert#pre-trained-models.
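
For illustration, here is a minimal sketch of how a many-hot text encoding can be built, assuming a hypothetical attribute vocabulary; the actual terms and their ordering behind encodings.csv are defined by our annotation scheme and are not shown here:

```python
import numpy as np

# Hypothetical attribute vocabulary -- the real terms and their order are
# fixed by the dataset annotations used to produce encodings.csv.
ATTRIBUTE_VOCAB = ["red", "blue", "striped", "floral", "t-shirt", "dress"]

def many_hot_encode(attributes, vocab=ATTRIBUTE_VOCAB):
    """Set 1 at every vocabulary position whose term appears in `attributes`."""
    vec = np.zeros(len(vocab), dtype=np.float32)
    for term in attributes:
        if term in vocab:
            vec[vocab.index(term)] = 1.0
    return vec

print(many_hot_encode(["red", "dress"]))  # [1. 0. 0. 0. 0. 1.]
```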

prasunroy · Dec 06 '22 09:12

Thanks a lot, it was a great help. I will try it soon.

crazySyaoran · Dec 06 '22 09:12

Hi, I noticed that the encoding length in encodings.csv is 84, while the output of BERT from the URL you provided has shape (61, 1024). I urgently need to reproduce your results from custom input text. Could you release the many-hot encoding model mentioned above, or release code that matches BERT's encoding shape?

crazySyaoran · Dec 07 '22 07:12

The many-hot encoding was collected manually during data annotation, so we do not have a model to infer this encoding directly from the image; it needs to be supplied manually as interactive user input. In the case of a frozen text encoder such as BERT, you need to take the output of the last hidden layer. This last hidden layer output can be projected to the target shape through an additional linear layer if required. In our experiments, we tested with a BERT encoding of length 384. Also, note that for any specific encoding type and/or length, the stage-1 network needs to be retrained.

Check the following resources on text encoding with BERT:
[1] https://medium.com/future-vision/real-time-natural-language-understanding-with-bert-315aff964bfa
[2] https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT

However, a more recent and currently recommended approach is to use Hugging Face Transformers: https://github.com/huggingface/transformers
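
As an example, here is a minimal sketch of obtaining a fixed-length text encoding with Hugging Face Transformers, assuming mean pooling over the last hidden layer and a linear projection to length 384; the exact BERT variant, pooling, and projection used for stage-1 training may differ, so treat this as illustrative only:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
encoder.eval()  # treat BERT as a frozen text encoder

# Project the 768-dim BERT output to the target encoding length (384 here).
projection = torch.nn.Linear(encoder.config.hidden_size, 384)

text = "a person wearing a red floral dress"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# Last hidden layer output has shape (batch, seq_len, hidden_size).
# Mean-pool over tokens, then project to the target encoding length.
pooled = outputs.last_hidden_state.mean(dim=1)  # (1, 768)
text_encoding = projection(pooled)              # (1, 384)
print(text_encoding.shape)  # torch.Size([1, 384])
```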

A demo of TIPS with BERT encoding is (temporarily) available at https://drive.google.com/file/d/1Jsms6hPKg6ESrJyRTdwkgKScIKow1RnU

prasunroy · Dec 08 '22 21:12

Thanks for the reply, but I didn't find the BERT encoding of length 384 you mentioned above on Hugging Face. The BERT models I found on Hugging Face are:

|      | H=128             | H=256             | H=512              | H=768              |
|------|-------------------|-------------------|--------------------|--------------------|
| L=2  | 2/128 (BERT-Tiny) | 2/256             | 2/512              | 2/768              |
| L=4  | 4/128             | 4/256 (BERT-Mini) | 4/512 (BERT-Small) | 4/768              |
| L=6  | 6/128             | 6/256             | 6/512              | 6/768              |
| L=8  | 8/128             | 8/256             | 8/512 (BERT-Medium)| 8/768              |
| L=10 | 10/128            | 10/256            | 10/512             | 10/768             |
| L=12 | 12/128            | 12/256            | 12/512             | 12/768 (BERT-Base) |

from https://huggingface.co/google/bert_uncased_L-2_H-768_A-12

Could you please tell me where I can get your pretrained BERT model that produces encodings of length 384?

crazySyaoran · Dec 09 '22 09:12