NExT-QA: How to get bert_ft.h5 for my own dataset

Hello, I have another question: how can I get bert_ft.h5 for my own dataset? How should the question and answers be encoded with BERT? Are they encoded separately or together? Thanks!

Jun 20 '21 14:06 wangbq18

Hi, please refer to the link given in the README and to our paper. For multi-choice QA, each answer is appended after its corresponding question.
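A minimal sketch of what "appending the answer after the question" can look like when preparing multi-choice inputs: one sequence per candidate answer, using BERT's usual `[CLS] ... [SEP] ... [SEP]` sentence-pair layout. The helper name `build_qa_pairs` and the toy question are hypothetical, and the exact formatting used by NExT-QA may differ, so check the repo before relying on it.

```python
from typing import List


def build_qa_pairs(question: str, candidates: List[str],
                   cls_token: str = "[CLS]", sep_token: str = "[SEP]") -> List[str]:
    """Hypothetical helper: append each candidate answer after the question,
    producing one BERT-style sentence-pair sequence per choice."""
    return [f"{cls_token} {question.strip()} {sep_token} {ans.strip()} {sep_token}"
            for ans in candidates]


# Toy example (made-up question and answers, not taken from NExT-QA).
question = "why did the boy pick up the ball?"
candidates = ["to throw it", "to eat it", "to hide it"]
for seq in build_qa_pairs(question, candidates):
    print(seq)
```

With a Hugging Face `transformers` BERT tokenizer you would normally pass the question and answer as a text pair (`tokenizer(question, answer)`), which adds the special tokens itself; the strings above only make the layout visible.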

Jun 20 '21 16:06 doc-doc

OK, I see. Another question: how do I get a motion feature with shape (16, 2048)? With the code provided by [HCRN], the motion feature shape is (8, 2048) with 8 clips. Does that mean I should set clips=16? Also, your paper says the best performance comes from using ResNet as the appearance feature along with I3D ResNeXt as the motion feature (Res+I3D). How do I get the I3D feature? Can you share the code?

Jun 21 '21 10:06 wangbq18

Hi, we use I3D with ResNeXt as the backbone to capture motion information. The code can also be found in HCRN. The number of sampled clips depends on your dataset, usually ranging from 8 to 32.
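On the (8, 2048) vs. (16, 2048) question: the first dimension is the number of clips sampled per video, each encoded by the motion backbone into one 2048-d vector. Below is a hedged sketch of uniform clip sampling, assuming equal-length segments with one clip of `clip_len` consecutive frames centred in each. `sample_clip_indices` is a hypothetical helper, and HCRN's own sampling code may differ in detail.

```python
from typing import List


def sample_clip_indices(num_frames: int, num_clips: int, clip_len: int) -> List[List[int]]:
    """Hypothetical helper: split a video into num_clips equal segments and take
    clip_len consecutive frames centred in each segment. Indices past the last
    frame are clamped, so short videos repeat their final frame."""
    if num_frames <= 0 or num_clips <= 0 or clip_len <= 0:
        raise ValueError("num_frames, num_clips and clip_len must be positive")
    seg = num_frames / num_clips
    clips = []
    for i in range(num_clips):
        center = int(seg * i + seg / 2)           # middle frame of segment i
        start = max(0, center - clip_len // 2)    # clip starts half a clip earlier
        clips.append([min(start + k, num_frames - 1) for k in range(clip_len)])
    return clips


clips = sample_clip_indices(num_frames=480, num_clips=16, clip_len=16)
print(len(clips), len(clips[0]))  # 16 16
```

Assuming the motion backbone yields one 2048-d vector per clip, stacking the outputs for these 16 clips gives a (16, 2048) array; with `num_clips=8` you would get (8, 2048).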

Jun 21 '21 15:06 doc-doc

Thanks a lot, I have solved the problem above. There is no HCRN implementation based on BERT. I tried to implement it, but when I replace GloVe with BERT, it doesn't converge. Can you share the code?

Jun 25 '21 08:06 wangbq18

You need to fine-tune BERT on your own dataset and then extract token representations for the sentences. Afterwards, you can use the extracted BERT features to replace the GloVe embedding layer in HCRN. You can see how GloVe is replaced with BERT features in NExT-QA (this repo). We do not plan to release this part of the work for now.
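A minimal sketch of that replacement, assuming the fine-tuned BERT token features have already been extracted and stored per question. Here a plain dict stands in for a file such as bert_ft.h5, whose actual layout is not described in this thread. The hypothetical `lookup_bert_features` returns a fixed-length feature matrix and a padding mask, i.e. what the GloVe embedding layer's output plus a length mask would otherwise provide.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def lookup_bert_features(features: Dict[str, List[Vector]], qid: str,
                         max_len: int) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: instead of mapping token ids to GloVe vectors,
    fetch the token vectors pre-extracted by the fine-tuned BERT.
    `features` is an assumed in-memory stand-in for the real feature file."""
    tokens = features[qid][:max_len]              # truncate long sequences
    if not tokens:
        raise ValueError(f"no token features stored for {qid!r}")
    dim = len(tokens[0])
    pad = max_len - len(tokens)
    matrix = [list(v) for v in tokens] + [[0.0] * dim for _ in range(pad)]
    mask = [1] * len(tokens) + [0] * pad          # 1 = real token, 0 = padding
    return matrix, mask


feats = {"q1": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}  # toy 2-d vectors, not real BERT output
matrix, mask = lookup_bert_features(feats, "q1", max_len=5)
print(mask)  # [1, 1, 1, 0, 0]
```

In a PyTorch model this matrix would be fed in where the embedding layer's output used to go. A linear layer projecting BERT's hidden size (768 for BERT-base) to the model's hidden size is one common choice, not necessarily what NExT-QA does.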

Jun 25 '21 11:06 doc-doc

Hi, the HCRN-BERT implementation is available here.

Nov 17 '21 11:11 doc-doc

Hi, we have released the edited code for fine-tuning BERT on NExT-QA here. You can also use it to fine-tune BERT on other datasets.

Aug 01 '22 03:08 doc-doc

Hi, this link has expired. Can you provide it again?

Apr 30 '23 07:04 PolarisHsu

Yes. Please download it via this link.

Apr 30 '23 11:04 doc-doc