habitat-lab EmbodiedQA baseline in Habitat

❓ Questions and Help

Hello everyone,

I have been working on implementing a primarily IL-based EQA baseline in Habitat with guidance from @dhruvbatra.

I have chosen to implement an EQA baseline using the approach shared in EQA's first paper (CVPR 2018; link) and have used MP3D scene and task datasets for the same.

The proposed approach includes 4 stages of training:

1. Training a feature extractor (#430)
2. Training the VQA model (#487)
3. Training the NAVigation model (#539)
4. Training for EQA by finetuning the NAV model

I have been able to complete the first 3 stages. The forked repo can be found here (eqa-1 branch). More information about the implementation can be found in the README.

Because loading a scene for every episode during training is infeasible, one of the key design decisions was to cache each episode's images beforehand by storing them on disk. This may not be the most elegant approach, but loading a scene for every episode would severely reduce training speed. Also, since the first 3 trainers all use this data, caching to disk seemed to be the best option.
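To make the caching idea concrete, here is a minimal sketch of writing an episode's frames to disk once and reading them back later without loading the scene. It is not the code from the fork: `cache_episode_frames`, `load_episode_frames`, `render_episode`, and the `<episode_id>/<step>.bin` layout are hypothetical names chosen for illustration, and frames are represented as raw bytes instead of rendered images.

```python
import tempfile
from pathlib import Path


def cache_episode_frames(cache_dir, episode_id, frames):
    """Write each frame (raw bytes here) to cache_dir/<episode_id>/<step>.bin."""
    episode_dir = Path(cache_dir) / str(episode_id)
    episode_dir.mkdir(parents=True, exist_ok=True)
    for step, frame in enumerate(frames):
        # Zero-padded step index keeps lexicographic order equal to step order.
        (episode_dir / f"{step:04d}.bin").write_bytes(frame)
    return episode_dir


def load_episode_frames(cache_dir, episode_id):
    """Read an episode's frames back in step order, without any simulator."""
    episode_dir = Path(cache_dir) / str(episode_id)
    return [path.read_bytes() for path in sorted(episode_dir.glob("*.bin"))]


def render_episode(episode_id, num_steps=3):
    """Hypothetical stand-in for rendering frames in the simulator."""
    return [bytes([episode_id % 256, step % 256]) * 4 for step in range(num_steps)]


# Usage: render once, cache, then train from the cached copy.
with tempfile.TemporaryDirectory() as cache_dir:
    frames = render_episode(episode_id=7)
    cache_episode_frames(cache_dir, 7, frames)
    cached = load_episode_frames(cache_dir, 7)
    assert cached == frames
```

The point of the layout is that every trainer can read the same directory, so the rendering cost is paid once rather than once per training stage.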

I would like the community to share feedback on the whole implementation. Since there are multiple stages, we can start off with the first two (feature extractor and VQA). Please let me know if I should share any more information about the above.

Thank you.

Apr 14 '20 21:04 mukulkhanna

Hi @mukulkhanna, that looks great. Please send it as a draft PR so that we can give feedback that is more grounded in the code. Regarding writing data to disk, that may make sense for IL. Usually we load one house per worker and switch scenes only after all episodes from the current scene have been completed.
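The scene-switching strategy can be sketched as follows. This is only an illustration with plain dictionaries, not habitat-lab code: the `scene_id` and `episode_id` keys and the helper names are assumptions. It orders episodes so that each scene's episodes are consecutive and counts how many scene loads each ordering would trigger.

```python
from itertools import groupby


def order_by_scene(episodes):
    """Sort episodes so that all episodes of a scene are consecutive.

    sorted() is stable, so episodes keep their relative order within a scene.
    """
    return sorted(episodes, key=lambda ep: ep["scene_id"])


def count_scene_loads(episodes):
    """Count runs of consecutive episodes that share a scene.

    Each run corresponds to one scene load when iterating in order.
    """
    return sum(1 for _ in groupby(episodes, key=lambda ep: ep["scene_id"]))


# Hypothetical episode list that interleaves two houses.
episodes = [
    {"episode_id": 0, "scene_id": "house_a"},
    {"episode_id": 1, "scene_id": "house_b"},
    {"episode_id": 2, "scene_id": "house_a"},
    {"episode_id": 3, "scene_id": "house_b"},
]

loads_interleaved = count_scene_loads(episodes)
loads_grouped = count_scene_loads(order_by_scene(episodes))
```

In this toy example the interleaved order needs a scene load for every episode, while the grouped order needs one load per house.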

Apr 16 '20 18:04 mathfac

@mathfac Thank you for the response. I will open a draft PR for that.

Should I close this issue or let it be open till the PR is resolved?

Usually we load one house per worker and switch scenes only after all episodes from the current scene have been completed.

I thought of using that method too, but in a single-worker IL setting like mine, I felt it would hurt the training process. I believe it would be better if the scenes (and corresponding frames) are shuffled; that should reduce overfitting and help the model generalise better.
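A rough sketch of what I mean by shuffling, assuming the frames are already cached on disk and can be addressed by a (scene, episode, step) key. The function names and the `frames_per_episode` mapping are hypothetical and only for illustration; shuffling at the frame level like this suits the feature extractor and VQA stages, while a sequence model would shuffle whole episodes instead.

```python
import random


def build_frame_index(frames_per_episode):
    """Flatten {(scene_id, episode_id): num_frames} into one key per frame."""
    return [
        (scene_id, episode_id, step)
        for (scene_id, episode_id), num_frames in sorted(frames_per_episode.items())
        for step in range(num_frames)
    ]


def shuffled_frame_index(frames_per_episode, seed):
    """Return the frame keys in a reproducible random order across scenes."""
    index = build_frame_index(frames_per_episode)
    random.Random(seed).shuffle(index)
    return index


# Hypothetical cache contents: two episodes from two different houses.
frames_per_episode = {("house_a", 0): 2, ("house_b", 1): 3}
epoch_order = shuffled_frame_index(frames_per_episode, seed=0)
```

Because the frames are read from the cache by key, mixing scenes within a batch costs nothing extra, whereas doing the same with live rendering would force a scene switch on almost every sample.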

Apr 16 '20 18:04 mukulkhanna

All the PRs mentioned in this issue have already been merged.

Sep 01 '22 12:09 rpartsey
