
Train detail and GPU hours

zc-zhao opened this issue 10 months ago · 1 comment

Great work! I have a few questions. Could you share the training details for the pre-training phase, such as the amount of training data and the training configuration? Could you also tell me the GPU hours for pre-training and fine-tuning? It seems that you only use front-view video rather than multi-view; have I understood correctly?

zc-zhao avatar Apr 25 '24 15:04 zc-zhao

Hello @zc-zhao, our training dataset consists of 9 million samples sourced from the open world; see Tab. 2 in the paper. The training configuration closely resembles that of fine-tuning. We conducted pre-training on 24 A100 GPUs for over 60 hours.
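A rough GPU-hour figure follows directly from the numbers in this reply (24 A100 GPUs, over 60 hours of wall-clock time). The sketch below is just that arithmetic; the 60-hour value is a lower bound taken from the reply, not an official total.

```python
# GPU-hours = number of GPUs x wall-clock hours.
# Figures taken from the reply above; "over 60 hours" means
# the result is a lower bound, not an exact measurement.
num_gpus = 24
wall_clock_hours = 60
gpu_hours = num_gpus * wall_clock_hours
print(gpu_hours)  # 1440
```

So pre-training consumed at least roughly 1,440 A100 GPU-hours.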

Indeed, we use only front-view videos, since the majority of videos collected from the open world are front-facing.

DevLinyan avatar Apr 26 '24 01:04 DevLinyan