ELM
Training details and GPU hours
Great work! I have a few questions. Could you share the training details for the pre-training phase, such as the amount of training data and the training configuration? Also, how many GPU hours did pre-training and fine-tuning take? Finally, it seems you only use the front-view video rather than multi-view; is my understanding correct?
Hello @zc-zhao, our training dataset consists of 9 million samples sourced from the open world; see Tab. 2 in the paper. The training configuration closely resembles that of fine-tuning. We conducted pre-training on 24 A100 GPUs for over 60 hours.
Indeed, we use only front-view videos, since the majority of videos collected from the open world are front-facing.
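For reference, here is a quick back-of-the-envelope estimate of the pre-training compute budget implied by the reply above. Only the GPU count and the wall-clock time come from this thread; everything else is arithmetic, and since the reply says "over 60 hours", the result is a lower bound.

```python
# Rough pre-training compute estimate from the figures in this thread.
num_gpus = 24          # A100 GPUs used for pre-training (stated above)
wall_clock_hours = 60  # "over 60 hours", so treat this as a lower bound

gpu_hours = num_gpus * wall_clock_hours
print(f"Pre-training compute: >= {gpu_hours} A100 GPU-hours")  # >= 1440
```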