nuplan-devkit

Error in using s3 storage when data_loader.num_workers >0 at training stage

Open william-gx opened this issue 3 years ago • 1 comments

Hi,

I ran into the following problem. Could you please help me solve it?

During training, a ResponseParseError is sometimes reported when the cached data is stored in AWS S3 and data_loader.params.num_workers > 0. With data_loader.params.num_workers = 0 it seems to work, but I assume that comes at the cost of running speed?

Screenshot 2022-08-09 10.48.55 AM

william-gx avatar Aug 09 '22 02:08 william-gx

Hi @william-gx,

Let us take a look at this. In the meantime, can you share with us the command you used and how you are accessing the data from S3?

patk-motional avatar Aug 18 '22 02:08 patk-motional

Hi @patk-motional,

The command I used is given below. S3 is accessed through the code already provided in this framework (boto3). Thanks.

python nuplan/planning/script/run_training.py \
    experiment_name=vector_experiment \
    py_func=train \
    +training=training_vector_model \
    scenario_builder=nuplan_mini \
    scenario_filter=training_scenarios \
    lightning.trainer.params.max_epochs=80 \
    data_loader.params.batch_size=32 \
    data_loader.params.num_workers=16 \
    data_loader.params.pin_memory=False \
    cache.use_cache_without_dataset=True \
    cache.cache_path='s3://xxx/' \
    +scenario_builder.db_files='s3://xxx/' \
    worker.threads_per_node=2 \
    +lightning.hparams.learning_rate=1e-5

william-gx avatar Oct 10 '22 08:10 william-gx