Error when using S3 storage with data_loader.params.num_workers > 0 during training
Hi,
I ran into the following problem. Could you please help me solve it?
During training, a ResponseParseError is sometimes reported when using cached data stored in AWS S3 with data_loader.params.num_workers > 0. With data_loader.params.num_workers = 0 it seems to work, but presumably at the cost of running speed?
Hi @william-gx,
Let us take a look at this. In the meantime, can you share with us the command you used and how you are accessing the data from S3?
Hi @patk-motional,
The command I used is below. S3 is accessed through the boto3 code provided in this framework. Thanks.
python nuplan/planning/script/run_training.py experiment_name=vector_experiment py_func=train \
+training=training_vector_model \
scenario_builder=nuplan_mini \
scenario_filter=training_scenarios \
lightning.trainer.params.max_epochs=80 \
data_loader.params.batch_size=32 \
data_loader.params.num_workers=16 \
data_loader.params.pin_memory=False \
cache.use_cache_without_dataset=True \
cache.cache_path='s3://xxx/' \
+scenario_builder.db_files='s3://xxx/' \
worker.threads_per_node=2 \
+lightning.hparams.learning_rate=1e-5
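
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: intermittent response parsing errors can show up when a boto3 session or client created in the main process is reused by forked data loader workers. The sketch below shows the general pattern of building the client lazily, once per process. PerProcessClient and make_s3_client are hypothetical names for illustration and are not part of the nuPlan devkit.

```python
import os
from typing import Any, Callable, Optional


class PerProcessClient:
    """Hold one lazily created client per process (hypothetical helper).

    If the holder is inherited by a forked worker, the stored PID no longer
    matches os.getpid(), so the worker builds its own client instead of
    reusing the parent's connection pool.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._pid: Optional[int] = None
        self._client: Any = None

    def get(self) -> Any:
        pid = os.getpid()
        if self._client is None or self._pid != pid:
            # First use in this process: create a fresh client.
            self._client = self._factory()
            self._pid = pid
        return self._client


def make_s3_client() -> Any:
    """Create a new S3 client from a new session (assumes boto3 is installed)."""
    import boto3  # imported lazily so this sketch loads without boto3

    return boto3.session.Session().client("s3")


# Module-level holder: nothing is created until get() is first called.
s3_holder = PerProcessClient(make_s3_client)
```

Code that reads from S3 inside a dataset would then call s3_holder.get() at the point of use instead of keeping a client created before the workers start. Whether this applies here depends on how the framework creates and caches its boto3 session, which is not shown in this thread.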