Training on Waymo does not seem to converge
Great work! The model is fast at inference and lightweight.
When I train the model on Waymo (after processing the dataset with your code), the loss does not seem to decrease. I am not sure what could have gone wrong.
I made two major modifications:
- When processing the Waymo dataset, the original code maps the sequence path (client._map_path), but that call raised errors, so I changed it to pass the file path directly:
def process_single_sequence(sequence_file, save_path, sampled_interval, client, has_label=True, use_two_returns=True):
    sequence_name = os.path.splitext(os.path.basename(sequence_file))[0]

    # print('Load record (sampled_interval=%d): %s' % (sampled_interval, sequence_name))
    if not client.exists(sequence_file):
        print('NotFoundError: %s' % sequence_file)
        return []

    # dataset = tf.data.TFRecordDataset(client._map_path(sequence_file), compression_type='')
    dataset = tf.data.TFRecordDataset(str(sequence_file), compression_type='')
    cur_save_dir = save_path / sequence_name
    cur_save_dir.mkdir(parents=True, exist_ok=True)
- For dist_train.sh, I changed it to match the format used in OpenPCDet:
#!/usr/bin/env bash

set -x
NGPUS=$1
PY_ARGS=${@:2}
echo "#######################################" $PY_ARGS

while true
do
    PORT=$(( ((RANDOM<<15)|RANDOM) % 49152 + 10000 ))
    status="$(nc -z 127.0.0.1 $PORT < /dev/null &>/dev/null; echo $?)"
    if [ "${status}" != "0" ]; then
        break;
    fi
done
echo $PORT

python3 -m torch.distributed.launch --nproc_per_node=${NGPUS} --master_port $PORT train.py --launcher pytorch ${PY_ARGS}
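
For anyone checking the port-selection loop in the script, here is a rough Python sketch of what I understand it to do: draw a random port in the range 10000 to 59151 and retry until nothing is listening on it. The function names (candidate_port, is_port_in_use, find_free_port) are my own and do not come from the repository, and the connect test only approximates what `nc -z` does.

```python
import random
import socket


def candidate_port(rng: random.Random) -> int:
    """Mirror the bash arithmetic: ((RANDOM<<15)|RANDOM) % 49152 + 10000."""
    # Bash $RANDOM yields a 15-bit value (0..32767); two draws give 30 bits.
    high = rng.randrange(1 << 15)
    low = rng.randrange(1 << 15)
    return ((high << 15) | low) % 49152 + 10000


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def find_free_port(rng=None, max_tries: int = 100) -> int:
    """Draw candidate ports until one has no listener (hypothetical helper)."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        port = candidate_port(rng)
        if not is_port_in_use(port):
            return port
    raise RuntimeError("no free port found after %d tries" % max_tries)


if __name__ == "__main__":
    print(find_free_port())
```

As far as I can tell, this part only decides the value passed to --master_port, so it should not affect the loss.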
The training log is attached: train-waymo-pvt-ssd.log
I am also training again with the original dist_train.sh, but the loss still shows no sign of converging.