waymo-open-dataset
Missing Map Features in Testing Scenario Proto Dataset
Hello,
While trying to process map features from the Scenario Proto dataset, I found that some test scenarios have no map features. Here is the list of these scenarios:
No. | Scenario ID |
---|---|
1 | a869787ec83d5c8a |
2 | 8738e8c0200056fe |
3 | 9465fb15b456855a |
4 | 4c99903f949e8100 |
5 | ff6686b0e98d66ae |
6 | 9036b3f956b09fc0 |
7 | 981f5c4f61505759 |
8 | 38e2986c8098692f |
9 | ab7571715a1d5193 |
The environment I am using:
python==3.10.13
conda==23.11.0
pip==23.3.2
numpy==1.21.5
pandas==1.5.3
tensorflow==2.11.0
torch==2.1.0+cu118
waymo-open-dataset-tf-2-11-0==1.6.1
The code to reproduce the issue:
import os
from pathlib import Path

import tensorflow as tf
from tqdm import tqdm
from waymo_open_dataset.protos import scenario_pb2

DATA_ROOT = str(Path("../data/womd/raw/scenario/").resolve())
TRAIN_FILES = os.path.join(DATA_ROOT, "training", "training.tfrecord*")
VALID_FILES = os.path.join(DATA_ROOT, "validation", "validation.tfrecord*")
TEST_FILES = os.path.join(DATA_ROOT, "testing", "testing.tfrecord*")

split = "testing"  # name of the split being traversed
filenames = tf.io.matching_files(TEST_FILES)
dataset = tf.data.TFRecordDataset(filenames)

# Parse each serialized Scenario and report the ones without map features.
# NUM_DATA_MAP[split] (the per-split scenario count) is defined elsewhere,
# so total= is omitted here and tqdm shows a running count only.
for data in tqdm(dataset.as_numpy_iterator(), desc=f"Traversing {split} data"):
    scenario = scenario_pb2.Scenario.FromString(data)
    if len(scenario.map_features) == 0:
        print(f"Scenario {scenario.scenario_id} has no map features.")
        continue
The dataset version is 1.2.0, which I downloaded from here. Could you help confirm whether the files under scenario/testing on the Google Cloud servers are correct? Thanks!
Adding @scott-ettinger
If I'm not mistaken, this was flagged some time ago and it seems to be a small issue on our side (@ChocolateDave can you please confirm that it's just these 9 Scenarios?). We'll look into fixing that extraction error or just discarding those examples in the upcoming release.
Thanks for flagging!
Thanks for the response, @nicomon24.
I confirm that these are the only nine scenarios missing map features in the testing dataset. I double-checked by running the code on both dataset versions, 1.1.0 and 1.2.0.
However, it seems some of the testing-interactive scenarios are also missing map features:
No. | Scenario ID | Also missing map features in testing |
---|---|---|
1 | e1f412d402676e57 | |
2 | 9e0ed12773f813eb | |
3 | ff6686b0e98d66ae | ✓ |
4 | 981f5c4f61505759 | ✓ |
5 | 664537fc3819c08a | |
6 | b6f042d4029a0297 | |
7 | 1ebe3d70cb05a381 | |
8 | ab7571715a1d5193 | ✓ |
9 | 5a99e6200deb4792 | |
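The overlap marked in the last column can be cross-checked with a set intersection. This is only a minimal sketch built from the IDs in the two tables above; the variable names are illustrative and nothing here touches the dataset itself.

```python
# Scenario IDs reported without map features in the `testing` split.
testing_missing = {
    "a869787ec83d5c8a", "8738e8c0200056fe", "9465fb15b456855a",
    "4c99903f949e8100", "ff6686b0e98d66ae", "9036b3f956b09fc0",
    "981f5c4f61505759", "38e2986c8098692f", "ab7571715a1d5193",
}

# Scenario IDs reported without map features in `testing-interactive`.
interactive_missing = {
    "e1f412d402676e57", "9e0ed12773f813eb", "ff6686b0e98d66ae",
    "981f5c4f61505759", "664537fc3819c08a", "b6f042d4029a0297",
    "1ebe3d70cb05a381", "ab7571715a1d5193", "5a99e6200deb4792",
}

# IDs that appear in both lists, and the combined set of affected IDs.
overlap = sorted(testing_missing & interactive_missing)
all_affected = testing_missing | interactive_missing

print(overlap)
print(len(all_affected))
```

Three IDs appear in both lists, so the two splits together have 15 distinct affected scenario IDs.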
Thanks for flagging this.
@nicomon24 Sorry, so how should we deal with this when preparing a submission for the test set? Should we just skip those scenarios?
@pengzhenghao For sim agents, yes: these are not in the test set, so you can safely skip them. For motion, I need to check with Scott.
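For the sim agents case, skipping could look roughly like the sketch below. It assumes the scenarios are already parsed into objects exposing `scenario_id` and `map_features`, as `scenario_pb2.Scenario` does. `iter_scenarios_with_map` is a hypothetical helper, not part of `waymo_open_dataset`, and the stand-in objects (including the ID `"hypothetical_id"`) exist only to make the example self-contained. Whether the same skipping is acceptable for motion submissions is not confirmed in this thread.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def iter_scenarios_with_map(scenarios: Iterable[Any]) -> Iterator[Any]:
    """Yield only the scenarios that carry at least one map feature.

    Hypothetical helper: it checks `map_features` directly instead of
    hard-coding the affected scenario IDs.
    """
    for scenario in scenarios:
        if len(scenario.map_features) == 0:
            continue  # skip scenarios with an empty map
        yield scenario


# Stand-ins for parsed Scenario protos (illustrative data only).
fake_scenarios = [
    SimpleNamespace(scenario_id="a869787ec83d5c8a", map_features=[]),
    SimpleNamespace(scenario_id="hypothetical_id", map_features=["lane"]),
]

kept_ids = [s.scenario_id for s in iter_scenarios_with_map(fake_scenarios)]
print(kept_ids)
```

Filtering on the empty `map_features` field rather than on a fixed ID list means the same loop would keep working if a later release repairs or drops the affected scenarios.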