
Missing Map Features in Testing Scenario Proto Dataset

Open juanwulu opened this issue 1 year ago • 5 comments

Hello,

While trying to process map features from the Scenario Proto dataset, I found that some test scenarios have no map features. Here is the list of these scenarios:

No. Scenario ID
1 a869787ec83d5c8a
2 8738e8c0200056fe
3 9465fb15b456855a
4 4c99903f949e8100
5 ff6686b0e98d66ae
6 9036b3f956b09fc0
7 981f5c4f61505759
8 38e2986c8098692f
9 ab7571715a1d5193

The environment I am using:

python==3.10.13
conda==23.11.0
pip==23.3.2
numpy==1.21.5
pandas==1.5.3
tensorflow==2.11.0
torch==2.1.0+cu118
waymo-open-dataset-tf-2-11-0 1.6.1

The code to reproduce the issue:

import os
from pathlib import Path

import tensorflow as tf
from tqdm import tqdm
from waymo_open_dataset.protos import scenario_pb2

DATA_ROOT = str(Path("../data/womd/raw/scenario/").resolve())
TRAIN_FILES = os.path.join(DATA_ROOT, "training", "training.tfrecord*")
VALID_FILES = os.path.join(DATA_ROOT, "validation", "validation.tfrecord*")
TEST_FILES = os.path.join(DATA_ROOT, "testing", "testing.tfrecord*")

split = "testing"

# Collect all testing shards and stream the serialized Scenario protos.
filenames = tf.io.matching_files(TEST_FILES)
dataset = tf.data.TFRecordDataset(filenames)
# tqdm's total= is omitted because the per-split scenario count is not defined here.
for data in tqdm(
    dataset.as_numpy_iterator(),
    desc=f"Traversing {split} data",
):
    scenario = scenario_pb2.Scenario.FromString(data)
    # Report every scenario whose map_features field is empty.
    if len(scenario.map_features) == 0:
        print(f"Scenario {scenario.scenario_id} has no map features.")
        continue

The dataset version is 1.2.0, which I downloaded from here. Could you help confirm if the files under scenario/testing on the Google Cloud servers are correct? Thanks!

juanwulu, Jan 30 '24 20:01

Adding @scott-ettinger

If I'm not mistaken, this was flagged some time ago and it seems to be a small issue on our side (@ChocolateDave can you please confirm that it's just these 9 Scenarios?). We'll look into fixing that extraction error or just discarding those examples in the upcoming release.

Thanks for flagging!

nicomon24, Feb 01 '24 14:02

Thanks for the response, @nicomon24.

I confirm that these nine are the only scenarios in the testing dataset with no map features. I double-checked by running the code on both dataset versions, 1.1.0 and 1.2.0.

However, some testing-interactive scenarios also appear to have no map features:

No. Scenario ID Missing from testing
1 e1f412d402676e57
2 9e0ed12773f813eb
3 ff6686b0e98d66ae
4 981f5c4f61505759
5 664537fc3819c08a
6 b6f042d4029a0297
7 1ebe3d70cb05a381
8 ab7571715a1d5193
9 5a99e6200deb4792

juanwulu, Feb 01 '24 16:02
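
For readers comparing the two lists, here is a minimal sketch that computes which scenario IDs appear in both the testing and the testing-interactive reports. The IDs are copied from the two lists in this thread. The variable names are illustrative only and are not part of the Waymo Open Dataset API.

```python
# Scenario IDs reported in this thread as having no map features.
testing_ids = [
    "a869787ec83d5c8a", "8738e8c0200056fe", "9465fb15b456855a",
    "4c99903f949e8100", "ff6686b0e98d66ae", "9036b3f956b09fc0",
    "981f5c4f61505759", "38e2986c8098692f", "ab7571715a1d5193",
]
testing_interactive_ids = [
    "e1f412d402676e57", "9e0ed12773f813eb", "ff6686b0e98d66ae",
    "981f5c4f61505759", "664537fc3819c08a", "b6f042d4029a0297",
    "1ebe3d70cb05a381", "ab7571715a1d5193", "5a99e6200deb4792",
]

# IDs reported for both splits.
in_both = sorted(set(testing_ids) & set(testing_interactive_ids))
# IDs reported only for the testing-interactive split.
only_interactive = sorted(set(testing_interactive_ids) - set(testing_ids))

print("In both lists:", in_both)
print("Only in testing-interactive:", only_interactive)
```

Three IDs are shared between the lists, so the two reports together cover fifteen distinct scenarios.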

Thanks for flagging this.

scott-ettinger, Feb 01 '24 18:02

@nicomon24

Sorry, how should we deal with this when preparing a submission for the test set? Should we just skip those scenarios?

pengzhenghao, Apr 10 '24 17:04

@pengzhenghao For sim agents, yes: these are not in the test set, so you can safely skip them. For motion, I need to check with Scott.

nicomon24, Apr 11 '24 08:04
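
A rough sketch of the skipping approach suggested above for sim agents. Whether it also applies to motion prediction submissions was not confirmed in this thread. The helper `split_by_map_features` is hypothetical and not part of the `waymo_open_dataset` package. It assumes only that each scenario object exposes `scenario_id` and `map_features`, as `scenario_pb2.Scenario` does. The demo uses stand-in objects so that it runs without the dataset, and `hypothetical_ok_id` is a made-up ID.

```python
from types import SimpleNamespace


def split_by_map_features(scenarios):
    """Separate scenarios that have map features from those that do not.

    Returns a tuple (usable, skipped_ids): the scenarios with at least one
    map feature, and the IDs of scenarios whose map_features is empty.
    """
    usable, skipped_ids = [], []
    for scenario in scenarios:
        if len(scenario.map_features) == 0:
            skipped_ids.append(scenario.scenario_id)
        else:
            usable.append(scenario)
    return usable, skipped_ids


# Stand-in objects for illustration only; real code would pass parsed
# scenario_pb2.Scenario messages instead.
demo_scenarios = [
    SimpleNamespace(scenario_id="a869787ec83d5c8a", map_features=[]),
    SimpleNamespace(scenario_id="hypothetical_ok_id", map_features=["lane"]),
]

usable, skipped_ids = split_by_map_features(demo_scenarios)
print("Skipped:", skipped_ids)
```

Collecting the skipped IDs, rather than dropping them silently, makes it easy to check them against the lists reported earlier in the thread.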