waymo-open-dataset

Overlapping Segments

Open MoritzNekolla opened this issue 2 years ago • 6 comments

Hey all,

"The dataset is composed of 103,354 segments each containing 20 seconds of object tracks at 10Hz and map data for the area covered by the segment. These segments are further broken into 9 second windows with 5 second overlap"

When downloading the files, I assumed training_20s contained every 20-second segment. However, I counted only 70543 scenarios. If I want to work only with segments that do not overlap, how do I do that?

Greetings

MoritzNekolla, Jan 28 '22
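To make the windowing arithmetic in the quoted description concrete: at 10 Hz, a 20-second segment has 200 steps and a 9-second window has 90, so a 5-second (50-step) overlap means consecutive windows start 40 steps apart. The sketch below assumes, purely for illustration, that windows are laid back to back from the start of the segment at that fixed stride and that only complete windows are kept; the thread does not say how Waymo actually aligns its windows, and the helper `window_starts` is hypothetical.

```python
from typing import List


def window_starts(total_steps: int, window_steps: int, overlap_steps: int) -> List[int]:
    """Start indices of full windows under an ASSUMED fixed-stride layout.

    Hypothetical helper: windows begin at step 0 and advance by
    (window_steps - overlap_steps); partial windows at the end are dropped.
    """
    stride = window_steps - overlap_steps
    if stride <= 0:
        raise ValueError("overlap must be smaller than the window length")
    return list(range(0, total_steps - window_steps + 1, stride))


# 20 s segment, 9 s windows, 5 s overlap, all at 10 Hz
print(window_starts(200, 90, 50))  # prints [0, 40, 80]
```

Under these assumptions each 20-second segment yields three overlapping windows (steps 0-89, 40-129 and 80-169), which is why the windowed files contain far more examples than there are distinct segments.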

The segments in training_20s are non-overlapping, 20-second-long segments. There are only 70543 of them because they cover only the run segments that make up the training set (70%). The test and validation sets are not released in the 20s format.

s-ettinger, Feb 04 '22
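One way to confirm that you are looking at a full 20-second segment rather than a shorter window is to inspect a scenario's time axis. The sketch below assumes each parsed `Scenario` proto stores its per-step times in the `timestamps_seconds` field, so you would call it as `summarize_timestamps(list(scenario.timestamps_seconds))`; the helper name is hypothetical. Going by the dataset description quoted in the question, a segment from training_20s should show on the order of 200 steps spanning close to 20 s at about 10 Hz.

```python
from typing import Sequence


def summarize_timestamps(timestamps: Sequence[float]) -> dict:
    """Summarize a scenario's time axis (hypothetical helper).

    Returns the number of steps, the covered duration in seconds, and the
    implied sampling rate in Hz (None when it cannot be computed).
    """
    steps = len(timestamps)
    if steps < 2:
        return {"steps": steps, "duration_s": 0.0, "rate_hz": None}
    duration = timestamps[-1] - timestamps[0]
    rate = (steps - 1) / duration if duration > 0 else None
    return {"steps": steps, "duration_s": duration, "rate_hz": rate}


# Synthetic example: 200 samples at 10 Hz starting at t = 0
summary = summarize_timestamps([i * 0.1 for i in range(200)])
print(summary["steps"], round(summary["duration_s"], 2), round(summary["rate_hz"], 2))
# prints: 200 19.9 10.0
```

Note that N samples span (N - 1) sampling intervals, so 200 steps at 10 Hz cover 19.9 s rather than exactly 20 s.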

@scott-ettinger Could you release the scenario/validation_20s?

yenw, Feb 16 '22

Thanks for the feedback. We will consider this for future releases.

s-ettinger, Feb 18 '22

Hi, @MoritzNekolla

May I ask how you got the number 70543? Thanks in advance.

ShoufaChen, Apr 05 '22

@ShoufaChen I loaded each file in the training_20s folder and counted the scenarios inside it. Here is my code (~30 min runtime):

```python
import os

import tensorflow as tf
from waymo_open_dataset.protos import scenario_pb2

FILENAME = "/yourPathToDataset/waymo/motion/scenario/training_20s/"

# Collect every file inside training_20s (the TFRecord shards)
path_list = []
for root, dirs, files in os.walk(os.path.abspath(FILENAME)):
    for file in files:
        path_list.append(os.path.join(root, file))


def load_Data(filename):
    """Read one TFRecord file and parse every record into a Scenario proto."""
    dataset = tf.data.TFRecordDataset(filename)
    scenario_data = []
    for data in dataset:
        proto = scenario_pb2.Scenario()
        proto.ParseFromString(data.numpy())
        scenario_data.append(proto)
    return scenario_data


# Count the scenarios across all files
num_scenario = 0
for k in path_list:
    scenario_data = load_Data(k)
    num_scenario += len(scenario_data)

print(f"{num_scenario} scenarios")  # prints "70543 scenarios"
```

MoritzNekolla, Apr 07 '22
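If you only need the count, parsing every record into a `Scenario` proto is unnecessary. A possibly much faster sketch walks the TFRecord framing directly with the standard library: each record is stored as a little-endian uint64 length, a 4-byte masked CRC of that length, the payload, and a 4-byte masked CRC of the payload. This assumes the shards are uncompressed TFRecord files whose names contain `.tfrecord`; it does not verify CRCs, and the helper names `count_tfrecord_records` and `count_scenarios` are hypothetical.

```python
import os
import struct


def count_tfrecord_records(path: str) -> int:
    """Count records in an uncompressed TFRecord file by skipping over payloads.

    CRCs are NOT checked; a truncated final record raises ValueError.
    """
    count = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated length header in {path}")
            (length,) = struct.unpack("<Q", header)
            # Skip the length CRC (4 bytes), the payload, and the payload CRC (4 bytes)
            f.seek(4 + length + 4, os.SEEK_CUR)
            if f.tell() > size:
                raise ValueError(f"truncated record in {path}")
            count += 1
    return count


def count_scenarios(folder: str) -> int:
    """Sum record counts over files under `folder` whose names contain '.tfrecord'."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if ".tfrecord" in name:
                total += count_tfrecord_records(os.path.join(root, name))
    return total
```

Usage would be `print(count_scenarios("/yourPathToDataset/waymo/motion/scenario/training_20s/"))`. Because this reads only the 8-byte headers and seeks past each payload, it avoids both TensorFlow and protobuf parsing; iterating `tf.data.TFRecordDataset` without calling `ParseFromString` is a middle ground if you prefer to stay in TensorFlow.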

Thanks for your reply, @MoritzNekolla.

ShoufaChen, Apr 08 '22