waymo-open-dataset
Overlapping Segments
Hey all,
"The dataset is composed of 103,354 segments each containing 20 seconds of object tracks at 10Hz and map data for the area covered by the segment. These segments are further broken into 9 second windows with 5 second overlap"
When downloading the files, I thought training_20s contains every 20 second segment. However, I counted only 70543.
If I want to solely work with segments that do not overlap, how do I do that?
Greetings
The segments in training_20s are non-overlapping, 20-second-long segments. There are only 70543 because these are just the run segments that make up the training set (roughly 70% of the total). The test and validation sets are not released in the 20s format.
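As a quick sanity check on the numbers quoted in this thread, the sketch below computes what share of the 103,354 segments the 70543 counted training scenarios represent. Both counts come from the posts above; the variable names are illustrative only.

```python
# Counts quoted in this thread.
total_segments = 103_354     # all 20 second segments, per the dataset description
training_segments = 70_543   # scenarios counted in training_20s

# Share of all segments that ended up in the training set.
training_share = training_segments / total_segments

# Segments left over for the splits that are not released in the 20s format.
remaining_segments = total_segments - training_segments

print(f"training share: {training_share:.1%}")      # prints "training share: 68.3%"
print(f"remaining segments: {remaining_segments}")  # prints "remaining segments: 32811"
```

The share comes out slightly below the nominal 70%, which is consistent with the split being described as approximate.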
@scott-ettinger Could you release the scenario/validation_20s?
Thanks for the feedback. We will consider this for future releases.
Hi, @MoritzNekolla
May I ask how you got the number 70543? Thanks in advance.
@ShoufaChen I loaded each file in the training_20s folder and counted the scenarios inside.
Here is my code (~30min runtime):
import os

import tensorflow as tf
from waymo_open_dataset.protos import scenario_pb2

FILENAME = "/yourPathToDataset/waymo/motion/scenario/training_20s/"

# list all tfrecord files inside training_20s
path_list = []
for root, dirs, files in os.walk(os.path.abspath(FILENAME)):
    for file in files:
        path_list.append(os.path.join(root, file))

# function to unpack the recorded data
def load_Data(filename):
    dataset = tf.data.TFRecordDataset(filename)
    scenario_data = []
    for data in dataset:
        proto_string = data.numpy()
        proto = scenario_pb2.Scenario()
        proto.ParseFromString(proto_string)
        scenario_data.append(proto)
    return scenario_data

# count the number of scenarios
num_scenario = 0
for k in path_list:
    scenario_data = load_Data(k)
    num_scenario = num_scenario + len(scenario_data)
print(f"{num_scenario} scenarios")  # prints "70543 scenarios"
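If only the count is needed, parsing every Scenario proto is probably unnecessary. A rough alternative, sketched below, walks the TFRecord framing directly (an 8-byte little-endian length, a 4-byte length CRC, the payload, a 4-byte payload CRC) and counts records without TensorFlow. It assumes the files are uncompressed TFRecords with one Scenario per record, as the code above implies, and it does not verify the CRCs. The function names `count_tfrecord_records` and `count_scenarios` are hypothetical helpers, not part of the waymo-open-dataset package.

```python
import os
import struct


def count_tfrecord_records(path):
    """Count records in one uncompressed TFRecord file without decoding them."""
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # little-endian uint64: payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"truncated record header in {path}")
            (length,) = struct.unpack("<Q", header)
            # Skip the 4-byte length CRC, the payload, and the 4-byte payload CRC.
            # Assumption: CRCs are not checked here, so corruption goes unnoticed.
            f.seek(4 + length + 4, os.SEEK_CUR)
            if f.tell() > size:
                raise ValueError(f"truncated record body in {path}")
            count += 1
    return count


def count_scenarios(directory):
    """Sum the record counts of every file found under `directory`."""
    total = 0
    for root, _dirs, files in os.walk(os.path.abspath(directory)):
        for name in files:
            total += count_tfrecord_records(os.path.join(root, name))
    return total
```

Calling `count_scenarios("/yourPathToDataset/waymo/motion/scenario/training_20s/")` should give the same total as the loop above, provided the folder contains only TFRecord files. Because it only reads record headers and seeks past payloads, it should run considerably faster than the roughly 30 minutes reported.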
Thanks for your reply, @MoritzNekolla.