
question about reading dataset

Open samas69420 opened this issue 3 years ago • 2 comments

Hi, I would like to read the WaterDrop dataset from the "learning to simulate" repo and print the individual tensors for each particle at a given timestep. I'm pretty new to TensorFlow, and reading a TFRecord is a little trickier than I expected. How can I do it?

samas69420 avatar Jun 02 '22 17:06 samas69420

Hi! After downloading the dataset, have you tried reading the TFRecord, and then printing out its contents after decoding it? (I found this example in the TF docs)
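As a quick sanity check before any TF parsing, you can also walk the TFRecord container itself with nothing but the standard library. Each record is framed as a little-endian uint64 payload length, a 4-byte masked CRC of that length, the payload bytes, and a 4-byte masked CRC of the payload. A minimal sketch (the `read_tfrecord_frames` helper is a hypothetical name, and the CRCs are skipped rather than verified):

```python
import struct

def read_tfrecord_frames(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Wire format per record: uint64 length (little-endian), uint32
    masked CRC of the length, payload, uint32 masked CRC of the
    payload. The CRCs are skipped here, not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return                 # clean end of file
            (length,) = struct.unpack("<Q", header)
            f.read(4)                  # skip masked CRC of the length field
            payload = f.read(length)
            f.read(4)                  # skip masked CRC of the payload
            yield payload
```

Each yielded payload is one serialized protocol buffer, so you would still hand it to the `tf.io` parsing ops to decode the actual features, but this at least shows how many records the file holds and how large each one is.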

kevroi avatar Oct 02 '22 06:10 kevroi

I'm using the following code to read the TFRecord file. However, I do not get a reasonable list of floats representing the data I expect (the positions of all particles at every timestep of the ground-truth simulation).

# Inspect the data in the WaterDrop reference dataset
import struct

import numpy as np
import tensorflow as tf

file_path = '/tmp/datasets/WaterDrop/test.tfrecord'

ds_waterdrop = tf.data.TFRecordDataset(file_path)

feature_description = {
    'key': tf.io.FixedLenFeature([], tf.int64),
    'particle_type': tf.io.FixedLenFeature([], tf.string)
}

def _parse_function(example_proto):
    return tf.io.parse_single_example(example_proto, feature_description)

ds_waterdrop = ds_waterdrop.map(_parse_function)

# Create empty lists to store values
keys = []
particle_types = []
for record in ds_waterdrop:
    keys.append(record['key'].numpy())
    particle_type_bytes = record['particle_type'].numpy()
    # The particle types are serialized int64s, so unpack them as '<q'
    # (unpacking the bytes as '<f' floats produces garbage).
    particle_types.append(
        [v for (v,) in struct.iter_unpack('<q', particle_type_bytes)])
I am thus still wondering how to decode this file properly. I am also wondering why there are 30 instances of the features 'key' and 'particle_type', when from render_rollout I conclude that the test dataset uses a set of 285 particles over 1000 timesteps.
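One likely explanation: the records in these datasets are serialized `tf.train.SequenceExample`s rather than plain `Example`s. `tf.io.parse_single_example` happens to recover only the context features (`key`, `particle_type`), while the per-timestep positions live in the feature lists, which is why they never show up. It would also explain the 30 instances: one record per trajectory (so presumably 30 test trajectories, each with its own particle count). Below is a self-contained sketch under that assumption; the toy shapes, the file path, and the `parse` helper are made up for illustration. It writes a tiny record in this layout and parses it back with `tf.io.parse_single_sequence_example`:

```python
import numpy as np
import tensorflow as tf

# --- Build a tiny SequenceExample shaped like a WaterDrop record -----------
# Assumed layout: 'key' and 'particle_type' (raw int64 bytes) in the
# context, and one raw-float32 'position' bytes feature per timestep.
particle_type = np.array([5, 5, 5], dtype=np.int64)              # 3 particles
positions = np.arange(24, dtype=np.float32).reshape(4, 3, 2)     # 4 steps, 2-D

context = tf.train.Features(feature={
    'key': tf.train.Feature(int64_list=tf.train.Int64List(value=[0])),
    'particle_type': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[particle_type.tobytes()])),
})
position_list = tf.train.FeatureList(feature=[
    tf.train.Feature(bytes_list=tf.train.BytesList(value=[step.tobytes()]))
    for step in positions])
seq_example = tf.train.SequenceExample(
    context=context,
    feature_lists=tf.train.FeatureLists(
        feature_list={'position': position_list}))

path = '/tmp/toy_waterdrop.tfrecord'
with tf.io.TFRecordWriter(path) as writer:
    writer.write(seq_example.SerializeToString())

# --- Parse it back as a SequenceExample ------------------------------------
context_features = {
    'key': tf.io.FixedLenFeature([], tf.int64),
    'particle_type': tf.io.VarLenFeature(tf.string),
}
sequence_features = {
    'position': tf.io.VarLenFeature(tf.string),
}

def parse(proto):
    ctx, seq = tf.io.parse_single_sequence_example(
        proto,
        context_features=context_features,
        sequence_features=sequence_features)
    # The bytes features are raw little-endian buffers: int64 particle
    # types, float32 positions (one buffer per timestep).
    types = tf.reshape(
        tf.io.decode_raw(ctx['particle_type'].values, tf.int64), [-1])
    pos = tf.io.decode_raw(seq['position'].values, tf.float32)
    pos = tf.reshape(pos, [tf.shape(pos)[0], -1, 2])  # (steps, particles, 2)
    return ctx['key'], types, pos

ds = tf.data.TFRecordDataset(path).map(parse)
for key, types, pos in ds:
    print(key.numpy(), types.numpy(), pos.numpy().shape)
```

For the real file, drop the writer half and point `tf.data.TFRecordDataset` at `test.tfrecord`; if I remember correctly, the repo's `learning_to_simulate/reading_utils.py` parses the records along these lines.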

basoomen avatar Oct 26 '23 08:10 basoomen