
Add Quickdraw Sketch RNN Dataset

Open mr-ubik opened this issue 5 years ago • 19 comments

dataset_info.json

Add the Quickdraw Dataset used to train Sketch RNN.

Closes one of the TODOs of #337.

Caveats:

  • I took the liberty of creating a sequence directory to hold this data
  • I added the test, but it is still red (failing)
  • Manually testing the dataset from TF worked as intended

mr-ubik avatar Mar 27 '19 09:03 mr-ubik

Thanks for the dataset.

You need to add fake_examples.

us avatar Mar 27 '19 10:03 us

I am also changing the number of shards per split from 1-1-1 to 20-5-5; I will update the gist accordingly.

mr-ubik avatar Mar 27 '19 10:03 mr-ubik

@us How should fake examples work? Are they simply placeholders, or should I add 3 complete .npz files?

mr-ubik avatar Mar 27 '19 10:03 mr-ubik

You can add .npz files under default; think of that as your extracted dir. You can also give details of the _split_generators outputs.
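
For illustration, a minimal sketch of how such a fake .npz file could be generated (the directory layout, class name, and stroke values below are made up, not the actual test fixtures):

```python
import os

import numpy as np

# A tiny sketch in stroke-3 format (dx, dy, pen_lifted), as stored in the
# real Quickdraw Sketch-RNN .npz archives; the values here are invented.
fake_sketch = np.array([[10, 12, 0], [-5, 3, 0], [0, 0, 1]], dtype=np.int16)

def ragged(sketches):
    """Pack variable-length sketches into a 1-D object array, like the
    real archives do."""
    out = np.empty(len(sketches), dtype=object)
    for i, s in enumerate(sketches):
        out[i] = s
    return out

# One tiny file per class, with train/valid/test keys like the real data.
os.makedirs("fake_examples/quickdraw_sketch_rnn", exist_ok=True)
np.savez_compressed(
    "fake_examples/quickdraw_sketch_rnn/cat.npz",
    train=ragged([fake_sketch] * 3),
    valid=ragged([fake_sketch]),
    test=ragged([fake_sketch]),
)
```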

us avatar Mar 27 '19 11:03 us

Updated the dataset_info.json with the correct BibTeX citation.

mr-ubik avatar Mar 28 '19 09:03 mr-ubik

One thing I have seen during testing: if we want to be able to reproduce the technique of the original paper, we should probably invert the notation for the end of the drawing. In the original paper, the authors pad each sketch with (0, 0, 0, 0, 1). If we want to leverage the power of the tf.data.Dataset pipeline, it could be more useful to use (0, 0, 0, 0, 0) as the end stroke, since this allows using tf.data.Dataset.padded_batch and potentially tf.data.experimental.bucket_by_sequence_length.
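
A minimal sketch of why the all-zero end stroke helps (the two example sketches and their values are made up; stroke-5 format (dx, dy, p1, p2, p3) is assumed):

```python
import tensorflow as tf

# Two hypothetical sketches of different lengths in stroke-5 format.
sketches = [
    tf.constant([[10., 12., 1., 0., 0.],
                 [-5.,  3., 0., 1., 0.]]),
    tf.constant([[ 2.,  2., 1., 0., 0.]]),
]

ds = tf.data.Dataset.from_generator(
    lambda: iter(sketches),
    output_types=tf.float32,
    output_shapes=tf.TensorShape([None, 5]),
)

# padded_batch pads with zeros by default, so with a (0, 0, 0, 0, 0) end
# stroke the padding itself doubles as the end-of-sketch marker.
for batch in ds.padded_batch(2, padded_shapes=[None, 5]):
    print(batch.numpy())  # the shorter sketch ends in all-zero rows
```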

mr-ubik avatar Apr 02 '19 14:04 mr-ubik

> One thing I have seen during testing: if we want to be able to reproduce the technique of the original paper, we should probably invert the notation for the end of the drawing. In the original paper, the authors pad each sketch with (0, 0, 0, 0, 1). If we want to leverage the power of the tf.data.Dataset pipeline, it could be more useful to use (0, 0, 0, 0, 0) as the end stroke, since this allows using tf.data.Dataset.padded_batch and potentially tf.data.experimental.bucket_by_sequence_length.

Contrary to what I said above, I have moved the padding step into example generation; the end user will thus just need tf.data.Dataset.filter() to filter for the various labels.
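
For illustration, user-side filtering might look like this (the dataset name and feature keys are assumptions about the final builder, not its confirmed API):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Hypothetical: load the builder and keep only one class via filter().
ds = tfds.load("quickdraw_sketch_rnn", split="train")
CAT = 5  # hypothetical integer id for the "cat" label
cats = ds.filter(lambda ex: tf.equal(ex["label"], CAT))
```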

mr-ubik avatar Apr 09 '19 14:04 mr-ubik

I have modified the preprocessing step, adding the stroke signaling the start of a sketch, as they do here.

EDIT: :thinking: apparently the py2-tf2 test is failing. I also updated the dataset info, since I increased the training-set shards from 20 to 30.
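
For reference, the Sketch-RNN paper uses S0 = (0, 0, 1, 0, 0) as the start-of-sequence stroke; a minimal sketch of prepending it (the function name is illustrative, not the code in this PR):

```python
import numpy as np

def add_start_stroke(sketch):
    """Prepend the Sketch-RNN start token S0 = (0, 0, 1, 0, 0) to an
    [n, 5] stroke-5 array."""
    s0 = np.array([[0, 0, 1, 0, 0]], dtype=sketch.dtype)
    return np.concatenate([s0, sketch], axis=0)

# Example: a one-stroke sketch grows to two rows, starting with S0.
print(add_start_stroke(np.array([[10, 12, 1, 0, 0]])))
```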

mr-ubik avatar Apr 18 '19 10:04 mr-ubik

Fixed an error in the padding function. The TF2/Py2.7 test will fail due to a known Keras issue: https://stackoverflow.com/a/55903975/8050556

mr-ubik avatar May 07 '19 09:05 mr-ubik

@mr-ubik are you still working on this?

us avatar Jul 17 '19 14:07 us

I had stopped due to the NumPy/Keras issue I referenced earlier; in the meantime, I have been working on making sure the data format and pre-processing accurately reflect what is done in the paper.

mr-ubik avatar Jul 21 '19 10:07 mr-ubik

Did you open an issue tagging this PR?

us avatar Jul 21 '19 19:07 us

The issue should be fixed now. I will look at the code next week and start pushing new updates again. :heart:

mr-ubik avatar Jul 24 '19 11:07 mr-ubik

Okay! It'll be awesome :)

us avatar Jul 24 '19 11:07 us

@mr-ubik hey, don't forget!

us avatar Jul 29 '19 22:07 us

@us Just sorting through issues at work; contributions will resume ASAP.

mr-ubik avatar Jul 31 '19 10:07 mr-ubik

Hi there! This PR looks great, is it still active?

ageron avatar Mar 21 '20 06:03 ageron

Hi @ageron! I actually had to stop working on it due to other priorities, but I'd like to resume it. It should probably be updated to the new tfds API, if I am not mistaken. There's also a discussion to be had on whether to embed the preprocessing done in Sketch-RNN into the tfds pipeline or leave it up to the user.

In an internal fork of tfds that we are working on at @zurutech/ml, we are going to try to implement this kind of behavior via several BUILDER_CONFIGs; if this works out, it could be used for this dataset as well.
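
A rough outline of the BUILDER_CONFIG idea (class names, config names, and the preprocessing flag are all hypothetical; the usual _info/_split_generators/_generate_examples methods are omitted from this sketch):

```python
import tensorflow_datasets as tfds

class QuickdrawConfig(tfds.core.BuilderConfig):
    """One config per preprocessing variant (hypothetical)."""

    def __init__(self, *, sketch_rnn_preprocessing, **kwargs):
        super().__init__(**kwargs)
        self.sketch_rnn_preprocessing = sketch_rnn_preprocessing

class QuickdrawSketchRnn(tfds.core.GeneratorBasedBuilder):
    """Skeleton builder; users would pick a variant at load time."""

    BUILDER_CONFIGS = [
        QuickdrawConfig(
            name="raw",
            version=tfds.core.Version("1.0.0"),
            description="Raw stroke sequences.",
            sketch_rnn_preprocessing=False,
        ),
        QuickdrawConfig(
            name="sketch_rnn",
            version=tfds.core.Version("1.0.0"),
            description="Padded, Sketch-RNN-style preprocessed sequences.",
            sketch_rnn_preprocessing=True,
        ),
    ]
```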

mr-ubik avatar Jul 17 '20 08:07 mr-ubik

Is this PR still active?

osbm avatar Jul 16 '22 22:07 osbm