Results 40 comments of Brandon Lee

Writers should use audio_data instead of re-reading the same sample from disk

I think `max_window_size` can be a misleading name. It is prefixed with `max` because samples can have variable length (shorter than `max_window_size`). The window is the single unit...
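For illustration, a minimal sketch (all names here are hypothetical, not howl's actual code) of how a variable-length sample could be fit into one fixed-size window:

```python
import numpy as np

# hypothetical helper: fit a variable-length sample into one fixed-size
# window of max_window_size samples
def to_window(audio_data: np.ndarray, max_window_size: int) -> np.ndarray:
    if len(audio_data) >= max_window_size:
        return audio_data[:max_window_size]
    # shorter samples are zero-padded on the right up to the window size
    return np.pad(audio_data, (0, max_window_size - len(audio_data)))
```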

The segfault was happening due to numba: https://github.com/numba/numba/issues/4323

Spent some time applying [one of the multiprocessing packages](https://pathos.readthedocs.io/en/latest/pathos.html), but the results weren't that good. Please refer to https://github.com/castorini/howl/tree/multi_processing_test
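For reference, pathos exposes a pool API similar to the standard library's; a minimal sketch of the kind of usage that was tried (the worker function here is just a placeholder):

```python
from pathos.multiprocessing import ProcessingPool as Pool

def process_sample(path: str) -> int:
    # placeholder for per-sample work (e.g. loading and augmenting audio)
    return len(path)

if __name__ == "__main__":
    paths = ["a.wav", "b.wav", "c.wav"]
    pool = Pool(nodes=4)
    results = pool.map(process_sample, paths)  # parallel map over samples
    pool.close()
    pool.join()
    print(results)
```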

When writing a dataset, the process function should also take in a sample (AudioClipExample) and use sample.audio_data when metadata.path does not exist (https://github.com/castorini/howl/blob/master/howl/data/dataset/serialize.py#L67-L72)
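A rough sketch of the suggested fallback (this mirrors the idea, not the actual serialize.py code; soundfile is just one possible loader):

```python
from pathlib import Path

import numpy as np
import soundfile as sf  # one possible audio loader; any would do

def process(sample) -> np.ndarray:
    """Prefer the on-disk file, fall back to the in-memory audio_data."""
    path = Path(sample.metadata.path)
    if path.exists():
        audio_data, _sample_rate = sf.read(path)
        return audio_data
    # metadata.path does not exist on disk: use the in-memory samples
    return sample.audio_data
```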

I haven't generated a dataset from Google Speech Commands in a long time. What were the commands you used, and what was the target word? The augmentations are applied at...

Would this script solve your issue? https://github.com/castorini/howl/blob/master/generate_dataset.sh Details can be found here: https://github.com/castorini/howl/tree/master/howl/dataset

Sorry for the delay. I have other stuff going on, so I haven't been able to spend much time on this project. I think you just caught a bug. @bdytx5...

Any updates on this? I think this can be easily fixed if the tensors loaded through readtext are simply moved to the same device.
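A minimal sketch of the proposed fix, assuming PyTorch tensors (the dummy tensor below stands in for whatever readtext actually returns):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# stand-in for a tensor loaded via readtext; .to() is a no-op when the
# tensor already lives on the target device
loaded = torch.zeros(10)
loaded = loaded.to(device)
```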

The Pic 1.0 images tar has train and test sets (https://picdataset.com/challenge/task/download/), but based on the README, it is supposed to have .jpg files without intermediate (train & test) dirs. Which one are...