Daniel Falk
I can also add to the benchmarking that by batching the samples I reach essentially the same throughput with TensorFlow (1245 frames / second) as with the hub dataset directly...
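A minimal, library-agnostic sketch of that batched-vs-unbatched timing comparison (the synthetic byte samples and function names here are illustrative stand-ins, not the original benchmark code):

```python
import time


def make_samples(n):
    # Synthetic stand-in for dataset samples (e.g. image frames).
    return [bytes(64) for _ in range(n)]


def consume(batch):
    # Stand-in for the per-batch work done by the training loop.
    return len(batch)


def frames_per_second(samples, batch_size):
    # Iterate the samples in batches and report throughput.
    start = time.perf_counter()
    total = 0
    for i in range(0, len(samples), batch_size):
        total += consume(samples[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return total / elapsed


samples = make_samples(10_000)
fps_single = frames_per_second(samples, batch_size=1)
fps_batched = frames_per_second(samples, batch_size=32)
```

The point is only the measurement pattern: larger batches amortize per-call overhead, which is why batching closes the gap between the two pipelines.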
I created a simple example of how I typically use the datasets to measure the timing with and without chunked fetching; that did, however, result in an exception when training...
Thanks @AbhinavTuli , with that fix it works well. With the large images in my "predict weather" example above, training the model was ~30% faster when fetching chunks....
Thanks @mikayelh! CLA signed and you have a message in slack.
@farizrahman4u I see, that was indeed the case. When accessing the tensor there was no significant difference between using `.tensorflow()` or not. Is it, however, expected to be this slow?...
Thank you @farizrahman4u and @davidbuniat, that was exactly what I was looking for! I will look more into Deep Lake / Hub3 dataset and see if that works for me.
@datajoely That's probably the easiest solution. The potential issue I see with it is that we are then limited to processing videos that fit into the tmp partition. I think that...
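The approach under discussion can be sketched like this (a hypothetical helper, not the actual VideoDataSet implementation; a `BytesIO` stands in for a remote file object):

```python
import io
import os
import shutil
import tempfile


def open_video_via_tmp(fileobj):
    # Copy the remote stream to a local temporary file so that video
    # libraries which need a real file path (e.g. OpenCV) can read it.
    # Caveat noted above: the whole video must fit on the tmp partition.
    tmp = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False)
    with tmp:
        shutil.copyfileobj(fileobj, tmp)
    return tmp.name  # caller is responsible for deleting the file


# Example usage with a fake in-memory "remote" file:
local_path = open_video_via_tmp(io.BytesIO(b"not a real video"))
# ... open local_path with a video reader here ...
os.unlink(local_path)
```

This is the simple route; anything streaming-based would avoid the tmp-partition size limit at the cost of extra complexity.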
Yes, that's probably true. Let's make it simple now and we can think more about options later. I did just push a new commit which makes it work with fsspec....
There is now a PR for the VideoDataSet: https://github.com/kedro-org/kedro/pull/1312
Hi @MerelTheisenQB, I definitely intend to continue working on the commit - it just got shadowed by a lot of other work I needed to get done first. It will...