
move to datasets API

Open beckermr opened this issue 7 years ago • 12 comments

The latest TF versions recommend feeding data via the Datasets API (https://www.tensorflow.org/programmers_guide/datasets) and claim it is more efficient. We should probably do some testing and switch if it is warranted.
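A minimal sketch of what the switch might look like (array names and shapes here are hypothetical, and this is written in the modern eager-style form for brevity; the 1.x session-based equivalent would build an iterator with `make_one_shot_iterator()` and pull `get_next()` inside a session):

```python
import numpy as np
import tensorflow as tf

# Toy in-memory data; shapes are arbitrary for illustration.
X = np.random.rand(100, 4).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int64)

# Build an input pipeline: slice into examples, shuffle, batch, prefetch.
dataset = (tf.data.Dataset.from_tensor_slices((X, y))
           .shuffle(buffer_size=100)
           .batch(16)
           .prefetch(1))

for xb, yb in dataset:
    # each xb has shape (<=16, 4), each yb shape (<=16,)
    pass
```

The idea is that the pipeline replaces the placeholder/`feed_dict` pattern, letting TF overlap data preparation with graph execution.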

beckermr avatar Nov 05 '17 21:11 beckermr

I think we may need to change the place where self.input_targets_ is defined in order to use the dataset API

gdj0nes avatar Dec 03 '17 22:12 gdj0nes

Mmmmk. Make a PR and we can work on it.

beckermr avatar Dec 03 '17 23:12 beckermr

I finally did some benchmarking here. For smallish batch sizes, we see roughly 20% performance improvements. At large batch sizes this API is actually slower, from what I can tell.

beckermr avatar Dec 25 '17 01:12 beckermr

Were these just CPU benchmarks? I'd expect the largest gains to come from GPU. At large batch sizes the pipeline was probably competing for RAM with the model, which wouldn't be the case if the model were on a GPU.

gdj0nes avatar Dec 25 '17 17:12 gdj0nes

Yep, just a CPU.

I don’t follow your logic here. How does tensorflow deal with memory management and transfers from the cpu to the gpu using the datasets api?

Why would large batch sizes not compete for ram on a gpu? The data still has to be moved to the gpu before it can be used.

beckermr avatar Dec 25 '17 18:12 beckermr

I did some reading. The dataset API currently runs only on the CPU. Apparently people use it to help stage data onto the gpu efficiently while the gpu is executing other operations.

I looked into one version of staging before and found minimal gains. Hopefully this API will do that better but I don’t have GPU access to test it out right now.
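The staging idea being discussed can be illustrated without TensorFlow at all: a background thread keeps a bounded buffer of batches filled while the consumer does its work, which is roughly what `Dataset.prefetch` does on the host side. This is a conceptual stdlib-only sketch, not TensorFlow code:

```python
import queue
import threading

def prefetching_loader(batches, buffer_size=2):
    """Yield batches while a background thread keeps up to
    `buffer_size` batches staged ahead of the consumer,
    mimicking the overlap that Dataset.prefetch provides."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for b in batches:
            q.put(b)  # blocks when the buffer is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()
        if item is sentinel:
            return
        yield item

# Usage: batch loading overlaps with whatever the consumer does per batch.
loaded = list(prefetching_loader(range(5), buffer_size=2))
```

The gain comes from overlapping I/O or preprocessing with compute; if the data is already in host memory and the consumer is CPU-bound anyway, there is little to overlap, which is consistent with seeing minimal gains.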

beckermr avatar Dec 25 '17 18:12 beckermr

I imagine the dataset API is designed for GPU models, since that is where you are most likely to hit a disk IO bottleneck you'd want to solve with an optimized pipeline. So I think the pipeline is probably taking RAM away from the model graph, since the developers didn't expect the model to also be on the CPU. When the model is on a GPU, the pipeline is free to stage as much data in RAM as it can without impacting model performance. I'd guess the pipeline helps stage CPU-to-GPU copies, but it may also only stage a single batch.

Did you try changing the number prefetched by the pipeline? I'd also check TensorBoard to see how full the queue is. Often I've found you need a large number of threads to effectively fill the queue.

gdj0nes avatar Dec 25 '17 19:12 gdj0nes

I just set the prefetch amount to 10x the batch size. Usually, unless the code is doing a lot of memory allocation, competition for memory is not a problem on a CPU. If it did run out of memory, it would write to swap and the code would slow down by orders of magnitude, which is not what I saw.
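One subtlety worth noting when setting the prefetch amount: `prefetch(n)` counts elements at that point in the pipeline, so placed after `batch()` its argument is a number of batches, not rows. A small sketch (sizes are arbitrary):

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 8).astype(np.float32)
batch_size = 32

# prefetch() buffers elements at its position in the pipeline:
# here it follows batch(), so prefetch(10) stages 10 full batches
# (10 * batch_size rows) ahead of the consumer.
dataset = (tf.data.Dataset.from_tensor_slices(X)
           .batch(batch_size)
           .prefetch(10))
```

Placing `prefetch` before `batch` instead would buffer individual examples, which is usually much less useful.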

beckermr avatar Dec 25 '17 19:12 beckermr

I can check tensorboard at some point but TBH I am not sure it is worth it right now.

beckermr avatar Dec 25 '17 19:12 beckermr

Also to clarify, this test was for data already in memory, as opposed to coming off of disk.

beckermr avatar Dec 25 '17 19:12 beckermr

There seems to be an update in tf 1.5, soon to be released. However, there appears to be a workaround in 1.4. I'll experiment with an implementation sometime in the next week.

gdj0nes avatar Jan 26 '18 15:01 gdj0nes

Looks like tf 1.5 now supports sparse tensors in the dataset API. This could make the implementation easier.
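For reference, a sketch of what sparse input through the dataset API looks like. This uses the modern form, where `from_tensor_slices` accepts a `SparseTensor` directly and yields one lower-rank sparse slice per row; the exact entry point in the 1.5-era API may differ, and the matrix here is a made-up example:

```python
import tensorflow as tf

# A 4x3 sparse matrix with three nonzeros; values are arbitrary.
sp = tf.sparse.SparseTensor(indices=[[0, 0], [1, 2], [3, 1]],
                            values=[1.0, 2.0, 3.0],
                            dense_shape=[4, 3])

# Slicing along the first dimension yields one rank-1
# SparseTensor per row of the matrix.
dataset = tf.data.Dataset.from_tensor_slices(sp)

rows = [tf.sparse.to_dense(row).numpy() for row in dataset]
```

Being able to keep the data sparse end-to-end would avoid densifying scipy sparse inputs before feeding, which matters for the wide, sparse feature matrices muffnn's estimators often see.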

beckermr avatar Jan 30 '18 15:01 beckermr