muffnn
move to datasets API
The latest TF versions recommend feeding data via the Datasets API (https://www.tensorflow.org/programmers_guide/datasets) and claim it is more efficient. We should probably do some testing and switch if it is warranted.
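For reference, here is roughly the pattern we'd be switching to. This is a minimal sketch assuming TF 1.4+ and in-memory numpy arrays; the `features`/`labels` names are placeholders, not muffnn internals:

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for whatever we currently feed via feed_dict.
features = np.random.rand(1000, 10).astype(np.float32)
labels = np.random.randint(0, 2, size=1000).astype(np.int32)

# Build the input pipeline: slice, shuffle, batch, repeat.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=1000).batch(64).repeat()

iterator = dataset.make_initializable_iterator()
next_features, next_labels = iterator.get_next()
# The model graph would be built on next_features/next_labels instead of
# on placeholders fed through feed_dict.

with tf.Session() as sess:
    sess.run(iterator.initializer)
    batch_x, batch_y = sess.run([next_features, next_labels])
```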
I think we may need to change the place where self.input_targets_ is defined in order to use the Dataset API.
Mmmmk. Make a PR and we can work on it.
I finally did some benchmarking here. For smallish batch sizes, we see ~20% performance improvements. At large batch sizes this API is actually slower from what I can tell.
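The benchmark was shaped roughly like this. It's a hypothetical reconstruction rather than the exact script, and the data sizes are made up:

```python
import time

import numpy as np
import tensorflow as tf

data = np.random.rand(100000, 50).astype(np.float32)
batch_size = 128  # "smallish"; try e.g. 8192 to see the crossover

# feed_dict path: placeholder plus explicit numpy slicing per step.
x_ph = tf.placeholder(tf.float32, shape=[None, 50])
out_feed = tf.reduce_sum(x_ph)

# tf.data path: the same batches pulled through an iterator.
dataset = tf.data.Dataset.from_tensor_slices(data).batch(batch_size).repeat()
out_data = tf.reduce_sum(dataset.make_one_shot_iterator().get_next())

n_steps = data.shape[0] // batch_size
with tf.Session() as sess:
    start = time.time()
    for i in range(n_steps):
        sess.run(out_feed,
                 feed_dict={x_ph: data[i * batch_size:(i + 1) * batch_size]})
    print("feed_dict: %.3fs" % (time.time() - start))

    start = time.time()
    for _ in range(n_steps):
        sess.run(out_data)
    print("tf.data:   %.3fs" % (time.time() - start))
```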
Were these just CPU benchmarks? I'd expect the largest gains to come from GPU. Large batch sizes were probably competing for RAM with the model, which wouldn't be the case if the model was on a GPU.
Yep just a CPU.
I don’t follow your logic here. How does TensorFlow deal with memory management and transfers from the CPU to the GPU using the Dataset API?
Why would large batch sizes not compete for RAM on a GPU? The data still has to be moved to the GPU before it can be used.
I did some reading. The Dataset API currently runs only on the CPU. Apparently people use it to stage data onto the GPU efficiently while the GPU is executing other operations.
I looked into one version of staging before and found minimal gains. Hopefully this API will do that better, but I don’t have GPU access to test it out right now.
I imagine the Dataset API is designed for GPU models, since that is where you are most likely to hit a disk I/O bottleneck you'd want to solve with an optimized pipeline. Therefore I think the pipeline is probably taking RAM away from the model graph, since the developers didn't expect the model to also be on the CPU. In the case where the model is on a GPU, the pipeline is free to stage as much data in RAM as it can without impacting model performance. I'd guess the pipeline helps stage CPU-to-GPU copies, but it may also stage only a single batch.
Did you try changing the number prefetched by the pipeline? I'd also check TensorBoard to see how full the queue is. Often I've found you need a large number of threads to effectively fill the queue. (See the sketch below for where those knobs live in tf.data.)
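In the tf.data world, the "threads" knob is `num_parallel_calls` on `map` combined with a `prefetch` buffer. A sketch, not muffnn code:

```python
import numpy as np
import tensorflow as tf

features = np.random.rand(10000, 50).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices(features)
# num_parallel_calls is the thread count doing per-element work; bump it
# up if TensorBoard shows the queue running empty.
dataset = dataset.map(lambda x: x * 2.0, num_parallel_calls=8)
dataset = dataset.batch(128).prefetch(4)  # keep ~4 batches staged ahead
```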
I just set the prefetch amount to 10x the batch size. Usually, unless the code is doing a lot of memory allocation, competition for memory is not a problem on a CPU. If it did run out of memory, it would write to swap and the code would slow down by orders of magnitude, which is not what I saw.
I can check TensorBoard at some point, but TBH I am not sure it is worth it right now.
Also to clarify, this test was for data already in memory as opposed to coming off of disk.
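Concretely, the prefetch setting was along these lines (a sketch; note that `prefetch` counts elements at the point it appears in the pipeline):

```python
import numpy as np
import tensorflow as tf

features = np.random.rand(100000, 50).astype(np.float32)
batch_size = 128

dataset = tf.data.Dataset.from_tensor_slices(features)
# Placing prefetch() before batch() and sizing it at 10x the batch size
# keeps roughly ten batches' worth of rows staged ahead of the consumer.
dataset = dataset.prefetch(10 * batch_size).batch(batch_size)
```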
There seems to be an update in TF 1.5, soon to be released. However, there appears to be a workaround in 1.4. I'll experiment with an implementation sometime in the next week.
Looks like TF 1.5 now supports sparse tensors in the Dataset API. This could make implementing this easier.
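Something like this should become possible. An untested sketch, assuming `from_tensor_slices` accepts a `tf.SparseTensor` in 1.5 as the release notes suggest:

```python
import tensorflow as tf

# A tiny 3x5 sparse matrix; each dataset element is one row.
sp = tf.SparseTensor(indices=[[0, 1], [1, 3], [2, 0]],
                     values=[1.0, 2.0, 3.0],
                     dense_shape=[3, 5])

dataset = tf.data.Dataset.from_tensor_slices(sp)
iterator = dataset.make_initializable_iterator()
next_row = iterator.get_next()  # a tf.SparseTensor per element

with tf.Session() as sess:
    sess.run(iterator.initializer)
    print(sess.run(next_row))  # SparseTensorValue for row 0
```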