decision-forests
Please support GPU
Background
My TensorFlow code runs on the GPU. It includes matrix operations that are fast on the GPU. When classification runs through TF-DF, the data must be downloaded from the GPU and then uploaded back to it. In terms of throughput, this is a significant loss.
Feature Request
Please support GPU, especially for inference (the predict function). Training can take time, since a user may try various configurations to find the best one; that is understandable. However, applying the trained model must meet runtime requirements.
Hi there, following on from the comment above, I was just curious whether GPU support is being implemented in the near future?
Thank you!
Hi @shayansadeghieh, while we would also very much love to have it, our high-priority bucket list is still very full :( so on our side we will likely not work on this in the near future. Accelerators (GPU, TPU, etc.) are on our TODO list, though.
While inference would be simpler to accelerate, leveraging GPUs/TPUs for training would be much harder. Notice that DF algorithms don't do many floating-point operations (other than calculating the scores at each level of the tree). Inference could be accelerated more easily though -- we did a draft in the past.
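To make the point about floating-point operations concrete, here is a toy sketch (plain illustrative Python, not the actual Yggdrasil/TF-DF implementation; the node encoding is made up for this example) of single-example tree inference. The hot loop is comparisons and index chasing, with almost no arithmetic, which is why GPUs help less here than for dense matrix math:

```python
# Illustrative sketch of decision-forest inference (hypothetical
# flat-array encoding, NOT the real Yggdrasil data layout):
# each internal node is (feature, threshold, left, right);
# a negative child index encodes a leaf.

def predict_tree(nodes, leaf_values, features):
    """Walk one tree: the work is comparisons and array indexing,
    not floating-point math."""
    i = 0
    while i >= 0:                      # internal nodes have index >= 0
        feat, thr, left, right = nodes[i]
        i = left if features[feat] <= thr else right
    return leaf_values[-i - 1]         # negative index encodes a leaf

def predict_forest(trees, features):
    # Averaging the per-tree outputs is essentially the only
    # floating-point arithmetic in the whole inference path.
    total = 0.0
    for nodes, leaves in trees:
        total += predict_tree(nodes, leaves, features)
    return total / len(trees)

# Tiny hand-built example: one tree splitting on feature 0 at 0.5.
tree = ([(0, 0.5, -1, -2)], [0.0, 1.0])
print(predict_forest([tree], [0.2]))   # leaf 0 -> 0.0
print(predict_forest([tree], [0.9]))   # leaf 1 -> 1.0
```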
Maybe an interested developer would like to contribute it?
Hi @janpfeifer, thank you for the quick response. No worries that it is not on your high-priority list, I was just curious. Do you by any chance have a link to the draft you previously did for inference?
I don't have a link because we never open-sourced that more experimental code. Let me check here if it would be easy to make it visible.
Btw, note that the CPU implementation can be really fast, depending on the inference engine used:
- Using the C++ API (called "Yggdrasil") directly for inference can be much faster than the TensorFlow API, due to the overhead of the framework.
- For some trees, Yggdrasil can make use of AVX2, which is particularly fast.
- For many of our use cases, models with < 100 trees will run in ~1-2 microseconds per example (one 70-tree model ran in 700 nanoseconds). This is anecdotal information ... but something to consider, depending on your needs.
- That said, in our experiments GPU was still faster in some cases, though not by an order of magnitude. Again anecdotal; no guarantees for any specific model.
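For a rough feel for per-example cost, here is a sketch that times a hypothetical 70-tree ensemble of depth-1 trees (stumps) in pure Python. The model and the numbers are illustrative only; interpreted Python is orders of magnitude slower than Yggdrasil's compiled engines, so this does not reproduce the microsecond figures above, it just shows the shape of the measurement:

```python
import random
import time

def predict_stumps(stumps, features):
    """Sum over depth-1 trees: one comparison and one add per tree,
    the branch-heavy access pattern typical of DF inference."""
    total = 0.0
    for feat, thr, lo, hi in stumps:
        total += lo if features[feat] <= thr else hi
    return total

random.seed(0)
# Hypothetical 70-tree model over 10 features (random splits/leaves).
stumps = [(random.randrange(10), random.random(),
           random.random(), random.random()) for _ in range(70)]
example = [random.random() for _ in range(10)]

n = 10_000
start = time.perf_counter()
for _ in range(n):
    predict_stumps(stumps, example)
per_call_us = (time.perf_counter() - start) / n * 1e6
print(f"~{per_call_us:.1f} microseconds per example (pure Python)")
```

Batching many examples per call, as the Yggdrasil C++ API encourages, amortizes the per-call overhead and is a large part of why the compiled engines hit the numbers quoted above.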
Hi everyone, just wanted to share some tangentially related info: while there is still no GPU implementation for TF-DF, TF-DF models can now run on FPGAs for very fast inference through the Conifer project. This is still very much experimental, but feel free to contact us if it is relevant for you.