tvm
This change improves performance in multi-threaded environments.
It improves data locality, which may have a positive effect on final inference performance when execution is multi-threaded.
Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.
The picture shows the performance improvements for the DLRM model (the point cloud was built on a system with 48 cores).
@jwfromm @tkonolige could you please review? The changes may increase memory usage, but at the same time they reduce data thrashing in the CPU caches.
The word "Global" means that it's a global singleton. Changing the semantics to thread-local will confuse developers.
Moreover, this patch works only because you initialise the VirtualMachine modules from separate threads. That is not mandatory behaviour. A customer may create a set of VirtualMachine instances in the main thread and only afterwards assign them to worker threads.
This change is good enough to demonstrate the potential of advanced memory management for multi-instance execution, but it cannot be merged as is.
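To make the concern above concrete, here is a minimal Python sketch of the failure mode. It is not TVM code: `Allocator`, `VM`, `thread_local_allocator`, and `run_in_thread` are hypothetical names, and the sketch assumes each VM binds its allocator at construction time, which is what the comment above implies about the patch.

```python
import threading


class Allocator:
    """Hypothetical stand-in for a memory manager (not TVM's API)."""

    def __init__(self):
        # Remember which thread created this allocator.
        self.owner_thread = threading.get_ident()


_tls = threading.local()


def thread_local_allocator():
    """Return the calling thread's allocator, creating it on first use."""
    if not hasattr(_tls, "allocator"):
        _tls.allocator = Allocator()
    return _tls.allocator


class VM:
    """Hypothetical VM that captures an allocator when it is constructed."""

    def __init__(self):
        self.allocator = thread_local_allocator()

    def run(self):
        # True only if the captured allocator belongs to the executing thread.
        return self.allocator.owner_thread == threading.get_ident()


def run_in_thread(fn):
    """Run fn in a fresh thread and return its result."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()))
    worker.start()
    worker.join()
    return result[0]


# Case 1: each VM is created inside its worker thread, so it gets a private allocator.
created_in_worker = [run_in_thread(lambda: VM().run()) for _ in range(2)]

# Case 2: VMs are created in the main thread and only then handed to workers.
vms = [VM() for _ in range(2)]
created_in_main = [run_in_thread(vm.run) for vm in vms]
shared = vms[0].allocator is vms[1].allocator

print(created_in_worker, created_in_main, shared)
```

In the second case every VM ends up sharing the main thread's allocator, so the locality benefit disappears without any visible error. One way to avoid depending on the construction thread would be to pass the allocator to each VM explicitly, though that is a design suggestion rather than something the patch does.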