[OpenCL][Texture] Improved texture memory planning
Motivated by the fact that textures can be allocated over a clBuffer object, and that the size of the backing clBuffer can be computed from the hardware image pitch alignment.
This optimizes overall memory allocation on the device and greatly helps models with large memory requirements.
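For context, here is a minimal sketch (assuming an OpenCL 2.0 device with image-from-buffer support; the names are illustrative, not this PR's code) of how the backing clBuffer size can be derived from the pitch alignment:

```cpp
// Sketch: compute the size of a clBuffer that can back a 2D image,
// assuming OpenCL 2.0 image-from-buffer support. Illustrative only.
#include <CL/cl.h>
#include <cstddef>

size_t BackingBufferSize(cl_device_id dev, size_t width, size_t height,
                         size_t pixel_bytes) {
  // Pitch alignment is reported in pixels (OpenCL 2.0+).
  cl_uint pitch_align_pixels = 1;
  clGetDeviceInfo(dev, CL_DEVICE_IMAGE_PITCH_ALIGNMENT,
                  sizeof(pitch_align_pixels), &pitch_align_pixels, nullptr);
  size_t row_pitch = width * pixel_bytes;
  size_t align_bytes = pitch_align_pixels * pixel_bytes;
  // Round each row up to the device's pitch alignment.
  row_pitch = (row_pitch + align_bytes - 1) / align_bytes * align_bytes;
  // A clImage created over this buffer (clCreateImage with
  // cl_image_desc::buffer set) then aliases the same memory.
  return row_pitch * height;
}
```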
Improved the graph memory planner so it no longer differentiates buffer and texture storage tokens and can reuse them interchangeably. The texture pool in the OpenCL runtime is renamed to a memory pool that handles allocation for both buffer and image objects.
The NDArray-to-DeviceAPI interface is extended with AllocDataSpaceView and FreeDataSpaceView. These new APIs accommodate accessing the same physical memory as clBuffer / clImage objects.
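For illustration, a hypothetical sketch of what such an interface extension could look like; the signatures here are assumptions for discussion, not copied from the PR:

```cpp
// Hypothetical sketch of the DeviceAPI view extension; the exact
// signatures in this PR may differ.
#include <dlpack/dlpack.h>
#include <cstdint>
#include <optional>
#include <string>

class DeviceAPIViewExt {
 public:
  virtual ~DeviceAPIViewExt() = default;
  // Create a view (e.g. a clImage) over already-allocated physical
  // memory (e.g. a clBuffer); no new device memory is allocated.
  virtual void* AllocDataSpaceView(DLDevice dev, void* data, int ndim,
                                   const int64_t* shape, DLDataType dtype,
                                   std::optional<std::string> mem_scope) = 0;
  // Destroy only the view; the backing physical memory stays alive.
  virtual void FreeDataSpaceView(DLDevice dev, void* view) = 0;
};
```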
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
- cc @echuraev, @elvin-n See #10317 for details
Generated by tvm-bot
Quickly took a look at this PR. It contains many changes, but no tests. I'd like to see unit tests for these changes.
I agree. I am working on it.
Meanwhile, I'd appreciate a review of the interface changes (DeviceAPI, NDArray) and the overall design.
@echuraev can you take a look now?
Thanks @srkreddy1238. Reading the set of changes, I feel we need something that is lifted further out of the DeviceAPI, since this is getting closer to the two-stage concept:
- The Allocator allocates a physical Storage object (in this case backed by a clBuffer).
- The Storage object then allocates the NDArray (backed by the Storage object).

We already have a similar concept in the memory manager, e.g. in relax, but of course the memory manager as it stands does not yet reflect needs like creating different memory scopes. This would help move some of the complexity in the texture part into a specialized allocator that creates the Storage with the related behavior.
The main rationale is to keep the overall DeviceAPI minimal, without behaviors like pooling, and to move some of that logic into the Storage/NDArray separation, which can hopefully also enable direct support in VM-backed runtimes like unity. Maybe we can discuss a bit how to make that part more generalized so it can be reused.
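A rough sketch of the two-stage concept being described; all names are illustrative, not an existing TVM API:

```cpp
// Two-stage concept: an Allocator hands out physical Storage, and
// NDArrays are then carved out of that Storage. Illustrative only.
#include <cstddef>
#include <memory>

struct Storage {
  void* data;   // e.g. backed by a clBuffer
  size_t size;
  // Stage 2: create a tensor view into this storage. For textures,
  // this is where scope-specific behavior (clImage over clBuffer)
  // would live.
  void* AllocNDArrayView(size_t offset) {
    return static_cast<char*>(data) + offset;
  }
};

class Allocator {
 public:
  virtual ~Allocator() = default;
  // Stage 1: allocate (or reuse from a pool) the physical storage.
  virtual std::shared_ptr<Storage> AllocStorage(size_t size,
                                                const char* mem_scope) = 0;
};
```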
Thanks @echuraev and @tqchen for a quick review.
I see `MemoryManager` can meet the requirements of the pool concept we have here. I will improve the PR by:
- Initializing the MemoryManager (PooledAllocator) from the graph runtime (StorageSetup).
- `NDArray->Empty` => `PooledAllocator->Empty` (assuming the pooled allocator works for all runtimes, or I can limit this to texture-scoped graphs); see the sketch below.
- `NDArray->CreateView` => `DeviceAPI->AllocDataSpaceView`: this will take care of various views over NDArray.
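A hypothetical sketch of the planned redirection in the second item; all names here are illustrative:

```cpp
// Hypothetical sketch of routing NDArray::Empty through a pooled
// allocator instead of DeviceAPI directly. Illustrative only.
#include <cstddef>

struct PooledBuffer { void* data; size_t size; };

class PooledAllocator {
 public:
  // Reuses an exact-fit block from the pool when available,
  // otherwise falls back to a fresh device allocation.
  PooledBuffer Empty(size_t nbytes, size_t alignment);
};

// Before: NDArray::Empty -> DeviceAPI::AllocDataSpace
// After:  NDArray::Empty -> PooledAllocator::Empty
void* AllocStorageForNDArray(PooledAllocator* pool, size_t nbytes) {
  return pool->Empty(nbytes, /*alignment=*/64).data;
}
```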
Another situation I came across multiple times concerns `runtime::GetDataSize(container->dl_tensor)`.
Can we move this to DeviceAPI? In the texture case there are device-specific attributes (image row pitch) that define the underlying memory size, and accessing these attributes outside the device API doesn't look like a great idea.
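As an illustration of why the size is device-specific, a sketch using standard OpenCL queries of how an image's real footprint differs from the dense tensor size:

```cpp
// Sketch: the actual memory footprint of a clImage includes the
// device-chosen row pitch, which runtime::GetDataSize cannot see.
#include <CL/cl.h>
#include <cstddef>

size_t ImageFootprintBytes(cl_mem image) {
  size_t row_pitch = 0, height = 0;
  clGetImageInfo(image, CL_IMAGE_ROW_PITCH, sizeof(row_pitch), &row_pitch,
                 nullptr);
  clGetImageInfo(image, CL_IMAGE_HEIGHT, sizeof(height), &height, nullptr);
  return row_pitch * height;  // >= width * pixel_bytes * height
}
```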
@tqchen later I realized that the `MemoryManager` interface is part of relax (unity branch, not mainline). Should I bring it into relay and tailor it? Or are there other ideas?
One thing that I can probably do is a bit of refactoring to bring it into the runtime folder so we can leverage it in most places; I may need a few weeks to get to this. There is also already one in relay that can be used as a starting point. GetDataSize is also logic that can be specific to the pool.
@tqchen I enhanced the graph executor to reuse the existing VM runtime memory manager at https://github.com/apache/tvm/pull/15058/commits/31d7de0cb6ee6dc0d1efcc8d53f2e570af7316f0
Basically:
- Promoted the memory manager from `runtime/vm` to the `runtime` level.
- Redirected the graph executor to use the MemoryManager interface for StorageSetup (see the sketch below).
- Made the corresponding changes in the OpenCL runtime to work with this MemoryManager.

Let me know your opinion on these.
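Roughly, the redirection in StorageSetup would look something like this. This is sketched against the VM memory manager interface; the exact header path and namespace after the promotion may differ:

```cpp
// Sketch of StorageSetup going through the promoted memory manager.
// Based on the VM memory manager interface; exact calls may differ.
#include <tvm/runtime/vm/memory_manager.h>  // promoted to runtime/ by this PR

using namespace tvm::runtime;

void StorageSetupSketch(Device dev, size_t nbytes, DLDataType dtype_hint) {
  // One pooled allocator per device, shared by the graph executor and VM.
  vm::Allocator* alloc =
      vm::MemoryManager::GetOrCreateAllocator(dev, vm::AllocatorType::kPooled);
  vm::Buffer buf = alloc->Alloc(nbytes, /*alignment=*/64, dtype_hint);
  // ... record `buf` as the storage entry for the graph node ...
  alloc->Free(buf);
}
```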
Another recommendation I have is redirecting `AllocWorkspace` and `FreeWorkspace` to MemoryManager and dropping `workspace_pool`. `workspace_pool` is slightly different from `pooled_allocator` (the workspace pool looks for a best fit whereas the pooled allocator looks for an exact fit), and we can amend this functionality; the sketch below illustrates the difference.
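To make the distinction concrete, a minimal sketch of the two lookup strategies over a size-keyed free list; illustrative, not the actual TVM pool code:

```cpp
// Minimal sketch contrasting the two free-list lookup strategies.
#include <cstddef>
#include <map>

// Exact fit (pooled_allocator style): only reuse a block whose size
// matches the request exactly.
void* ExactFit(std::multimap<size_t, void*>& pool, size_t nbytes) {
  auto it = pool.find(nbytes);
  if (it == pool.end()) return nullptr;  // miss: allocate fresh
  void* p = it->second;
  pool.erase(it);
  return p;
}

// Best fit (workspace_pool style): reuse the smallest block that is
// at least as large as the request.
void* BestFit(std::multimap<size_t, void*>& pool, size_t nbytes) {
  auto it = pool.lower_bound(nbytes);
  if (it == pool.end()) return nullptr;
  void* p = it->second;
  pool.erase(it);
  return p;
}
```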
@tqchen @echuraev can you take another look at this? I have rebased and improved it based on the new memory manager interface.
@tqchen & @yongwww can you take a look at this PR?
@tqchen can you take a look at this PR?