[NDTensors] cuTENSOR extension

Open kmp5VT opened this issue 10 months ago • 21 comments

Description

The goal of this PR is to create a cuTENSOR extension for the NDTensors library. In this extension, I will write a function that converts (Dense) Tensors from NDTensors into cuTENSOR tensors, and I will overload the NDTensors contract function so that it calls the cuTENSOR contraction backend. As a reach goal, in a first pass at the BlockSparse code I will convert the block sparse tensors into dense tensors, call contract, construct the output block sparse tensor, and transfer only the non-zero blocks back into it. This functionality will later be handled more robustly using efforts from the NVIDIA team.
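As a rough illustration of the conversion step described above, here is a minimal sketch (not the actual extension code): it assumes NDTensors.array returns the dense data of a Dense tensor and that cuTENSOR.jl provides the CuTensor wrapper type; to_cutensor is a hypothetical helper name.

using CUDA: CuArray
using cuTENSOR: CuTensor
using NDTensors: Tensor, array

# Hypothetical helper (not part of NDTensors): move the dense data of an
# NDTensors tensor to the GPU and tag each mode with a contraction label.
function to_cutensor(t::Tensor, labels)
  return CuTensor(CuArray(array(t)), collect(labels))
end

With something like this in place, two converted Dense tensors can be contracted over their shared labels via cuTENSOR.jl's * overload on CuTensor, e.g. to_cutensor(A, ['i', 'j']) * to_cutensor(B, ['j', 'k']).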

Checklist:

  • [x] It is possible to convert an NDTensors.Tensor into a cuTENSOR tensor
  • [x] It is possible to call cuTENSOR-based contraction code
  • [x] The result of the cuTENSOR contraction matches the NDTensors contraction
  • [x] Create unit tests for the cuTENSOR extension.

kmp5VT avatar Apr 18 '24 19:04 kmp5VT

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 60.28%. Comparing base (ceb26a7) to head (dfc4e7e). Report is 15 commits behind head on main.

:exclamation: Current head dfc4e7e differs from pull request most recent head 3636813. Consider uploading reports for the commit 3636813 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1395       +/-   ##
===========================================
+ Coverage   49.23%   60.28%   +11.05%     
===========================================
  Files         110      148       +38     
  Lines        8320     9757     +1437     
===========================================
+ Hits         4096     5882     +1786     
+ Misses       4224     3875      -349     

codecov-commenter avatar Apr 18 '24 20:04 codecov-commenter

Looks like a good start, nice to see it is pretty simple.

mtfishman avatar Apr 23 '24 17:04 mtfishman

@mtfishman yeah, it's surprisingly simple! I have started adding cuTENSOR to the NDTensors test suite. So far I have found that the code works for Dense and BlockSparse tensors and for ITensors. However, it fails for Diag because there is no array function defined for that storage type. I will work through debugging it.

kmp5VT avatar Apr 24 '24 14:04 kmp5VT

There are a few errors in the ITensor/MPS testing because of unsupported mixed-type contractions in cuTENSOR. I have opened a bug report in CUDA.jl here

kmp5VT avatar Apr 25 '24 17:04 kmp5VT

Thanks, it seems like we can promote the tensors to a common element type ourselves to circumvent that.
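For example, a minimal sketch of that promotion for plain arrays (how it would be wired into the NDTensors contraction path is left open here):

A = randn(Float32, 2, 3)
B = randn(Float64, 3, 4)

# Promote both operands to a common element type before contracting, since the
# mixed-type case is what cuTENSOR rejects.
T = promote_type(eltype(A), eltype(B))  # Float64 in this example
Aprom = eltype(A) === T ? A : T.(A)     # elementwise convert only when needed
Bprom = eltype(B) === T ? B : T.(B)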

mtfishman avatar Apr 25 '24 17:04 mtfishman

@kmp5VT I think we should only send tensors with Dense storage wrapping CuArray data to the cuTENSOR backend, for example:

function NDTensors.contract(
  Etensor1::Exposed{<:CuArray,<:DenseTensor},
  labelstensor1,
  Etensor2::Exposed{<:CuArray,<:DenseTensor},
  labelstensor2,
  labelsoutput_tensor,
)
  # ...
end

mtfishman avatar Apr 27 '24 17:04 mtfishman

For some more context, here is where the dense blocks are contracted when two tensors with BlockSparse storage are contracted: https://github.com/ITensor/ITensors.jl/blob/v0.4.0/NDTensors/src/blocksparse/contract_generic.jl#L141-L150. R[blockR], tensor1[blocktensor1], and tensor2[blocktensor2] are blocks of the block sparse tensor, which are tensors with Dense storage.

So ideally when block sparse contraction occurs, if the cuTENSOR backend is enabled those dense block contractions will use dense contraction code defined in this new cuTENSOR extension. For that to happen, I think we should overload this contract signature:

function NDTensors.contract!(
  exposed_tensor_dest::Exposed{<:CuArray,<:DenseTensor},
  tensor_dest_labels,
  exposed_tensor1::Exposed{<:CuArray,<:DenseTensor},
  tensor1_labels,
  exposed_tensor2::Exposed{<:CuArray,<:DenseTensor},
  tensor2_labels,
  α::Number,
  β::Number,
)
  # Forward contraction to `cuTENSOR`
end

in the package extension as opposed to the out-of-place NDTensors.contract function.

mtfishman avatar Apr 28 '24 16:04 mtfishman

An issue with adding cuTENSOR to ITensors that I just realized again is that cuTENSOR has a compat restriction on TensorOperations to version 0.7.1. Currently TensorOperations is on version 4.1.1. This compat restriction causes some of our tests in the TensorAlgebra module to fail. We could do what we are doing with Metal and AMDGPU: call Pkg.add("cuTENSOR") only when we specifically test that package, and have a flag in the TensorAlgebra module to disable those tests when "cuTENSOR" is in ARGS.
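Something like the following could gate those tests (a sketch only; the flag string and testset name are illustrative):

using Test

# Skip the TensorOperations-dependent TensorAlgebra tests when the cuTENSOR
# test group has been requested on the command line.
if "cuTENSOR" ∉ ARGS
  @testset "TensorAlgebra (TensorOperations-based)" begin
    # ... tests that depend on TensorOperations would go here ...
  end
end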

kmp5VT avatar Apr 29 '24 14:04 kmp5VT

You say "cuTENSOR has a compat restriction on TensorOperations" but cuTENSOR is a lower level library that likely doesn't know anything about TensorOperations, maybe you mean the other way around?

mtfishman avatar Apr 29 '24 14:04 mtfishman

In the TensorOperations.jl Project.toml: https://github.com/Jutho/TensorOperations.jl/blob/master/Project.toml I see they have a cuTENSOR extension and the [compat] entry is set to cuTENSOR = "1", while the latest cuTENSOR.jl version is v2.1.0 (https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cutensor/Project.toml). Maybe that is the issue you are seeing?

mtfishman avatar Apr 29 '24 14:04 mtfishman

I see there is an open PR about upgrading TensorOperations to cuTENSOR v2 here: https://github.com/Jutho/TensorOperations.jl/pull/160.

Also note that cuTENSOR.jl v2 only supports Julia 1.8 and onward (https://github.com/JuliaGPU/CUDA.jl/blob/v5.3.3/lib/cutensor/Project.toml#L20).

mtfishman avatar Apr 29 '24 14:04 mtfishman

I guess the latest CUDA.jl version now requires Julia 1.8 and up anyway (https://github.com/JuliaGPU/CUDA.jl/blob/v5.3.3/Project.toml#L80) so we could have the same restriction for NDTensorsCUDAExt and NDTensorscuTENSORExt.

mtfishman avatar Apr 29 '24 14:04 mtfishman

I think the best course of action would be something like what you said with manually adding and removing packages as needed in the tests instead of putting them as dependencies in the test Project.toml.

What we could do is surround any code that relies on TensorOperations in the tests with Pkg.add("TensorOperations") ... Pkg.rm("TensorOperations"), and then surround any code that relies on cuTENSOR with Pkg.add("cuTENSOR") ... Pkg.rm("cuTENSOR"). Then we should be able to use the latest versions of TensorOperations and cuTENSOR in the appropriate parts of the tests. That's similar to your suggestion but would allow us to test TensorAlgebra even if cuTENSOR tests are requested. Hopefully that wouldn't be too complicated to set up.
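A rough sketch of that wrapping for the cuTENSOR case (where exactly it sits in the test harness is left open):

using Pkg: Pkg

# Add cuTENSOR only for the tests that need it, then remove it again so its
# compat bounds don't constrain the rest of the test environment.
Pkg.add("cuTENSOR")
using cuTENSOR
# ... cuTENSOR-dependent tests would run here ...
Pkg.rm("cuTENSOR")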

mtfishman avatar Apr 29 '24 15:04 mtfishman

@mtfishman Sorry, I did misunderstand what was going on. Thank you for looking into that and sending me this information! If we are supporting CUDA and cuTENSOR only in julia version 1.8 and up, does that mean we should bump the compat of NDTensors and ITensors to julia = 1.8 ?

kmp5VT avatar Apr 29 '24 15:04 kmp5VT

> @mtfishman Sorry, I did misunderstand what was going on. Thank you for looking into that and sending me this information! If we are supporting CUDA and cuTENSOR only in julia version 1.8 and up, does that mean we should bump the compat of NDTensors and ITensors to julia = 1.8?

No, I think that would be pretty extreme. I think the only way to do it would be to require a more recent version of CUDA.jl that itself requires Julia 1.8, which would then implicitly only allow users to use the latest NDTensorsCUDAExt on Julia 1.8. I don't think that is necessary, however, since NDTensorsCUDAExt appears to work just fine on older versions of Julia; I assume those tests automatically use an older version of CUDA.jl, which we are compatible with anyway (since we only use pretty high-level features of CUDA.jl). So for NDTensorsCUDAExt I don't think we need to do anything right now.

For cuTENSOR/NDTensorscuTENSORExt, I assume you will have to write the package extension with a certain cuTENSOR.jl version in mind, i.e. write it for cuTENSOR v2, in which case we should put a compat entry of cuTENSOR = "2" in the NDTensors Project.toml. That will implicitly only allow users to use NDTensorscuTENSORExt with Julia 1.8 and above.
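For reference, one way to record that bound from the NDTensors project environment, assuming a Pkg version that provides Pkg.compat (editing the [compat] section of Project.toml by hand works just as well; the path below is illustrative):

using Pkg

Pkg.activate("NDTensors")    # activate the NDTensors project (illustrative path)
Pkg.compat("cuTENSOR", "2")  # only allow cuTENSOR.jl v2 for the extension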

mtfishman avatar Apr 29 '24 15:04 mtfishman

Right now I have a rudimentary Pkg.add and Pkg.rm in the TensorAlgebra test file, but it throws some extension compile errors, which might be an issue, so there may be a better way. I haven't done much yet to enforce the Julia 1.8 version other than adding a compat value to the NDTensors project. I have updated the code to launch CuTensor from the contract! kernel and have updated the kernels across the library to be able to use CuTensor. Right now for BlockSparse, dense blockwise contractions can call the cuTENSOR-based contract; however, in the DMRG testing I am seeing an internal cuTENSOR error

ERROR: CUTENSORError: an invalid value was used as an argument (code 7, CUTENSOR_STATUS_INVALID_VALUE) 

which I am working to resolve. Dense contractions are working properly and using cuTENSOR.

kmp5VT avatar Apr 29 '24 19:04 kmp5VT

Sounds good, seems like there are a few wrinkles to work out but mostly coming along.

mtfishman avatar Apr 29 '24 19:04 mtfishman

> I see there is an open PR about upgrading TensorOperations to cuTENSOR v2 here: Jutho/TensorOperations.jl#160.
>
> Also note that cuTENSOR.jl v2 only supports Julia 1.8 and onward (https://github.com/JuliaGPU/CUDA.jl/blob/v5.3.3/lib/cutensor/Project.toml#L20).

Sorry if I repeat things you already know, but let me just give you the state of affairs for that PR here: we have a package extension that works for cuTENSOR v1 in the current version of TensorOperations, but cuTENSOR v2 is actually very breaking, as it renames lots of the functions. I started doing some work on getting the update going, but it is a bit more cumbersome because TensorOperations handles generic StridedView objects from Strided.jl. In principle this is now finished, but it requires a large amount of code duplication from cuTENSOR itself, which specializes its methods on DenseCuArray, while in principle this restriction can be loosened. I got in touch with @maleadt about maybe reorganizing a bit of cuTENSOR's functions (see https://github.com/JuliaGPU/CUDA.jl/pull/2356), and once that is finalized TensorOperations should be updated soon after.

I have also struggled a bit with the restriction of cuTENSOR v2 to Julia v1.8 and above, and I think we more or less decided to also drop support for Julia v1.6-1.7 in the new TensorOperations versions, although this is not set in stone yet.

lkdvos avatar Apr 30 '24 06:04 lkdvos

Thanks for the context @lkdvos. It makes sense to start by improving the cuTENSOR wrapper code before going through a big refactor of TensorOperations. This isn't causing serious issues for us, only a few tests of ours rely on TensorOperations.

mtfishman avatar Apr 30 '24 22:04 mtfishman

Thanks @kmp5VT, this is a great step to get to for our GPU backends!

Can you update the table entry in https://github.com/ITensor/ITensors.jl/blob/v0.5.0/docs/src/RunningOnGPUs.md#gpu-backends?

mtfishman avatar May 03 '24 14:05 mtfishman

@mtfishman I added a block sparse (cuTENSOR) row to the table and put "In progress" there, since I am still working on the view issue I found earlier and will open an issue with cuTENSOR once I can isolate the problem.

kmp5VT avatar May 03 '24 14:05 kmp5VT