# Normalize CV-CUDA Backend

## Summary

This PR adds the CV-CUDA backend kernel for the Normalize transform.

## How to use

```python
import cvcuda
import torchvision.transforms.v2.functional as F

cvc_tensor = cvcuda.Tensor((1, 224, 224, 3), cvcuda.Type.F32, cvcuda.TensorLayout.NHWC)

# Dispatches to F.normalize_cvcuda
normalized_tensor = F.normalize(cvc_tensor, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
```
## Run unit tests

```shell
$ pytest test/test_transforms_v2.py::TestNormalizeCVCUDA
...
60 passed in 0.59s
```
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9279
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: No Failures
As of commit aa62a8ad59770083ec1891005f68b758f7723d26 with merge base aa35ca1965bea39b9a0996d5d2d7f15d325e54d2:
:green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @justincdavis!
Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Following up from my comment in the _normalize_cvcuda function itself. CV-CUDA requires that the mean and scale tensors be on-device when we call cvcuda.normalize. This means that a host->device memcpy must occur twice for each normalize call (once for mean, once for scale) when using the CV-CUDA backend. We could reduce this cost with a helper function that builds the tuple[cvcuda.Tensor, cvcuda.Tensor] from the mean/std parameters once and reuses it. Based on what I see in the codebase, this would be a new pattern in torchvision for a functional transform.
```python
# CV-CUDA requires float32 tensors for the mean/std parameters.
# At small batches, this is costly relative to the normalize operation itself.
# If CV-CUDA is known to be a backend, this could be optimized:
#   For the Normalize class: by creating the tensors at class initialization time.
#   For the functional API: by caching the tensors in a helper function with
#   functools.lru_cache (would it even be worth it?).
# Since CV-CUDA is 1) not the default backend and 2) only strictly faster at
# large batch sizes, ignore this for now.
```
Hey @justincdavis, looking good to me. I don't think the failing test is related to this PR; it seems like a false-positive alert to me! Can you sign our Contributor License Agreement (cf. the meta-cla bot comment in the discussion)?