[SYCL][Clang] Add support for device image compression

Open uditagarwal97 opened this issue 1 year ago • 5 comments

This PR adds support for device image compression for the old offloading model. I'll make another follow-up PR to extend support for the new offload model.

Design summary:

ZSTD (compression algo)   ----> sycl-compress (Interface)  ------> clang-offload-wrapper
                                               |
                                                ----------------> clang-linker-wrapper
                                               |
                                                ----------------> SYCL RT (For decompression)

This PR adds:

  1. ZSTD as a 3rd party dependency, used for (de)compression.
  2. A new top-level LLVM project called sycl-compress, which acts as an interface between ZSTD and the tools that require (de)compression (clang-offload-wrapper, clang-linker-wrapper, SYCL RT library). sycl-compress also takes care of logging, error handling, and optimizations regarding ZSTD.

How to use:

To compress device images, add the -fsycl-compress-dev-imgs CLI option to your clang invocation. Note that we compress device images only if the size of the device images exceeds a threshold, which is 1024 bytes by default. You can change the threshold using the -fsycl-compress-threshold=<int> CLI option. Moreover, by default, we use ZSTD level 10 for compression. ZSTD compression levels provide a tradeoff between (de)compression time and compression ratio, and the compression level can be changed using the -fsycl-compress-level=<int> CLI option.
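The interaction of the three options can be sketched as below. This is only an illustration of the decision described in the text, not code from the PR: `CompressOptions` and `shouldCompress` are hypothetical names, and it assumes that the threshold is checked per image and that "exceeds" means strictly greater than.

```cpp
#include <cstddef>

// Hypothetical mirror of the three driver options described above.
struct CompressOptions {
  bool Enabled = false;          // set by -fsycl-compress-dev-imgs
  std::size_t Threshold = 1024;  // -fsycl-compress-threshold=<int>, in bytes
  int Level = 10;                // -fsycl-compress-level=<int>
};

// Returns true when an image of ImageSize bytes would be compressed.
// Assumption: the image must be strictly larger than the threshold.
inline bool shouldCompress(const CompressOptions &Opts,
                           std::size_t ImageSize) {
  return Opts.Enabled && ImageSize > Opts.Threshold;
}
```

With the defaults and compression enabled, a 4096-byte image would be compressed, while a 1024-byte image would be left as-is.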

WIP:

  1. Add a design document + update SYCL/Driver doc for the new CLI options.
  2. Fix zstd build for OSX
  3. ~Optimizations: Use ZSTD dictionary for SpirV (de)compression for a better compression ratio.~ (Can be postponed)

uditagarwal97, Aug 18 '24 17:08

Some initial performance stats:

[Figure: zstd_spirv_smol-v_dataset]

Dataset: https://github.com/aras-p/smol-v/tree/master/tests/spirv-dumps

Dataset size: 275 SPIR-V files

Conclusion: Overall, for SPIR-V files < 50KB, the decompression time is below 0.1 ms, the compression time is below 0.15 ms, and the compression ratio is ~3 (the compressed image is 1/3 the original size). For very small images (<512 bytes), I don't see much benefit from image compression.

Note: Most of the SPIR-V files in the dataset are <50KB. I'm working on extending the performance evaluation to larger workloads. Also, (de)compression performance will vary with the format of the file being compressed, so for AOT, where device images consist of target assembly, the performance stats might differ.

uditagarwal97, Aug 21 '24 18:08

What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

jbrodman, Aug 21 '24 19:08

Also, I'm guessing this feature may not make sense when combined with the native CPU device, but I need to think more about that.

jbrodman, Aug 21 '24 19:08

> What happens with the PTX and AMDGPU targets? Are they covered by the "native" binary image format? Do we need additional formats?

I think they are covered by the "none" binary image format. This is because the clang driver (in SYCL offload mode) never specifies the image format in its call to clang-offload-wrapper. So, by default, the BinaryImageFormat is "none" and it is up to the SYCL runtime to determine the format (https://github.com/intel/llvm/blob/sycl/sycl/source/detail/device_binary_image.cpp#L170).

I tested my changes with PTX, and they seem to work fine, so we likely won't need additional formats.
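As a rough illustration of what "the runtime determines the format" can look like, the sketch below sniffs the first four bytes of an image for a known magic number. It is not the code at the linked location: `SniffedFormat` and `sniffFormat` are hypothetical names, and only two well-known magic numbers are checked (SPIR-V, `0x07230203`, and the ZSTD frame header, `0xFD2FB528`), both assumed to be stored little-endian.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
enum class SniffedFormat { Unknown, SpirV, ZstdFrame };

// Inspects the first four bytes of an image and classifies it by magic
// number. Anything that does not match is reported as Unknown.
inline SniffedFormat sniffFormat(const unsigned char *Data,
                                 std::size_t Size) {
  if (Data == nullptr || Size < 4)
    return SniffedFormat::Unknown;
  // Assemble the value byte by byte so the result does not depend on the
  // host's endianness.
  const std::uint32_t Magic = static_cast<std::uint32_t>(Data[0]) |
                              (static_cast<std::uint32_t>(Data[1]) << 8) |
                              (static_cast<std::uint32_t>(Data[2]) << 16) |
                              (static_cast<std::uint32_t>(Data[3]) << 24);
  if (Magic == 0x07230203u)
    return SniffedFormat::SpirV;
  if (Magic == 0xFD2FB528u)
    return SniffedFormat::ZstdFrame;
  return SniffedFormat::Unknown;
}
```

Under this scheme a textual image such as PTX has no matching magic number and falls through to `Unknown`, which is the case where a default like "none" would apply.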

uditagarwal97, Aug 21 '24 23:08

Is there some value in submitting this upstream and making it generic? Thanks

asudarsa, Aug 28 '24 14:08

Similar to upstream LLVM, we expect users to have the zstd-dev package installed on their machine - we won't be building zstd from sources.

At the same time, we should not fail hard as CI does at the moment. If the library is not installed, we should just disable the compression.
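One way to picture the suggested fallback is sketched below. It is an assumption-laden illustration, not the PR's implementation: `Bytes`, `Compressor`, `PackagedImage`, and `packageImage` are hypothetical names, and an empty `Compressor` stands in for "zstd was not found at build time".

```cpp
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
// Hypothetical hook: left empty when no compression library is available.
using Compressor = std::function<Bytes(const Bytes &)>;

struct PackagedImage {
  bool Compressed; // true only if the compressor actually ran
  Bytes Data;
};

// Compresses the image only when the user asked for it AND a compressor is
// available; otherwise passes the image through unchanged instead of failing.
inline PackagedImage packageImage(const Bytes &Image, bool Requested,
                                  const Compressor &Compress) {
  if (Requested && Compress)
    return {true, Compress(Image)};
  return {false, Image};
}
```

Calling `packageImage(Image, true, Compressor{})` returns the original bytes with `Compressed == false`, so a missing library degrades to uncompressed images rather than a hard error.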

bader, Sep 10 '24 17:09

Ping @premanandrao @mdtoguchi @bso-intel

uditagarwal97, Sep 16 '24 14:09

@bso-intel @intel/llvm-reviewers-runtime ping!

uditagarwal97, Sep 24 '24 21:09

@bso-intel

If a crash occurs because compression failed, the error message that the SYCL end user receives may not be useful, for example "Failed to create ZSTD_CCtx" or "Failed to set ZSTD_c_compressionLevel". It would be a much better user experience if they got a message like "Device image compression failed." + e.what().

In https://github.com/intel/llvm/pull/15124/commits/946a738d5b7730e5af1cdf56d4c0ceaa176991da, I've wrapped zstd::compress in try/catch to throw a more meaningful error message. Note that this will only work if DPC++ is built with LLVM_ENABLE_EH.
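The wrapping pattern can be sketched roughly as follows. This is not the code from the linked commit: `lowLevelCompress` is a hypothetical stand-in for the zstd-facing call, `compressDeviceImage` is a hypothetical caller, and the single space between the prefix and `e.what()` is a choice made for this sketch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the low-level compression call. It throws a
// library-flavored message when asked to fail.
inline void lowLevelCompress(bool Fail) {
  if (Fail)
    throw std::runtime_error("Failed to create ZSTD_CCtx");
}

// Catches the low-level error and rethrows it with a prefix that tells the
// end user which feature failed, keeping the original text as detail.
inline void compressDeviceImage(bool Fail) {
  try {
    lowLevelCompress(Fail);
  } catch (const std::exception &E) {
    throw std::runtime_error(
        std::string("Device image compression failed. ") + E.what());
  }
}
```

As noted above, this only helps when exceptions are enabled in the build; with exceptions disabled, the `try`/`catch` would not compile and another reporting mechanism would be needed.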

uditagarwal97, Sep 25 '24 20:09

@intel/llvm-gatekeepers The PR is ready to be merged. All the downstream infrastructure, along with intel/llvm CI machines, is ready with zstd installed.

uditagarwal97, Oct 22 '24 21:10

@uditagarwal97 Could you please take a look at post-commit failures: https://github.com/intel/llvm/actions/runs/11468699849/job/31915329901


Failed Tests (2):
  SYCL :: Compression/compression.cpp
  SYCL :: Compression/compression_multiple_tu.cpp

On AMD/HIP

againull, Oct 22 '24 22:10

> @uditagarwal97 Could you please take a look at post-commit failures: https://github.com/intel/llvm/actions/runs/11468699849/job/31915329901
>
> Failed Tests (2):
>   SYCL :: Compression/compression.cpp
>   SYCL :: Compression/compression_multiple_tu.cpp
>
> On AMD/HIP

PR to disable the failing tests on HIP: https://github.com/intel/llvm/pull/15830

uditagarwal97, Oct 23 '24 14:10