
ZFP cuda produces 0 decompressed data

Open sheltongeosx opened this issue 4 years ago • 6 comments

Dear ZFP developers,

I tested zfp with the CUDA option to compress and decompress a dataset of around 5 GB, but the decompressed dataset contains all zeros. Here are the commands I used:

zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 150 5850 1601
zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 150 5850 1601

output.decomp contains all zeros. It produced the same result on both an IBM Power8 node (P100 GPU) and a Dell x86_64 node (V100 GPU). However, it runs correctly without the "-x cuda" option (i.e., on the CPU). Here is my environment:

compiler: gcc/7.3.0
CUDA:  10.1 
zfp version: 0.5.5

Is there anything I am missing? Thanks in advance!

Best Shelton Ma

sheltongeosx avatar Aug 31 '20 20:08 sheltongeosx

Dear Shelton,

This is a rather large data set. The uncompressed data is 5.2 GB and the compressed data is another 2.6 GB. zfp should definitely report an error if there is not enough GPU memory, but I think there may be sections of the CUDA implementation that assume that the uncompressed data can be addressed using only 32 bits (4 GB) and that can cause silent errors. We will revisit the CUDA implementation in October to address any such issues.
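To see where the 32-bit limit bites here, a quick back-of-the-envelope check (assuming 4-byte floats, as implied by the -f flag, and the dimensions from the command above):

```python
# Check whether the flat byte size of the array fits in 32 bits.
# Dimensions from the zfp command: -f -3 150 5850 1601 (4-byte floats).
nx, ny, nz = 150, 5850, 1601
bytes_per_value = 4          # -f selects single-precision floats
bits_per_value = 16          # -r 16 fixes the compressed rate

n_values = nx * ny * nz
uncompressed_bytes = n_values * bytes_per_value
compressed_bytes = n_values * bits_per_value // 8

print(f"values:       {n_values:,}")
print(f"uncompressed: {uncompressed_bytes / 2**30:.1f} GiB")
print(f"compressed:   {compressed_bytes / 2**30:.1f} GiB")
print("fits in 32 bits:", uncompressed_bytes < 2**32)
```

This reproduces the 5.2 GiB and 2.6 GiB figures above; the uncompressed size well exceeds the 4 GiB addressable with a 32-bit offset.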

For now, can you try compressing the data in two or more pieces and see if that works? The easiest would be to partition the data along z into slabs that are 800+801 or even 400+400+400+401 elements wide. You can perform such partitioning using the Unix dd command and pipe the output of dd to the input of zfp to avoid temporary files, e.g.,

dd if=inputdata.dat bs=3510000 count=401 skip=1200 | zfp -i - -z output.comp -r 16 -x cuda -f -3 150 5850 401

would compress the last 401 "layers" of elements.
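The dd parameters can be derived as follows (again assuming 4-byte floats), treating each 150x5850 plane as one dd block:

```python
# Derivation of the dd parameters: bs=3510000 count=401 skip=1200.
# One z-"layer" of the 150 x 5850 x 1601 volume is nx * ny values.
nx, ny, nz = 150, 5850, 1601
layer_bytes = nx * ny * 4        # dd bs: bytes per z-layer
skip_layers = 1200               # dd skip: layers to skip from the start
count_layers = nz - skip_layers  # dd count: remaining layers to read

print(f"bs={layer_bytes} count={count_layers} skip={skip_layers}")
```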

Best, Peter

-- Peter Lindstrom · [email protected] · http://people.llnl.gov/pl · 925-423-5925


lindstro avatar Aug 31 '20 21:08 lindstro

Dear Peter,

Thank you very much for your suggestions. I split the input data set into four parts, but running with one of the pieces still gives an all-zero decompressed volume:

zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 38 5850 1601
zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 38 5850 1601

The compressed and decompressed data sizes are now 750 MB and 1.4 GB, respectively.

Best, Shelton

sheltongeosx avatar Sep 01 '20 20:09 sheltongeosx

Maybe a dumb question, but are you certain that the input data actually has nonzero values?

Note that zfp assumes that the leftmost index varies fastest (a.k.a. Fortran order). To partition the data along x like you've done, you would have had to piece together noncontiguous chunks of data. Partitioning along z (as in the example I gave) would be far easier. And given your choice of partitioning, I suspect that you may have transposed the dimensions (see this discussion). Such accidental transposition can lead to a nearly random sequence of values that is difficult to compress. That shouldn't result in all zeros, but it could still lead to unusually large errors in the reconstructed field.
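The Fortran-order convention can be sketched as follows (a small hypothetical volume, not one from this thread):

```python
# zfp expects Fortran (column-major) order: the leftmost index varies
# fastest in memory. For dimensions given as "-3 nx ny nz", element
# (i, j, k) sits at flat offset i + nx * (j + ny * k).
def fortran_offset(i, j, k, nx, ny):
    return i + nx * (j + ny * k)

nx, ny, nz = 4, 3, 2  # hypothetical small volume
# Stepping i (the leftmost index) moves one element at a time...
assert fortran_offset(1, 0, 0, nx, ny) - fortran_offset(0, 0, 0, nx, ny) == 1
# ...while stepping k (the rightmost index) jumps a whole nx*ny slab.
assert fortran_offset(0, 0, 1, nx, ny) - fortran_offset(0, 0, 0, nx, ny) == nx * ny
print("Fortran-order strides:", 1, nx, nx * ny)
```

If the file was actually written in C order (rightmost index fastest), passing the dimensions in reverse on the zfp command line corrects for this.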

Before we speculate any further on what's causing this issue, may I suggest that you check out the develop branch and run the CUDA tests just to make sure that the CUDA implementation is working correctly on smaller data:

git clone https://github.com/LLNL/zfp.git
cd zfp
git checkout develop
mkdir build
cd build
cmake .. -DZFP_WITH_CUDA=ON -DBUILD_TESTING=ON
make
ctest

lindstro avatar Sep 01 '20 21:09 lindstro

Dear Peter,

Thank you very much for mentioning the order in which to specify the data dimensions. The following commands

 zfp  -i  inputdata.dat -z  output.comp -r 16  -x cuda -f -3 1061 5850 38
 zfp  -z output.comp -o output.decomp -r 16  -x cuda -f -3 1061 5850 38

now produce correct results. As you mentioned earlier, however, it still cannot handle my full 5 GB example.

Best Shelton

sheltongeosx avatar Sep 02 '20 20:09 sheltongeosx

I'm glad to hear this is working, though we still need to look into what's causing the failure on the larger data set and why zfp does not report an error. I will keep this issue open until we've had time to take a closer look.

lindstro avatar Sep 02 '20 23:09 lindstro

@sheltongeosx Sorry for taking so long to get back to you regarding this issue. We're finally at a point where we have time to go over the CUDA implementation to make sure it's bug-free.

We fixed a related issue (#121) on the develop branch that might also address the one you reported. Would you mind rerunning your example (on the whole 1061x5850x38 volume) to see if it works now?

lindstro avatar Feb 02 '21 15:02 lindstro

@sheltongeosx Was this fixed for you? I've run some recent tests against our staging branch that suggest this issue has been solved, but it would be good to hear from your end whether the issue remains or was indeed resolved by the #121 fix.

GarrettDMorrison avatar Feb 09 '23 19:02 GarrettDMorrison

Going to close this for now; feel free to reopen if you are still seeing issues.

GarrettDMorrison avatar Jun 14 '23 22:06 GarrettDMorrison