PackedArray icon indicating copy to clipboard operation
PackedArray copied to clipboard

0 Initalized packed array buffer

Open guinn8 opened this issue 3 years ago • 5 comments

I have 0 initialized the PackedArray buffer. I was valrgrind'ing around in my application and was getting Conditional jump or move depends on uninitialised value(s) when using a 1- bit packed array buffer and was getting incorrect results. This fix resolves the issue. I reran the tests and everything is passing. This is a fix for #6

Bug can be reproduced in this commit with this input aliquot/_pom_yang_rewrite/ make && valgrind ./main --bound=$((10**6)) --seg_len=$((5 * $((10**3)))) --writebuf_len=1000000 --preimage_count_bits=1 --num_threads=12

guinn8 avatar Jan 21 '22 19:01 guinn8

Hello @guinn8 👋

I likely won't accept this PR. At that point, it seems that all it does is working around a problem in your own code.

If you really want to 0 initialize the buffer, imho a proper way is to redefined PACKEDARRAY_MALLOC to use calloc() instead of malloc().

As far as Valgrind goes, I get no issue when running tests

$ make -C _gnu-make/
make: Entering directory '/data/Projects/PackedArray/_gnu-make'
mkdir -p /data/Projects/PackedArray/bin/linux-x86_64
cc -o /data/Projects/PackedArray/bin/linux-x86_64/PackedArraySelfTest -DPACKEDARRAY_SELF_TEST -std=c99 -pedantic -O2 -g /data/Projects/PackedArray/PackedArray.c
mkdir -p /data/Projects/PackedArray/bin/linux-x86_64
cc -o /data/Projects/PackedArray/bin/linux-x86_64/PackedArraySelfBench -DPACKEDARRAY_SELF_BENCH -DNDEBUG -std=c99 -pedantic -O2 -g /data/Projects/PackedArray/PackedArray.c
mkdir -p /data/Projects/PackedArray/bin/linux-x86_64
cc -o /data/Projects/PackedArray/bin/linux-x86_64/PackedArraySIMDSelfTest -DPACKEDARRAY_SELF_TEST -std=c99 -pedantic -O2 -g /data/Projects/PackedArray/PackedArraySIMD.c
mkdir -p /data/Projects/PackedArray/bin/linux-x86_64
cc -o /data/Projects/PackedArray/bin/linux-x86_64/PackedArraySIMDSelfBench -DPACKEDARRAY_SELF_BENCH -DNDEBUG -std=c99 -pedantic -O2 -g /data/Projects/PackedArray/PackedArraySIMD.c
make: Leaving directory '/data/Projects/PackedArray/_gnu-make'
$ valgrind ./bin/linux-x86_64/PackedArraySelfTest
==241616== Memcheck, a memory error detector
==241616== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==241616== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==241616== Command: ./bin/linux-x86_64/PackedArraySelfTest
==241616==
-- PackedArray self test -------------------------------------------------------

sizeof(PackedArray) = 16

1 by 1 packing / unpacking:
   1 bits per item -- success.
   2 bits per item -- success.
   3 bits per item -- success.
   4 bits per item -- success.
   5 bits per item -- success.
   6 bits per item -- success.
   7 bits per item -- success.
   8 bits per item -- success.
   9 bits per item -- success.
  10 bits per item -- success.
  11 bits per item -- success.
  12 bits per item -- success.
  13 bits per item -- success.
  14 bits per item -- success.
  15 bits per item -- success.
  16 bits per item -- success.
  17 bits per item -- success.
  18 bits per item -- success.
  19 bits per item -- success.
  20 bits per item -- success.
  21 bits per item -- success.
  22 bits per item -- success.
  23 bits per item -- success.
  24 bits per item -- success.
  25 bits per item -- success.
  26 bits per item -- success.
  27 bits per item -- success.
  28 bits per item -- success.
  29 bits per item -- success.
  30 bits per item -- success.
  31 bits per item -- success.
  32 bits per item -- success.

bulk packing / unpacking:
   1 bits per item -- success.
   2 bits per item -- success.
   3 bits per item -- success.
   4 bits per item -- success.
   5 bits per item -- success.
   6 bits per item -- success.
   7 bits per item -- success.
   8 bits per item -- success.
   9 bits per item -- success.
  10 bits per item -- success.
  11 bits per item -- success.
  12 bits per item -- success.
  13 bits per item -- success.
  14 bits per item -- success.
  15 bits per item -- success.
  16 bits per item -- success.
  17 bits per item -- success.
  18 bits per item -- success.
  19 bits per item -- success.
  20 bits per item -- success.
  21 bits per item -- success.
  22 bits per item -- success.
  23 bits per item -- success.
  24 bits per item -- success.
  25 bits per item -- success.
  26 bits per item -- success.
  27 bits per item -- success.
  28 bits per item -- success.
  29 bits per item -- success.
  30 bits per item -- success.
  31 bits per item -- success.
  32 bits per item -- success.
==241616==
==241616== HEAP SUMMARY:
==241616==     in use at exit: 0 bytes in 0 blocks
==241616==   total heap usage: 81,921 allocs, 81,921 frees, 73,788,928 bytes allocated
==241616==
==241616== All heap blocks were freed -- no leaks are possible
==241616==
==241616== For lists of detected and suppressed errors, rerun with: -s
==241616== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)  

gpakosz avatar Jan 21 '22 19:01 gpakosz

Fair enough to no accept the PR as is. Redefining PACKEDARRAY_MALLOC to use calloc is not entirely straight forward as you would need to handle the additional argument, that's why I went for memset. I suppose the issue is that I assumed PackedArray would be zero initialized as a complex data structure. Would you accept a PR that makes the readme and other docs very clear that memory is left uninitialized?

guinn8 avatar Jan 21 '22 21:01 guinn8

You are correct about it being a flaw in my application, my code assumed zero initialization. Looked like a bug from my perspective as the error was only observed on some inputs. Super cool library, reduced the memory consumption of my application by a factor of 8 (from 480gb to 60gb)!

guinn8 avatar Jan 21 '22 21:01 guinn8

Redefining PACKEDARRAY_MALLOC to use calloc is not entirely straight forward as you would need to handle the additional argument, that's why I went for memset

You would do

#define PACKEDARRAY_MALLOC(size) calloc(1, size)

gpakosz avatar Jan 21 '22 22:01 gpakosz

Also the benefit of calloc() is that can leverages the kernel filling pages with 0 depending on the OS. Which means it has the potentiel of being faster than malloc() then memset()

gpakosz avatar Jan 21 '22 22:01 gpakosz