Stop 1-Weso squaring on reaching target iterations
For 1-Wesolowski proofs (used for "blueboxing") the squaring thread previously needlessly continued squaring while proof computation was in progress. In practice, proof computation also takes a while, so this could easily result in 10-20M additional squaring iterations. With this change, the squaring stops after the iterations target is reached. This also makes CPU utilisation more predictable. Previously, during squaring 2 threads were heavily utilised, and 3 threads during proof computation. Now it's never more than 2 high-load threads: 2 for squaring, 1 for proof computation.
This was originally written more than 2 years ago for the "blueboxing group" we ran in a SpaceFarmers.io Discord channel and was used by the people participating in that group intensively over several months. With the recently renewed interest in blueboxing, I remembered that I never got around to upstreaming that change: so here we go :)
Awesome!
The macos TSAN (thread sanitizer) test does seem to be failing - this only runs on macOS intel so perhaps it's usefulness is declining. The Ubuntu TSAN test passed - so this is a curious failure
The macOS TSAN tests passes for me in main of chiavdf, but does not with this change. Although I can't quite figure out why yet since it seems to be complaining about std::cout
==================
WARNING: ThreadSanitizer: data race (pid=5847)
Read of size 4 at 0x7ff8569a1348 by main thread:
#0 std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) <null>:2 (1weso_test:x86_64+0x10002bbe8)
#1 main <null>:2 (1weso_test:x86_64+0x10003e279)
#2 start <null>:2 (dyld:x86_64+0x552d)
#3 start <null>:2 (dyld:x86_64+0x552d)
Previous write of size 4 at 0x7ff8569a1348 by thread T1:
#0 std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long) <null>:2 (1weso_test:x86_64+0x10002bc45)
#1 repeated_square(unsigned long long, form, integer const&, integer const&, WesolowskiCallback*, FastStorage*, std::__1::atomic<bool>&) <null>:2 (1weso_test:x86_64+0x100039fa7)
#2 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(unsigned long long, form, integer const&, integer const&, WesolowskiCallback*, FastStorage*, std::__1::atomic<bool>&), unsigned long long, form, integer, integer, OneWesolowskiCallback*, FastStorage*, std::__1::reference_wrapper<std::__1::atomic<bool> > > >(void*) <null>:2 (1weso_test:x86_64+0x100040aa7)
As if synchronized via sleep:
#0 nanosleep <null>:2 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2ccc5)
#1 std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) <null>:2 (libc++.1.dylib:x86_64+0x15ad0)
#2 start <null>:2 (dyld:x86_64+0x552d)
#3 start <null>:2 (dyld:x86_64+0x552d)
Location is global 'std::__1::cout' at 0x7ff8569a12b0 (libc++.1.dylib+0x4179a348)
Thread T1 (tid=25547, finished) created by main thread at:
#0 pthread_create <null>:2 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x2df1f)
#1 main <null>:2 (1weso_test:x86_64+0x10003da3f)
#2 start <null>:2 (dyld:x86_64+0x552d)
#3 start <null>:2 (dyld:x86_64+0x552d)
SUMMARY: ThreadSanitizer: data race (1weso_test:x86_64+0x10002bbe8) in std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::__put_character_sequence<char, std::__1::char_traits<char> >(std::__1::basic_ostream<char, std::__1::char_traits<char> >&, char const*, unsigned long)+0xb8
==================
Ok, if I comment out the line in vdf.h
std::cout << "VDF loop finished. Total iters: " << num_iterations << "\n" << std::flush;
The tests pass again.
This seems to be a TSAN bug in this version of Clang , as per the C++ spec:
In C++11, we do have some guarantees. The FDIS says the following in §27.4.1 [iostream.objects.overview]:
Concurrent access to a synchronized (§27.5.3.4) standard iostream object’s formatted and unformatted input (§27.7.2.1) and output (§27.7.3.1) functions or a standard C stream by multiple threads shall not result in a data race (§1.10). [ Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note ]
@xearl4 I added a lock around cout to prevent the TSAN false positive - it really shouldn't be needed, but 🤷 - too bad we don't have the convenient c++20 <syncstream> - but that would seem a larger change to require c++20
see #176
This needs #181 merged in first to fix TSAN
close and re-open for CI updates