neural-fortran
Whither test_network_sync?
After attempting to develop test_network_sync.f90 into a test that reports pass or failure, and after seeking advice from Tom Clune, who develops the pFUnit testing framework, I'm not confident there's a way to write a test that is both meaningful and straightforward. Without doing extensive research myself, my sense is that the formula employed in the `randn` functions has been thoroughly studied and there's not much to test in its behavior beyond what is already known. @milancurcic if you have thoughts on a useful test, please let me know. Otherwise, I recommend removing the test, given that it currently just prints the generated samples without any form of check on their statistical properties.
Also, the `sync` type-bound procedure suffers from an unfortunate ambiguity in its name. I understand the reason for the name, but I'd just call it `broadcast` or something similar because there's no explicit synchronization (`sync all` or `sync images`) and the standard imposes no precise synchronization semantics on collective subroutines other than noting that "the calculations performed by a collective subroutine have some internal synchronizations."
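To make the distinction the name obscures concrete, here is a minimal sketch (the variable `x` is just a placeholder, not part of neural-fortran):

```fortran
program broadcast_vs_sync
  implicit none
  real :: x(3)
  if (this_image() == 1) x = [1.0, 2.0, 3.0]
  ! Collective subroutine: copies image 1's x to every image, but the
  ! standard guarantees only unspecified internal synchronizations.
  call co_broadcast(x, source_image=1)
  ! Image control statement: an explicit barrier across all images.
  sync all
  if (this_image() == 1) print *, 'broadcast complete'
end program broadcast_vs_sync
```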
Yes, I wouldn't worry about this one as it will likely go away. I suggest focusing on io, datasets, and activations.
I looked at what `sync` does to remind myself. It updates the weights and biases on all images > 1 to match those on image 1. A passing test for this would ensure they are indeed exactly the same after the call to `sync`.
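A minimal sketch of such a test, using a plain array as a stand-in for the network's weights and `co_broadcast` as a stand-in for `sync` (the intrinsic collectives `co_min`/`co_max` then verify agreement across images independently):

```fortran
program test_network_sync
  implicit none
  real :: weights(5), wmin(5), wmax(5)
  ! Stand-in for the network's weights: image-dependent initial values.
  call random_number(weights)
  ! Stand-in for net % sync(1): image 1's values overwrite all images.
  call co_broadcast(weights, source_image=1)
  ! If the values agree on every image, the elementwise min and max
  ! across all images must coincide.
  wmin = weights; call co_min(wmin)
  wmax = weights; call co_max(wmax)
  if (any(wmin /= wmax)) error stop 'test_network_sync: FAILED'
  if (this_image() == 1) print *, 'test_network_sync: PASSED'
end program test_network_sync
```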
@milancurcic perfect. Thanks. I'll submit a PR with the suggested test.
@milancurcic It still might be that test_network_sync is unnecessary. Fortran 2018 provides an intrinsic subroutine, `random_init(repeatable, image_distinct)`, where both arguments are `logical, intent(in)` and where passing `image_distinct=.false.` ensures that every image uses the same sequence of random numbers. At scale, some amount of redundant computation generally outperforms communication, so the code will likely scale better if we replace `co_broadcast` with a call to `random_init` with `image_distinct=.false.`.
The GNU, Intel, and Cray compilers support `random_init`. Some important updates to the behavior of `random_init` will appear in GCC 12, however, so I'll need to check whether GCC 11 works for our purposes. I just submitted a feature request to NAG, which usually responds quickly, so there's a reasonable chance that the NAG compiler will support `random_init` soon too.
Actually, yet another way to view this is that the test is useful either way, and in fact it should give the same result either way. How that result is produced is an implementation detail that can be considered separately.
Yes, in other words, ensuring that the initial state of the network is valid across images.
I have determined that with `gfortran` 11.2.0 and the head of the main branch of OpenCoarrays, calling `random_init` with `image_distinct=.false.` produces the same random numbers on each image without the need for `co_broadcast`. See below. I haven't tested earlier versions of `gfortran` or OpenCoarrays.
I recommend this approach for scalability reasons. I'll submit a PR with separate commits that add the test and replace the `co_broadcast` with `random_init`. I'm thinking of setting `repeatable=.true.` so that the test results are deterministic. The current test is non-deterministic, so if you prefer that approach, let me know. I suppose we could also think about having both a deterministic test and a non-deterministic test. I'm not sure it matters much either way, given that our only goal is to ensure the same weights on all images.
```
% cat random.f90
program main
  implicit none
  real harvest(5)
  call random_init(.true., .false.)
  call random_number(harvest)
  print *, this_image(), ":", harvest
end program
% caf random.f90
% cafrun -n 8 ./a.out
           5 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
           6 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
           7 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
           8 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
           1 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
           2 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
           3 : 0.571258783 0.226831257 0.175722122 0.691333890 0.885308683
```
Thanks @rouson. Any chance for this to work with GFortran 10?
Most likely, yes. Certainly the current approach should work with GFortran 10. If `random_init` works with 10, I'll submit that approach as a separate PR.
Closing in favor of #78.