f90wrap
f90wrap copied to clipboard
Handle collision in array wrapper (that breaks CI intermittently)
I am encountering a sporadic handle collision for the python-side cache of f90wrapped fortran arrays.
This is because the pywrap keeps a cache of accessed arrays that are indexed by handle
, which is the memory location of the array. When arrays are deallocated and then re-allocated, especially on memory-constrained systems, there is a small but not insignificant chance of them being placed in the exact same location. This would happen sporadically in our CI runners resulting in very hard-to-diagnose failures, that were very hard to reproduce on our user systems.
The pywrap for an array resolves to:
@property
def <name>(self)
"""
Defined at \
<loc> \
line xxx
"""
array_ndim, array_type, array_shape, array_handle = \
_spec_f90wrapped.f90wrap_<mod_name>__array__<name>(f90wrap.runtime.empty_handle)
if array_handle in self._arrays:
<name> = self._arrays[array_handle]
else:
<name> = f90wrap.runtime.get_array(f90wrap.runtime.sizeof_fortran_t,
f90wrap.runtime.empty_handle,
_spec_f90wrapped.f90wrap_<mod_name>__array__<name>)
self._arrays[array_handle] = <name>
return <name>
The f90wrapped code for the array looks like
subroutine f90wrap_allglobal__array__allrzrz(dummy_this, nd, dtype, dshape, dloc)
use constants
use typedefns
use allglobal, only: allglobal_allrzrz => <NaMe>
use, intrinsic :: iso_c_binding, only : c_int
implicit none
integer, intent(in) :: dummy_this(2)
integer(c_int), intent(out) :: nd
integer(c_int), intent(out) :: dtype
integer(c_int), dimension(10), intent(out) :: dshape
integer*8, intent(out) :: dloc
nd = x
dtype = y
if (allocated(<array_name>)) then
dshape(1:x) = shape(<array_name>)
dloc = loc(<array_name>)
else
dloc = 0
end if
end subroutine f90wrap_<mod_name>__array__<array_name>
Could a different handle generator be used than the pointer, like a hash (though this would change if the data in the array changes)? Or could the cache be invalidated if an array is deallocated?
Though such collisions are unlikely, they are more likely to pop up in CI runners because they are: a) more memory constrained and b) they run similar tests many times, de-alllocating and allocating the same arrays over and over again.
What will hopefully solve this for us is manually clearing the modules' cache when instantiating a class that uses the f90wrapped code with my_mod._arrays = {}
, but the error we found seems to hint at a deeper issue
P.S. f90wrap is being used with much effect to wrap magneto hydrodynamic equilibrium codes and optimize stellarator fusion reactors, thanks for the great work!