gcc-darwin-arm64

gfortran and libSystem.B.dylib issue when a large array is in play

Open fvaccari opened this issue 2 years ago • 15 comments

I've installed gcc-mp-devel with MacPorts

GNU Fortran (MacPorts gcc-devel 12-20220320_0+enable_stdlib_flag) 12.0.1 20220319 (experimental)

on an M1 Mac with 64 GB RAM running macOS 12.2.1. I could compile and execute all the Fortran programs I'm working with except one, which has some large array definitions.

Actually I've reproduced the issue with a very simple program:

program prog

complex  ::   w40(300000000)
w40=cmplx(0.0e0,0.0e0)

print *,"OK"
stop
end

I can compile and run this on an Intel Mac with 16 GB RAM running Mojave and an old gfortran (v. 7.5; I also succeeded with several versions from 4.x to 10.x on Intel Macs with OS versions ranging from 10.6.8 to 10.14.6), and I have no issues on a Linux VM either (CentOS 6 with gfortran 4.4.7 and just 8 GB of RAM assigned).

What I get on the arm64 Mac is:

> gfortran-mp-devel prog.f90
> ./a.out
dyld[88306]: dyld cache '/System/Library/dyld/dyld_shared_cache_arm64e' not loaded: syscall to map cache into shared region failed
dyld[88306]: Library not loaded: /usr/lib/libSystem.B.dylib
  Referenced from: /Volumes/xHD/ndsha/Shared/Test/Rpath/a.out
  Reason: tried: '/usr/lib/libSystem.B.dylib' (no such file), '/usr/local/lib/libSystem.B.dylib' (no such file)
[1]    88306 abort      ./a.out

The original program (full version) that fails to execute on my new arm64 Mac fails on the 8 GB RAM Linux VM as well, but it can be compiled and executed there provided that I add the option

-mcmodel=large

at compile time. Unfortunately that option leads to errors at compile time on any Mac (Intel or Apple Silicon) I've tried it on.

> gfortran-mp-devel -mcmodel=large prog.f90                                                                                         
/var/folders/72/b_3786ns6ld1z4n6k6cfsx7w0000gn/T//ccD4P6At.s:20:15: error: invalid variant 'BLEAH'
        adrp    x0, lC0@BLEAH
                        ^
/var/folders/72/b_3786ns6ld1z4n6k6cfsx7w0000gn/T//ccD4P6At.s:30:15: error: invalid variant 'BLEAH'
        adrp    x0, lC2@BLEAH
                        ^
/var/folders/72/b_3786ns6ld1z4n6k6cfsx7w0000gn/T//ccD4P6At.s:44:15: error: invalid variant 'BLEAH'
        adrp    x0, lC4@BLEAH
                        ^
/var/folders/72/b_3786ns6ld1z4n6k6cfsx7w0000gn/T//ccD4P6At.s:77:15: error: invalid variant 'BLEAH'
        adrp    x0, lC5@BLEAH
                        ^

I made some more (blind...) experiments on the arm64 Mac, and described the outcome here

(https://trac.macports.org/ticket/64896#comment:5)

Not sure if they could be of any help, but thanks for your very appreciated efforts on this whole project!

Franco

fvaccari avatar Mar 30 '22 13:03 fvaccari

did you as yet figure out the critical size for your array that leads to link failure (just drop a few zeros until it works)?

kencu avatar Mar 30 '22 14:03 kencu

No, I'll do some tests...

fvaccari avatar Mar 30 '22 14:03 fvaccari

Ok, I've found a program ready for use among those prepared by one of my former colleagues:

program max_size_array_double
implicit none

integer(kind=8) :: n,m
real(kind=8), allocatable, dimension(:) :: vec

n=1
do

  m=2**n
  allocate(vec(m))
  deallocate(vec)
  print*,'m=', m
  n=n+1

end do
end program

The execution outcome is this:

> ./max_size_array_double 
 m=                    2
 m=                    4
 m=                    8
 m=                   16
 m=                   32
 m=                   64
 m=                  128
 m=                  256
 m=                  512
 m=                 1024
 m=                 2048
 m=                 4096
 m=                 8192
 m=                16384
 m=                32768
 m=                65536
 m=               131072
 m=               262144
 m=               524288
 m=              1048576
 m=              2097152
 m=              4194304
 m=              8388608
 m=             16777216
 m=             33554432
 m=             67108864
 m=            134217728
 m=            268435456
 m=            536870912
 m=           1073741824
 m=           2147483648
 m=           4294967296
 m=           8589934592
 m=          17179869184
 m=          34359738368
 m=          68719476736
 m=         137438953472
 m=         274877906944
 m=         549755813888
 m=        1099511627776
 m=        2199023255552
 m=        4398046511104
 m=        8796093022208
In file 'max_size_array_double.f90', around line 11: Error allocating 140737488355328 bytes: Cannot allocate memory

Error termination. Backtrace:
#0  0x1030c2cef
#1  0x1030c3783
#2  0x1030c39d3
#3  0x102d0fc4b
#4  0x102d0fd47

No error about libSystem.B.dylib, possibly because of the dynamic allocation. So I'll explore more with static allocation (and possibly use 'allocate' in the original troubled program...)

fvaccari avatar Mar 30 '22 14:03 fvaccari

Indeed, I think the static allocation of the large array is exactly the issue that leads to running out of link address room, but I’m too dumb to really know for sure and we await the guru.

kencu avatar Mar 30 '22 14:03 kencu

your colleague’s program is something completely different, I believe… you’re just running out of memory there.

kencu avatar Mar 30 '22 14:03 kencu
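For readers who want the probe above to stop cleanly instead of ending in an error termination, here is a minimal sketch that adds `stat=` to the `allocate` call. The program name, the `ierr` variable, and the loop bound of 59 (chosen so that the byte count `m*8` still fits in a 64-bit integer) are assumptions made for illustration and are not part of the original program. As the output above suggests, a successful `allocate` of an array that is never touched does not show that the memory is physically available.

```fortran
program max_size_array_double_stat
  implicit none

  integer(kind=8) :: n, m
  integer :: ierr
  real(kind=8), allocatable, dimension(:) :: vec

  ! Double the requested element count until the allocation is refused.
  do n = 1, 59
    m = 2_8**n
    allocate(vec(m), stat=ierr)
    if (ierr /= 0) then
      ! With stat= present, a failed allocation returns control here
      ! instead of aborting the program.
      print *, 'allocation failed at m=', m
      exit
    end if
    deallocate(vec)
    print *, 'm=', m
  end do
end program max_size_array_double_stat
```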

> gfortran-mp-devel -mcmodel=large prog.f90                                                                                         

At this time, we do not support -mcmodel=large (actually, not on x86_64 either) .. probably it would be kinder to emit an error message than the failed relocations....

... the right course of action (as you have determined later in this thread) is to use dynamic allocation - that should only be limited by the memory on your system.

We will look into implementing the large model at some point (but there are higher priorities for the main port correctness at present).

iains avatar Mar 30 '22 15:03 iains
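A minimal sketch of the dynamic-allocation approach recommended above, applied to the small reproducer from the first post. The array name and element count come from that post; the program name, the `n` parameter, and the `ierr` check are assumptions added for illustration.

```fortran
program prog_dynamic
  implicit none

  ! Same element count as the static reproducer.
  integer(kind=8), parameter :: n = 300000000_8
  complex, allocatable :: w40(:)
  integer :: ierr

  ! Request the storage at run time instead of declaring a fixed-size array.
  allocate(w40(n), stat=ierr)
  if (ierr /= 0) then
    print *, "allocation failed, stat =", ierr
    stop 1
  end if

  w40 = cmplx(0.0e0, 0.0e0)

  print *, "OK"
  deallocate(w40)
end program prog_dynamic
```

The rest of the code can keep using `w40` as before; only the declaration and the explicit `allocate`/`deallocate` pair change.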

did you as yet figure out the critical size for your array that leads to link failure (just drop a few zeros until it works)?

So, I've set up a shell script test.sh

#!/bin/bash
set -e
SIZE=1

for ((i=1;i<=40;i=i+1))

do 

 echo $i
 SIZE=$SIZE*2
 sed s/DIM/$SIZE/ origin > static_complex.f90
 gfortran-mp-devel static_complex.f90 -ostatic_complex
 ./static_complex

done

where origin is

program max_size_array_complex_static
implicit none

integer(kind=8) , parameter :: vec_size=DIM
complex(kind=16)  ::   vec(vec_size)

vec=cmplx(0.0e0,0.0e0)

print *,"OK",vec_size

end program

The outcome is:

./test.sh
1
 OK                    2
2
 OK                    4
3
 OK                    8
...
...
24
 OK             16777216
25
 OK             33554432
26
dyld[4691]: dyld cache '/System/Library/dyld/dyld_shared_cache_arm64e' not loaded: syscall to map cache into shared region failed
dyld[4691]: Library not loaded: /usr/lib/libSystem.B.dylib
  Referenced from: /Volumes/xHD/ndsha/Shared/Test/Rpath/static_complex
  Reason: tried: '/usr/lib/libSystem.B.dylib' (no such file), '/usr/local/lib/libSystem.B.dylib' (no such file)
./test.sh: line 16:  4691 Abort trap: 6           ./static_complex

The error does not occur at the array declaration, but at vec=cmplx(0.0e0,0.0e0), when the initialisation is made. If I comment that out, the loop continues. Maybe obvious, but I'm learning while experimenting...

fvaccari avatar Mar 30 '22 15:03 fvaccari
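To put the sizes in this thread side by side, the sketch below converts the element counts into GiB. The element sizes are assumptions based on gfortran defaults (8 bytes for default `complex`, 32 bytes for `complex(kind=16)`); the program and constant names are made up for illustration.

```fortran
program static_sizes
  implicit none

  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: gib = 2.0_dp**30

  ! Assumed element sizes (gfortran defaults):
  ! default complex = 2 x 4 bytes, complex(kind=16) = 2 x 16 bytes.
  integer(i8), parameter :: bytes_c4 = 8_i8, bytes_c16 = 32_i8

  print '(a,f7.3,a)', 'w40(300000000), default complex: ', &
        real(300000000_i8*bytes_c4, dp)/gib, ' GiB'
  print '(a,f7.3,a)', 'vec(2**25), complex(kind=16):    ', &
        real(2_i8**25*bytes_c16, dp)/gib, ' GiB'
  print '(a,f7.3,a)', 'vec(2**26), complex(kind=16):    ', &
        real(2_i8**26*bytes_c16, dp)/gib, ' GiB'
end program static_sizes
```

Under those assumptions the original `w40` array needs roughly 2.235 GiB, the last size that ran in the test loop is 1 GiB, and the first size that failed is 2 GiB.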

OK, so now you know the maximum size that can be allocated in the static way, and that it will not be made any larger any time soon so you will need to rewrite it as dynamic allocation if you need larger!

All done!

Check back in three to five years and perhaps the static allocation will be made larger by then, if anyone cares about that by that point...

kencu avatar Mar 30 '22 15:03 kencu

program prog

complex  ::   w40(300000000)
w40=cmplx(0.0e0,0.0e0)

print *,"OK"
stop
end

What I get on the arm64 Mac is:

> gfortran-mp-devel prog.f90
> ./a.out
dyld[88306]: dyld cache '/System/Library/dyld/dyld_shared_cache_arm64e' not loaded: syscall to map cache into shared region failed
dyld[88306]: Library not loaded: /usr/lib/libSystem.B.dylib
  Referenced from: /Volumes/xHD/ndsha/Shared/Test/Rpath/a.out
  Reason: tried: '/usr/lib/libSystem.B.dylib' (no such file), '/usr/local/lib/libSystem.B.dylib' (no such file)
[1]    88306 abort      ./a.out

That's a bit unfortunate - one might have hoped that the linker would have complained about this (rather than a run-time crash) ... my guess is that because of the way in which the shared library cache is implemented on aarch64, the large memory allocation is causing an overlap between regions (but that's based on reading this text here - nothing more).

I would imagine that as you reduce the size of the complex array you'd reach a point that it works .. .. again dynamic allocation is a short-term fix, I guess.

iains avatar Mar 30 '22 15:03 iains

For static allocation(s), I would guess that the critical size is hard to determine for a general Fortran source, since the limitation could well be on the sum of the various static objects in the program (plus, quite probably, the program code itself): all would need to fit into the address space reachable with the 'medium' (default) mcmodel.

.. which is a long-winded way of saying that if you determine that 2Gb is the largest array you can allocate statically, that does not mean you can have two of them :) .. you have to share the available space ...

iains avatar Mar 30 '22 15:03 iains
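Since the limit applies to the sum of the static objects rather than to any single array, it can help to add up their sizes. The sketch below does this with the `size` and `storage_size` intrinsics; the two arrays are hypothetical and deliberately small so that the program runs anywhere, and it does not account for the program code itself.

```fortran
program static_budget
  implicit none

  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers

  ! Hypothetical static arrays standing in for the ones in a real program.
  real(kind=8) :: a(1000)
  complex      :: b(500)
  integer(i8)  :: total_bytes

  ! storage_size returns bits per element, so divide by 8 to get bytes.
  total_bytes = size(a, kind=i8) * (storage_size(a, kind=i8) / 8) &
              + size(b, kind=i8) * (storage_size(b, kind=i8) / 8)

  print *, 'combined static footprint in bytes:', total_bytes
end program static_budget
```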

Understanding why the static allocation works on Intel with 16 GB RAM and not on arm64 with 64 GB is beyond my capabilities. I just understand that this porting to arm64 must be an incredibly huge task, and I'm so happy that you went already so far...

I'm just curious to know if finding the equivalent of /usr/lib/libSystem.B.dylib on arm64/Monterey would magically solve the issue...

Now I'll start looking inside the original program that presented the error and try to go the dynamic way...

fvaccari avatar Mar 30 '22 15:03 fvaccari

not finding /usr/lib/libSystem.B.dylib is not actually the error.

It doesn't load because it can't do the relocations because it's out of memory. But dyld never thought of that possibility, so it says it can't find the library instead, as that is the error the dyld programmer DID expect.

So it's the wrong error, for the wrong issue, for something completely different.

:>

kencu avatar Mar 30 '22 16:03 kencu

Understanding why the static allocation works on Intel with 16 GB RAM and not on arm64 with 64 GB is beyond my capabilities.

The way in which the system libraries are delivered is different on arm64 (iOS and macOS) from x86 (and even powerpc if one goes back that far) - the underlying issue, as @kencu says, is running out of usable heap (not running out of available RAM).

I'm just curious to know if finding the equivalent of /usr/lib/libSystem.B.dylib on arm64/Monterey would magically solve the issue...

Nope, I do not think it is actually even possible - libSystem is part of the system fixed cache.

Now I'll start looking inside the original program that presented the error and try to go the dynamic way...

sorry, that's the best course of action right now.

iains avatar Mar 30 '22 16:03 iains

Ok, I'll follow that course of action, and thanks to @iains and @kencu for the help. Much appreciated!

fvaccari avatar Mar 30 '22 16:03 fvaccari

Happy to report that after converting the largest arrays of the original program to dynamic allocation, execution proceeds smoothly till the end.

Tested on arm64 Mac (10.12.1, 64GB RAM), Intel Mac (10.6.8 96 GB RAM; 10.14.6 16 GB RAM), CentOS 6 VM hosted on Intel Mac (VirtualBox, 8GB RAM assigned).

Thanks to all who contributed!

fvaccari avatar Mar 31 '22 17:03 fvaccari

I am going to close this, but we now have issue #100 which is asking for support for the large code model; perhaps that would also be useful in this case.

iains avatar May 27 '23 10:05 iains