
build from source segfaults on armv7hl

Open dbenoit17 opened this issue 8 years ago • 22 comments

raco setup segfaults while running <pkgs>/data-doc/data/scribblings/data.scrbl in a Fedora koji armv7hl build environment.

raco setup: --- building documentation ---
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/scribble.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/scribble-style.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/racket.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/manual-style.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/manual-racket.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/manual-racket.js
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/manual-fonts.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/scribble-common.js
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/doc-site.css
raco setup: installing: /builddir/build/BUILDROOT/racket-6.9-1.fc26.arm/usr/share/doc/racket/doc-site.js
raco setup: running: <pkgs>/racket-doc/scribblings/reference/reference.scrbl
raco setup: running: <pkgs>/racket-doc/scribblings/guide/guide.scrbl
raco setup: running: <pkgs>/2d-doc/scribblings/2d.scrbl
raco setup: running: <pkgs>/racket-index/scribblings/main/acks.scrbl
raco setup: running: <pkgs>/algol60/algol60.scrbl
raco setup: running: <pkgs>/drracket/browser/browser.scrbl
raco setup: running: <pkgs>/drracket/help/bug-report.scrbl
raco setup: running: <pkgs>/games/cards/cards.scrbl
raco setup: running: <pkgs>/racket-doc/compatibility/scribblings/compatibility.scrbl
raco setup: running: <pkgs>/web-server-doc/web-server/scribblings/tutorial/continue.scrbl
raco setup: running: <pkgs>/contract-profile/scribblings/contract-profile.scrbl
raco setup: running: <pkgs>/net-cookies-doc/net/cookies/scribblings/cookies.scrbl
raco setup: running: <pkgs>/data-doc/data/scribblings/data.scrbl
SIGSEGV MAPERR si_code 1 fault on addr 0xa9ecfffc
make[1]: Leaving directory '/builddir/build/BUILD/racket-6.9/src'
make[1]: *** [Makefile:178: install-3m] Aborted (core dumped)

This is occurring as part of an rpm build in koji, so I do not have access to a physical armv7hl machine on which to run gdb/lldb. I can try to set up a virtual environment if that will help figure out the issue.

I can also confirm the issue is reproducible, and the segfault location is consistent across subsequent builds I have run.

Here is a link to the output of the build (failure is at the very end): https://kojipkgs.fedoraproject.org//work/tasks/6574/20446574/build.log

If anyone familiar with koji would find it helpful to compare against the builds that succeeded on other architectures, here is the parent task: https://koji.fedoraproject.org/koji/taskinfo?taskID=20446568

Please let me know if there is any more information I can provide which could be of assistance.

Thanks!

dbenoit17, Jul 17 '17

To clarify, this is a scratch build of the unix source release racket-6.9-src.tgz.

dbenoit17, Jul 17 '17

[I'm not sure that this is the problem, but] how much memory does your computer have? Can you check whether it runs out of memory while building the docs?

Building the docs uses a lot of memory. [IIRC, the builder has to remember all the internal links, or something like that.]
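One way to answer the memory question, sketched in Python under the assumption that the failing step can be run directly (for example raco setup, possibly with -j 1) instead of through the full rpm build: run it as a child process and read its peak resident set size from resource.getrusage(RUSAGE_CHILDREN). The helper name run_with_peak_rss is hypothetical. On Linux, ru_maxrss for RUSAGE_CHILDREN is the peak RSS of the largest single waited-for descendant, not a sum over the process tree, and a negative exit code means the child was killed by that signal (a process that reports "Aborted (core dumped)" would likely show up as -6, SIGABRT).

```python
import resource
import subprocess
import sys


def run_with_peak_rss(cmd):
    """Run cmd to completion and return (exit_code, peak_rss_kib).

    RUSAGE_CHILDREN's ru_maxrss is the peak RSS of the largest single
    child/descendant that has been waited for, not a sum over the tree.
    A negative exit code means the child died from that signal number.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    if sys.platform == "darwin":
        peak //= 1024  # macOS reports ru_maxrss in bytes, Linux in KiB
    return proc.returncode, peak


if __name__ == "__main__" and len(sys.argv) > 1:
    code, peak_kib = run_with_peak_rss(sys.argv[1:])
    print(f"exit code {code}, peak RSS {peak_kib / 1024:.0f} MiB")
```

Saved as a script (the file name is up to you), it would be invoked with the command to measure as its arguments, e.g. raco setup -j 1, and prints the exit status and peak RSS once that command finishes.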

gus-massa, Aug 03 '17

@gus-massa After comparing the amount of memory allotted to the build environments of the passing vs. failing architectures, I think you might be correct. I have no way to test this because the build system memory allotments are fixed, so I am closing the issue.

dbenoit17, Aug 03 '17

@dbenoit17 Are you running this on an RPi or on a proper armv7 machine? I can compile this on an RPi, which is armv7l (https://gitlab.com/LinkiTools/racket/-/jobs/178198271). So I assume that if the only difference is the hard-float support, that might be the issue.

pmatos, Mar 20 '19

Hey! I reopened this because I would like to enable support for Racket on 32-bit ARM in Fedora. When trying to build Racket 7.2, I get the same failure as before, in the same place. We suspect it is caused by the combination of the docs taking lots of memory to build and the lower per-process memory limits (ulimit) that the build infrastructure enforces on the 32-bit systems.
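To check the ulimit hypothesis from inside the build environment, a Python sketch can read the same per-process limits that the shell's ulimit reports: ulimit -v corresponds to RLIMIT_AS, ulimit -d to RLIMIT_DATA and ulimit -s to RLIMIT_STACK (the shell shows them in KiB, resource.getrlimit returns bytes). The helper names are hypothetical. Even with every limit set to unlimited, a 32-bit process on Linux typically has only about 3 GiB of user address space, depending on the kernel's memory split, so a large heap could still fail to map on armv7hl when the same build fits easily on x86_64.

```python
import resource

# Per-process limits most likely to cap a single memory-hungry process.
LIMITS = {
    "RLIMIT_AS": "virtual address space (ulimit -v)",
    "RLIMIT_DATA": "data segment size (ulimit -d)",
    "RLIMIT_STACK": "stack size (ulimit -s)",
}


def describe(value):
    """Render a limit in bytes as MiB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**20:.0f} MiB"


def memory_rlimits():
    """Return {limit_name: (soft, hard)} as human-readable strings."""
    report = {}
    for name in LIMITS:
        soft, hard = resource.getrlimit(getattr(resource, name))
        report[name] = (describe(soft), describe(hard))
    return report


if __name__ == "__main__":
    for name, (soft, hard) in memory_rlimits().items():
        print(f"{name:13} soft={soft:>10} hard={hard:>10}  {LIMITS[name]}")
```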

My understanding is that Debian gets around it by building the docs only on x86_64 systems and distributing those docs to the other architectures (@bremner, did I get that right?). The Fedora build infrastructure does not allow subpackages to be conditionally excluded per architecture, and it also performs a checksum verification step on architecture-independent subpackages to ensure they are exactly the same on every architecture. Unfortunately, this means I am unable to implement that sort of workaround.

@bremner recommended limiting the number of concurrent jobs using PLT_SETUP_OPTIONS="-j <job number>". I haven't had any luck with that yet, but I will try reducing concurrency even further.
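A sketch of passing the reduced concurrency to the install step from a Python build driver. It assumes the build runs make install in the configured source directory and that the Makefile forwards PLT_SETUP_OPTIONS to raco setup; passing it as a make command-line assignment (rather than as an environment variable) means it also overrides any value assigned inside the Makefile. install_command and run_install are hypothetical helpers.

```python
import subprocess


def install_command(jobs):
    """Build a `make install` argv asking raco setup to use `jobs` workers.

    The whole assignment is a single argv element, so the space inside
    "-j 1" needs no shell quoting.
    """
    if jobs < 1:
        raise ValueError("jobs must be >= 1")
    return ["make", "install", f"PLT_SETUP_OPTIONS=-j {jobs}"]


def run_install(src_dir, jobs=1):
    # Hypothetical driver: run in the directory holding the configured Makefile.
    return subprocess.run(install_command(jobs), cwd=src_dir, check=True)
```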

We are using VMs on real ARM hardware: https://kojipkgs.fedoraproject.org//work/tasks/3637/33653637/hw_info.log

dbenoit17, Mar 20 '19

I have had the same problem so far building Racket on armv7hl for Mageia, and we are already using PLT_SETUP_OPTIONS="-j 1". Our armv7hl builders are Scaleway C1 machines, which come with 2GB of RAM, to which we add 4GB of swap.

raco setup: running: /data-doc/data/scribblings/data.scrbl
make[1]: *** [Makefile:177: install-3m] Aborted (core dumped)
make[1]: Leaving directory '/home/iurt/co/svn/racket/BUILD/racket-7.1/src'

The failure is always in data-doc/data/scribblings/data.scrbl (as in the earlier report here).

pterjan, Apr 07 '19

I am actively working on getting cross-arch build problems fixed. Stay tuned.

pmatos, Apr 08 '19

As I mentioned in #2018, @dbenoit17, can you please test with the --disable-generations configure flag?
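The suggested flag goes on the configure line of the Racket source build; as its name suggests, it turns off generational collection in the 3m GC. The sketch below only assembles the argument vector. The --prefix value and the configure_command helper are illustrative assumptions, and a packaging tool such as the rpm %configure macro would add its own flags.

```python
def configure_command(prefix, disable_generations=True, extra=()):
    """Build a ./configure argv for the Racket source build.

    --disable-generations is the flag suggested in this thread; the prefix
    handling and the `extra` pass-through are illustrative only.
    """
    argv = ["./configure", f"--prefix={prefix}"]
    if disable_generations:
        argv.append("--disable-generations")
    argv.extend(extra)
    return argv
```

In a driver it would be run as subprocess.run(configure_command("/usr"), cwd=src_dir, check=True), where src_dir is the directory that holds configure (the logs in this thread build from racket-7.3/src).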

pmatos, May 22 '19

Definitely!

dbenoit17, May 22 '19

I'm still getting consistent failures on 32-bit ARM, but in a different place once the patch is applied. I've hit segfaults due to low memory before on the ARM builders, so my first thought was that it might have to do with the larger GC size factor. I tried reducing the ratio to .5 and then further to .25, but I still see the same issue, so I think we can rule that out.

It's failing while trying to build the math flonum library:

raco setup:  in <pkgs>/profile-lib
raco setup:  in <pkgs>/math-lib/math
raco setup:  in <pkgs>/typed-racket-lib/typed/racket
raco setup:  in <pkgs>/typed-racket-lib/typed-racket
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/minimal/lang
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/env
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/utils
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/typecheck
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/base-env
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/private
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/rep
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/types
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/logic
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/infer
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/typecheck/tc-app
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/static-contracts
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/static-contracts/combinators
raco setup:  in <pkgs>/typed-racket-lib/typed/racket/base/lang
raco setup:  in <pkgs>/typed-racket-more/typed
raco setup:  in <pkgs>/math-lib/math/private/base
raco setup:  in <pkgs>/math-lib/math/private/flonum
raco setup:  in <pkgs>/math-lib/math/private
raco setup:  in <pkgs>/typed-racket-lib/typed-racket/optimizer
raco setup:  in <pkgs>/typed-racket-lib/typed
raco setup:  in <pkgs>/typed-racket-lib/typed/racket/lang
raco setup:  in <pkgs>/math-lib/math/private/number-theory
raco setup:  in <pkgs>/math-lib/math/private/functions
raco setup:  in <pkgs>/math-lib/math/private/polynomial
raco setup:  in <pkgs>/math-lib/math/private/bigfloat
raco setup:  in <pkgs>/r6rs-lib/rnrs/arithmetic
raco setup:  in <pkgs>/math-lib/math/private/flonum/expansion
make[1]: *** [Makefile:196: install-3m] Aborted (core dumped)

armv7hl build log - .75 GC size ratio

dbenoit17, May 28 '19

Does that system really have only 25MB of memory? That seems very unlikely to be enough.

samth, May 28 '19

I agree with @samth. If you are showing memory info from free -b, the problem is that 25MB is not enough. I can build Racket natively on an armv7l (RPi 3) with a free -b of:

              total        used        free      shared  buff/cache   available
Mem:      968204288    31518720   702951424    48816128   233734144   834314240
Swap:     104853504           0   104853504
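Mixing up units in these memory reports is easy: free -b prints bytes, while plain free prints KiB. A small Python sketch that parses free output into bytes, with the unit given explicitly, makes comparisons between machines unambiguous. parse_free and gib are hypothetical helpers, and the parser assumes the classic free layout of a header line followed by Mem: and Swap: rows.

```python
def parse_free(text, unit_bytes=1):
    """Parse `free` output into {"Mem": {...}, "Swap": {...}} in bytes.

    unit_bytes is 1 for `free -b` and 1024 for plain `free` (KiB).
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]  # total used free shared buff/cache available
    result = {}
    for row in rows[1:]:
        label = row[0].rstrip(":")
        values = [int(v) * unit_bytes for v in row[1:]]
        # zip stops at the shorter list, so Swap gets only total/used/free
        result[label] = dict(zip(header, values))
    return result


def gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30
```

Applied to the free -b output above, it gives a Mem total of about 0.90 GiB; output from plain free needs unit_bytes=1024.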

pmatos, May 28 '19

@samth @pmatos That looks like the issue I'm seeing, then. It means I may not be able to test 32-bit ARM much further in our build system. Shall we close this issue and merge armv7hl into #2018 until the GC generations fix is in?

dbenoit17, May 28 '19

I have a Mageia build in progress with the change from #2018, but I started it only an hour ago, so it will take some time before I can report whether this also fixes it.

pterjan, May 28 '19

@dbenoit17 Sounds like a plan to me. @pterjan, I will wait until you confirm it builds fine with --disable-generations before following up on @dbenoit17's plan.

pmatos, May 28 '19

The armv7hl build got cancelled because the i586 build failed with a similar error:

make[1]: *** [Makefile:196: install-3m] Aborted (core dumped)

http://pkgsubmit.mageia.org/uploads/failure/cauldron/core/updates_testing/20190528133351.pterjan.duvel.9181/log/racket-7.2-1.mga7/build.0.20190528133501.log

pterjan, May 28 '19

I've hit the "Inlining expected for #<procedure:extflvector-length>" issue before, and I don't think it was on ARM. That may be a different issue altogether. I think I recall being unable to reproduce it on a rebuild.

dbenoit17, May 28 '19

The "Inlining expected" error should be fixed by 80f84f21322.

mflatt, May 28 '19

Thank you. I had missed 7.3 and was building 7.2 plus the patches; trying again with 7.3 plus the patches.

pterjan, May 28 '19

I am now getting a failure in the same place as https://github.com/racket/racket/issues/1749#issuecomment-496527384

http://pkgsubmit.mageia.org/uploads/failure/cauldron/core/updates_testing/20190528180637.pterjan.duvel.3523/log/racket-7.3-1.mga7/build.0.20190528180713.log

raco setup:  in <pkgs>/r6rs-lib/rnrs/arithmetic
raco setup:  in <pkgs>/math-lib/math/private/flonum/expansion
make[1]: *** [Makefile:196: install-3m] Aborted (core dumped)
make[1]: Leaving directory '/home/iurt/rpmbuild/BUILD/racket-7.3/src'
make: *** [Makefile:119: install] Error 2

The machine has 2GB of RAM and 6GB of swap.

$ free
              total        used        free      shared  buff/cache   available
Mem:        2068832       54572     1328872          96      685388     1951520
Swap:       6291452       21948     6269504

pterjan, May 28 '19

I also just double-checked: our builders have 25GiB of memory, not 25MiB. The units in the hw_info.log files are KiB. I'm also not seeing anything in ulimit that would limit the memory used by the build. Maybe we should keep the issue open for now.

dbenoit17, May 28 '19

@dbenoit17 ping - what's the status of this?

pmatos, Dec 08 '20