gap startup error
Observed behaviour
[MacBook-Pro:~/SOFT.24] betten% gap
┌───────┐ GAP 4.12.2 of 2022-12-18
│ GAP │ https://www.gap-system.org
└───────┘ Architecture: aarch64-apple-darwin22-default64-kv8
Configuration: gmp 6.2.1, GASMAN
Loading the library and packages ...
Syntax warning: Unbound global variable in /Users/betten/SOFT.23/gap-4.12.2/lib/primality.gi:515
Np:=(N-1)/p;
^^
Thanks for the report, @abetten. Are you installing from the git repo or from the release archive? I seem to remember seeing something similar, but I don't remember what the issue was.
The funny thing is, this is an existing installation that was working before. I don't know whether I changed a path or anything like that, but suddenly I get this error message. It would be nice to track down the exact reason for it. I remember having seen it before, but I can't remember what I did back then.
Release archive or from git?
This is really weird. Can you please tell us the output of:
shasum -a 256 /Users/betten/SOFT.23/gap-4.12.2/lib/primality.gi
If the reported checksum differs from
2e598e50b5823f6c8b02d34b83e204e89f91bcb19521dc2e7d56d12205708c80
then maybe upload that file to this issue (you may have to add '.txt' as extension for that).
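For anyone who prefers to script that check, here is a minimal Python sketch that computes the SHA-256 digest of a file and compares it with the value quoted above; it should agree with `shasum -a 256`. The helper names `sha256_of` and `matches_checksum` are hypothetical, not part of GAP.

```python
import hashlib

# Expected SHA-256 of lib/primality.gi in GAP 4.12.2, as quoted above.
EXPECTED = "2e598e50b5823f6c8b02d34b83e204e89f91bcb19521dc2e7d56d12205708c80"

def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_checksum(path, expected=EXPECTED):
    """Hypothetical helper: True if the file's digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Usage: `matches_checksum("/Users/betten/SOFT.23/gap-4.12.2/lib/primality.gi")`; if it returns False, `sha256_of(...)` gives the digest to report.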
Just one quick check: is there a 'gap' in SOFT.24, and does it work if you run GAP directly from SOFT.23? I notice you are in the SOFT.24 directory but picking up the GAP library from SOFT.23. I just wanted to check whether some very strange path issue is going on (even if there is, GAP shouldn't behave like this).
We can't reproduce this and there has been no further communication from the issue submitter, so I'm closing it. If the issue reappears with GAP 4.13.0 (which will be released tomorrow), feel free to re-open.
Just to say that I am now seeing this in the CI jobs of the Semigroups package at, for example:
https://github.com/semigroups/Semigroups/actions/runs/9155759303/job/25168765002?pr=1012
Maybe there's something wrong in the setup of the CI, but this is still an unhelpful way to indicate that. @fingolfin @ChrisJefferson
No; now that I can see the full output, yours looks at first glance like good old memory corruption. You can see it get very upset at the end. Of course, I have no idea why it is happening here and not elsewhere (although the common factor is Macs)...
I really don't want to start debugging via GitHub Actions...
Just to be clear, the end of the log looks like:
Syntax warning: Unbound global variable in /Users/runner/gap/lib/primality.gi:527
return [true,a];
^
Error, Length: <list> must be a list (not the integer 1518809461113931223)�v�|
This looks to me like some object corruption has occurred. Hypothetically this could be the Semigroups package's fault, but I'm tempted to say not, because GAP is still parsing primality.gi at this point, which happens before we reach the point of loading Semigroups.
My best guess is that Apple has changed something that is messing up how GASMAN marks bags, but really it could be any weird compiler issue.
Could someone with an up-to-date Mac try building stable-4.12 with 'TREMBLE_HEAP' enabled? (You could just go into gasman.c and remove the #ifdef TREMBLE_HEAP guards in the two places they appear around "CollectBags(0,0)", then build GAP and run it.)
Note that this is one of those "gosh, GAP is going to take a long time to do anything, even start" options, so only do it on a machine where you don't mind the fan spinning for quite a long time (it could be hours!).
Thanks @ChrisJefferson, I'll try what you suggested now.
I just tried what you suggested, @ChrisJefferson, using the release archive of GAP 4.12.2, and it doesn't seem to reproduce the error:
❯ ./gap -A
┌───────┐ GAP 4.12.2 of 2022-12-18
│ GAP │ https://www.gap-system.org
└───────┘ Architecture: aarch64-apple-darwin22-default64-kv8
Configuration: gmp 6.2.1, GASMAN
Loading the library and packages ...
Packages: GAPDoc 1.6.6, PrimGrp 3.4.4, SmallGrp 1.5.3, TransGrp 3.6.5
Try '??help' for help. See also '?copyright', '?cite' and '?authors'
gap>
Here's the config.log file:
I'm not sure that my Mac counts as "up to date", unfortunately; it's an M1 from 2021, IIRC.
Thanks. I'm going to try poking a bit on my PC and see if I can shake anything out.
Just to mention that this doesn't seem to occur with GAP 4.13:
https://github.com/semigroups/Semigroups/actions/runs/9187837589/job/25266363188?pr=1012
So it is perhaps resolved already.
By SSHing into the GitHub Actions runner for the Semigroups CI on stable-4.12, I managed to catch this error.
After a lot of debugging, I think I have tracked the problem down to GMP.
__gmpz_mul (GMP's mpz_mul) seems to be writing to a memory location it shouldn't. That memory isn't allocated yet, so in most cases the stray write causes no problem. However, the string allocation code assumes the memory it gets will be zeroed, so when it writes a string it doesn't bother with a null terminator. We therefore end up with local variables with silly names like pfdjifdsjio (instead of p), which is what causes the "Unbound global variable" warning.
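To make that failure mode concrete, here is a toy Python simulation; it is not GAP's actual allocator. A bytearray stands in for not-yet-allocated heap memory, a stray write dirties it first, and a name is then copied in without a NUL terminator on the assumption that the memory is already zero. The helper names `read_c_string` and `store_name` are hypothetical.

```python
def read_c_string(heap, offset):
    """Read bytes from `offset` up to (not including) the first NUL, like C's strlen."""
    end = heap.index(0, offset)
    return heap[offset:end].decode("ascii")

def store_name(heap, offset, name):
    """Copy `name` WITHOUT a terminator, trusting that the following byte is already 0."""
    data = name.encode("ascii")
    heap[offset:offset + len(data)] = data
    return offset

# Normal case: fresh memory really is zeroed, so the name reads back correctly.
clean = bytearray(16)
store_name(clean, 0, "p")
print(read_c_string(clean, 0))  # p

# Corrupted case: a stray write (standing in for the bad write inside libgmp)
# lands in memory that has not been handed out yet...
dirty = bytearray(16)
dirty[1:11] = b"fdjifdsjio"
# ...and the identifier written there later reads as a different, unbound name.
store_name(dirty, 0, "p")
print(read_c_string(dirty, 0))  # pfdjifdsjio
```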
The actual error occurs here:
frame #0: 0x0000000101f0ebe8 libgmp.10.dylib`__gmpn_mul_1c + 200
* frame #1: 0x0000000101f05228 libgmp.10.dylib`__gmpz_mul + 160
frame #2: 0x000000010085a9c0 gap`ProdInt(opL=0x00001000003c8130, opR=0x0000000017d78401) at integer.c:1471:3 [opt]
frame #3: 0x000000010085a54c gap`IntStringInternal(string=0x0000000000000000, str="84128410784489288223092474348389603623030322640088442936747974518239642507631380108010588884252565717918682347709584444173260730941561211749732512257059040264927466644819174048875651367892940295977531020921450283370778464844131921016112826112511277611411962047115457979770639907893271757547513348734936139234492934084356041841547537781640044258066541550710400764797315999285813") at integer.c:1087:19 [opt]
frame #4: 0x0000000100867bfc gap`IntrIntExpr(intr=0x000000016f5cad80, string=0x0000000000000000, str="84128410784489288223092474348389603623030322640088442936747974518239642507631380108010588884252565717918682347709584444173260730941561211749732512257059040264927466644819174048875651367892940295977531020921450283370778464844131921016112826112511277611411962047115457979770639907893271757547513348734936139234492934084356041841547537781640044258066541550710400764797315999285813") at intrprtr.c:1794:15 [opt]
frame #5: 0x00000001008f553c gap`ReadLiteral(rs=0x000000016f5ca930, follow=18446744072694563073, mode='r') at read.c:1520:27 [opt]
That huge number is an integer literal in primality.gi. Parsing it triggers the memory corruption, which is why people see an error a little later in primality.gi (if they see one at all).
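The ProdInt frame multiplies the partial result by opR=0x0000000017d78401. Assuming GAP's usual encoding of small ("immediate") integers as (value << 2) | 1, that word decodes to 100000000 = 10^8, which matches the tmp field of mpzR in the dump below. That suggests the literal is being accumulated eight decimal digits at a time, with one big-integer multiplication per chunk. The Python sketch below illustrates both ideas; it is not GAP's actual IntStringInternal code, and the function names are hypothetical.

```python
def decode_immediate_int(word):
    """Decode a non-negative GAP immediate integer, assuming the
    encoding (value << 2) | 0b01 used for small integers."""
    if word & 0b11 != 0b01:
        raise ValueError("not tagged as an immediate integer")
    return word >> 2

def parse_decimal_in_chunks(digits, chunk=8):
    """Hypothetical sketch: build an integer from a decimal string by repeatedly
    doing result = result * 10**k + next_k_digits (k <= chunk), i.e. one
    big-integer multiplication per chunk, like the ProdInt call in the backtrace."""
    result = 0
    for i in range(0, len(digits), chunk):
        piece = digits[i:i + chunk]
        result = result * 10 ** len(piece) + int(piece)
    return result

print(decode_immediate_int(0x0000000017D78401))  # 100000000, i.e. 10**8

# Leading digits of the literal shown in the backtrace.
literal = "8412841078448928822309247434838960"
print(parse_decimal_in_chunks(literal) == int(literal))  # True
```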
The question (which I don't yet know the answer to) is whether this is a problem with linking the wrong libgmp (in this case we seem to be linking to GAP's internal libgmp), or whether our 'fakegmp' is somehow messed up. It's hard to tell what's going on inside libgmp because it lacks debug symbols.
Some data dumping:
The memory location incorrectly written to is: 0x100007341fc0
The 3 mpzs passed to mpz_mul by ProdInt are:
(lldb) print mpzResult
(fake_mpz_t) {
[0] = {
v = {
[0] = {
_mp_alloc = 11
_mp_size = 0
_mp_d = 0x00001000072e3fe0
}
}
tmp = 6163310904
obj = 0x00001000003c8138
}
}
(lldb) print mpzL
(fake_mpz_t) {
[0] = {
v = {
[0] = {
_mp_alloc = 10
_mp_size = 10
_mp_d = 0x00001000072e3f78
}
}
tmp = 8412841078448928
obj = 0x00001000003c8130
}
}
(lldb) print mpzR
(fake_mpz_t) {
[0] = {
v = {
[0] = {
_mp_alloc = 1
_mp_size = 1
_mp_d = 0x000000016f5c9908
}
}
tmp = 100000000
obj = NULL
}
}
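As a sanity check on these values, here is a small Python calculation, assuming 64-bit (8-byte) limbs as on this aarch64 build. A product of a 10-limb and a 1-limb operand needs at most 11 limbs, which is exactly mpzResult's _mp_alloc, so the result buffer should have been big enough; and the address reported as incorrectly written lies well outside that buffer. The helper names are hypothetical.

```python
LIMB_BYTES = 8  # assumption: 64-bit limbs on the aarch64 build

def limb_range(mp_d, mp_alloc):
    """Half-open byte range [start, end) owned by an mpz's limb array."""
    return mp_d, mp_d + mp_alloc * LIMB_BYTES

def max_product_limbs(size_l, size_r):
    """Upper bound on the number of limbs needed for |L| * |R|."""
    return abs(size_l) + abs(size_r)

# Values copied from the lldb dump above.
result_start, result_end = limb_range(0x00001000072E3FE0, 11)
bad_write = 0x100007341FC0

print(max_product_limbs(10, 1))                # 11, equal to mpzResult's _mp_alloc
print(result_start <= bad_write < result_end)  # False: outside the result buffer
print(hex(bad_write - result_end))            # 0x5df88 bytes past the end of the buffer
```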
make check is segfaulting all over the place, so I wonder whether GMP 6.2.1, which is what GAP 4.12 bundles, just doesn't support Macs properly (the latest GAP release bundles 6.3).
A feature of 6.3 is "Support for 64-bit Arm under Macos. "
Well done, Chris!! Excellent work.
Best, Anton
(Email reply to Christopher Jefferson's comment of May 22, 2024, 4:28 PM: https://github.com/gap-system/gap/issues/5640#issuecomment-2124800711)