Linux VM compiled from git commit 0d7eba4a or later fails on fetching updates from source.squeak.org
Last good commit: 1af9a9b3e (HEAD) CogVM source as per VMMaker.oscog-eem.3424
First bad commit: 0d7eba4af CogVM source as per VMMaker.oscog-eem.3444
Steps to reproduce:
- Compile Linux VM from 0d7eba4af or later
- Run Squeak trunk (updated to latest with Monticello-dtl.813), fetch updates from any repository
- Result: ConnectionClosed: Connection closed while waiting for data.
Note: There are no intervening commits between 1af9a9b3e and 0d7eba4af, so the issue is presumed related to VMMaker changes rather than platform code changes.
Hi Dave, I found and fixed a bad regression last week in eem.3471. What's the version you're using that fails (the output of squeak -version)?
Hi Eliot, my locally compiled VM from latest git pull does still have the issue, version info is:
Virtual Machine
/usr/local/lib/squeak/5.0-202411252058-64bit/squeak
Open Smalltalk Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3471]
Unix built on Nov 27 2024 08:25:11 Compiler: 11.4.0
platform sources revision VM: 202411252058 lewis@pop-os:squeak/git/opensmalltalk-vm Date: Mon Nov 25 12:58:18 2024 CommitHash: 108c8d3a3 Plugins: 202411252058 lewis@pop-os:squeak/git/opensmalltalk-vm
CoInterpreter VMMaker.oscog-eem.3471 uuid: c3abaac1-cda5-44ec-81c1-55154ab54aef Nov 27 2024
StackToRegisterMappingCogit VMMaker.oscog-eem.3470 uuid: 5eca4261-1c46-4eb4-bd5f-803847c2ab7f Nov 27 2024
Image
/home/lewis/squeak/Squeak6.0/squeak.13.image
Squeak6.1alpha
latest update: #23177
Current Change Set: trunk
Image format 68533 (64 bit)
Preferred bytecode set: SistaV1
Here is a summary of additional test results:
Symptoms:
- The SocketStream tests have many failures and timeouts.
- The Socket tests have failures and also crashed the VM.
- Opening a repository on source.squeak.org fails.
- Updating Squeak from the update stream fails.
The issue is apparently related to both compiler and Slang code generation. With a compiler that exposes the problem, the issue appears first in commit 0d7eba4 "CogVM source as per VMMaker.oscog-eem.3444," and the symptoms do not appear to change in any later commits. The last good commit prior to that was 1af9a9b (HEAD) "CogVM source as per VMMaker.oscog-eem.3424", and the differences between these appear to be primarily related to Slang code generation.
I retested this on a much older Linux computer (thankfully rescued just in time from the recycle bin), and the issue does NOT appear there. I also have confirmation from Bruce O'Neel that he has been doing opensmalltalk-vm builds on Linux and has not seen any of the issues reported here.
Finally, I tried changing the gcc optimization level from -O2 to -O0, and this makes the problem go away.
The system I am using has an AMD processor and the following version information:
$ cat /proc/version
Linux version 6.9.3-76060903-generic ([email protected]) (x86_64-linux-gnu-gcc-12 (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #202405300957~1732141768~22.04~f2697e1 SMP PREEMPT_DYNAMIC Wed N
$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ spur64 -version
5.0-202412062008 Fri Dec 6 18:49:31 EST 2024 gcc 11
[Production Spur 64-bit x86_64 VM]
CoInterpreter VMMaker.oscog-eem.3471 uuid: c3abaac1-cda5-44ec-81c1-55154ab54aef Dec 6 2024
StackToRegisterMappingCogit VMMaker.oscog-eem.3470 uuid: 5eca4261-1c46-4eb4-bd5f-803847c2ab7f Dec 6 2024
VM: 202412062008 lewis@pop-os:squeak/git/opensmalltalk-vm Date: Fri Dec 6 17:08:41 2024 CommitHash: 2fc2d0c8d
Plugins: 202412062008 lewis@pop-os:squeak/git/opensmalltalk-vm
Linux pop-os 6.9.3-76060903-generic #202405300957~1732141768~22.04~f2697e1 SMP PREEMPT_DYNAMIC Wed N x86_64 x86_64 x86_64 GNU/Linux
plugin path: /usr/local/bin/../lib/squeak/5.0-202412062008-64bit [default: /usr/local/lib/squeak/5.0-202412062008-64bit/]
More to follow. With the above information I hope to be able to track something down in gcc.
I compiled the Cog HEAD revision (squeak.cog.spur) on a legacy macOS 12.7.6 and got the same behavior: it is impossible to connect through SSL.
If the compiler optimization level makes a difference, then it is most probably a sign that the generated code invokes undefined behavior (UB).
Here is how to show the relevant diffs, which are all in the generated C code from VMMaker.oscog-eem.3424 through VMMaker.oscog-eem.3444. It's a lot of code to look at but maybe some more eyeballs will help.
$ git diff 1af9a9b3efdf18e353653a73ea0c411fc356a017 0d7eba4af323af35bd3921096636cea6116dc565
Recognizing that the issue is apparently related to C undefined behavior, and also associated with CCodeGenerator code generation changes, I used a VMMaker image to generate the code for VMMaker versions from VMMaker.oscog-eem.3424 through VMMaker.oscog-eem.3444.
I can confirm that the issue is introduced in VMMaker.oscog-eem.3444. Source generated from VMMaker.oscog-eem.3443 (into ./src/spur64.cog/ ) does not exhibit the issue, and code generated from VMMaker.oscog-eem.3444 exhibits the issue (in both cases on my system with -O2 compiler optimization).
So the issue is introduced in VMMaker.oscog-eem.3444, 23-Aug-2024 "Rewrite the Slang transpiler's parse tree and inliner".
This is a large VMMaker commit so we are still looking for a needle in a haystack, but I think the haystack may be a bit smaller now. I note that Eliot specifically asked for review and criticism in that commit, so please consider this as a much belated review :-)
@nicolas-cellier-aka-nice can you please say what compiler (and version level of compiler) you have on your legacy MacOS 12.7.6? I am not familiar with the Mac environment, but a compiler bug is not out of the question. I have gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 on my system, and the bug is present when compiling with -O1 or higher with generated VM sources from VMMaker.oscog-eem.3444 and above. But other compilers (including those used for our GitHub actions builds) do not show any problem at all.
Additional information, working with a (possibly bad?) gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 compiler:
- The issue is introduced in VMMaker.oscog-eem.3444.
- Later fixes in VMMaker have no effect on the observed symptoms.
- The problem goes away with gcc optimization turned off (-O0).
- The problem is present in both the stack VM and the Cog VM.
The issue is only in the main VM module lib/squeak/5.0-202408232148-64bit/squeak, as opposed to the plugins and VM modules. I confirmed this by compiling with almost all plugins external, and copying individual compiled files into the last known good build in lib/squeak/5.0-202407312233-64bit/. No other files (including SocketPlugin.so) cause a problem, only the main VM module is at issue.
The VMMaker.oscog-eem.3444 generated sources (in commit 0d7eba4af323af35bd3921096636cea6116dc565) produce over 90 additional compiler warnings in the ./vm build, mainly associated with function pointer assignments. After hand editing the generated source files to address the warnings, the problem still exists, so I see no evidence that these warnings are pointing to C undefined behavior issues.
I used the native makefiles for macOS, which I think rely on CC=clang as defined in ./building/macos64x64/common/Makefile.rules
% clang --version
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
For me, the problem disappeared with commit cfd116184d9c4c84a51440419be959cf283e5e4d based on VMMaker.oscog-eem.3475.
Since almost every operation on signed integers is potentially subject to undefined behavior, the C compiler won't warn you about it, except in the most suspicious cases.
One possibility is to instrument the generated code to detect UB at run time, at least with clang: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html. The option -fsanitize=undefined seems to exist in gcc too, if you want to try it. You should use it with no optimization to be sure (because optimization may remove some UB).
For me:
These compilers work:
- x86-64: gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
- ARM64 (i.e. Raspberry Pis): gcc version 12.2.0 (Debian 12.2.0-14)
Now that does not mean that there are no problems, just that those two compilers/OSes don't cause this to happen.
I did a quick check of the bugs fixed in gcc 11.5 and 12.1 and 12.2 and nothing jumps out as being the problem.
Hi, I noticed that you report that you use the following image:
Squeak6.1alpha latest update: #23177 Current Change Set: trunk Image format 68533 (64 bit) Preferred bytecode set: SistaV1
Would it be possible to test this issue on a 64-bit Squeak 5.3 image in 68021 format? Of course it should work with the Squeak 6.1alpha image, but I thought it could be useful to compare 5.3 and 6.1alpha for this issue.
Also, do the failures occur when you run the SUnit tests (test tool)?
There is a comment in the mvm scripts in the building directory that "some gcc versions create a broken VM using -O2". I have also observed that with the Squeak classic VM some SUnit tests on Squeak v3 images fail when compiling with gcc -O2, so this is not different from OpenSmalltalk. However, this is not enough to conclude that -O2 is the reason for the problem.
@dstes yes, the issue appears with a Squeak 5.3 image as well (the symptoms are a bit different, but it fails due to the VM/compiler issue).
This issue is very likely a gcc compiler problem that is triggered by (but not caused by) changes in the generated slang code as of commit 0d7eba4. It does not affect VMs built on opensmalltalk-vm (which use a clang compiler rather than gcc).
The issue is reported for gcc-11 with optimization -O1 or -O2, and for gcc-12 with optimization -O2 (-O1 does work), as tested on my Ubuntu-derived box with an AMD processor. It does not appear for gcc 13.3.0 on Intel (see report above).
In summary, this is probably a gcc compiler issue, and it does not appear to affect any VMs that are being built on the Squeak/opensmalltalk-vm build automation.
Closing the issue as a presumed gcc compiler problem. The workaround is to use a reduced optimization setting for the affected compilers (-O1 or -O0; gcc-11 needs -O0).
I meant to close this issue back in January but must have botched the process, sorry.