
`roc dev` with hello world snippet from tutorial hangs indefinitely

Open sullyj3 opened this issue 2 years ago • 18 comments

Using nix shell github:roc-lang/roc on x86-64 linux

Using this code snippet from the tutorial:

app "hello"
    packages { pf: "https://github.com/roc-lang/basic-cli/releases/download/0.5.0/Cufzl36_SnJ4QbOoEmiJ5dIpUxBvdB3NEySvuH82Wio.tar.br" }
    imports [pf.Stdout]
    provides [main] to pf

main =
    Stdout.line "I'm a Roc application!"

When I do roc dev, it gets stuck at

Downloading https://github.com/roc-lang/basic-cli/releases/download/0.5.0/Cufzl36_SnJ4QbOoEmiJ5dIpUxBvdB3NEySvuH82Wio.tar.br
    into /home/james/.cache/roc/packages

I don't think it's just a slow internet connection, since I can download that file from my browser fairly quickly. Let me know any information I should provide to help debug.

sullyj3 avatar Oct 05 '23 10:10 sullyj3

I'm also affected by this. I'm using the same hello world snippet. roc dev hangs and roc run segfaults.

~/roc 15:43:15
WSL $ ./roc_nightly-linux_x86_64-2023-10-04-b00f25b/roc version
roc nightly pre-release, built from commit b00f25b on Mi 04 Okt 2023 09:08:06 UTC

~/roc 15:44:32
WSL $ ./roc_nightly-linux_x86_64-2023-10-04-b00f25b/roc dev
^C

~/roc took 23s 15:45:02
WSL $ ./roc_nightly-linux_x86_64-2023-10-04-b00f25b/roc run
Segmentation fault

The snippet can be built successfully, but the binary segfaults. According to the GDB warning, this seems to be an issue in the loader.

~/roc took 11s 15:50:15
WSL $ ./roc_nightly-linux_x86_64-2023-10-04-b00f25b/roc build
0 errors and 0 warnings found in 253 ms while successfully building:

    hello

~/roc 15:50:19
WSL $ gdb -q hello
Reading symbols from hello...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/user/roc/hello.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) r
Starting program: /home/user/roc/hello
During startup program terminated with signal SIGSEGV, Segmentation fault.

LD_DEBUG doesn't give any hints, unfortunately.

~/roc 15:57:10
WSL $ LD_DEBUG=all ./hello
Segmentation fault

Didn't debug any further.

IsaacDynamo avatar Oct 08 '23 14:10 IsaacDynamo

I cannot reproduce the hanging or the segfault on Ubuntu 22.04. Can you share which OS you are using and upload the hello executable here, @IsaacDynamo?

Anton-4 avatar Oct 09 '23 11:10 Anton-4

Can you also share your exact OS @sullyj3?

Anton-4 avatar Oct 09 '23 11:10 Anton-4

WSL $ uname -a
Linux PC 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
WSL $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal

Retested in WSL Ubuntu 22.04, and that has similar behavior: dev hangs, run segfaults.

user@PC:~$ uname -a
Linux PC 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
user@PC:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

IsaacDynamo avatar Oct 09 '23 12:10 IsaacDynamo

Running on another machine with a newer kernel does work, so it seems to be related to the kernel. 🤨

WSL $ uname -a
Linux LT 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
WSL $ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

IsaacDynamo avatar Oct 09 '23 12:10 IsaacDynamo

Thanks, interesting... Can you upload the segfaulting executable here?

Anton-4 avatar Oct 09 '23 12:10 Anton-4

The binaries from 4.19+20.04 and 4.19+22.04 run fine when copied to 5.10+22.04. I would rather not upload them, because they will contain private data. But let me know if you want me to run some other tests.

IsaacDynamo avatar Oct 09 '23 12:10 IsaacDynamo

Ok, no problem, I'll try to set something up with an older kernel.

Anton-4 avatar Oct 09 '23 13:10 Anton-4

I was able to reproduce the hanging and the segfault with the 4.19 kernel on Ubuntu 20.04. I'm investigating now...

Anton-4 avatar Oct 10 '23 13:10 Anton-4

Using the legacy linker (./roc build --linker legacy main.roc) does work. Based on the strange output I'm getting from ldd and strace, I think the segfault is caused by a bug in the surgical linking process.

Anton-4 avatar Oct 10 '23 15:10 Anton-4

I've looked at the manual for execve, and it should not even be capable of returning EEXIST here, so I assume this was a Linux bug that has since been fixed.

strace -f ./rocLovesZigSurgical 
execve("./rocLovesZigSurgical", ["./rocLovesZigSurgical"], 0x7ffd1e583478 /* 33 vars */) = -1 EEXIST (File exists)
+++ killed by SIGSEGV +++
Segmentation fault (core dumped)

I'll still check out the hanging issue.

Anton-4 avatar Oct 10 '23 17:10 Anton-4

Have you seen the dmesg message?

WSL $ ./hello
Segmentation fault
WSL $ dmesg | tail -1
[ 5788.699342] 20029 (hello): Uhuuh, elf segment at 00007fb104ebf000 requested but the memory is mapped already

With that error message I found some posts that might be related:

  • https://stackoverflow.com/questions/51656713/cannot-load-custom-elf-executable-in-gdb
  • https://github.com/torvalds/linux/commit/a4ff8e8620d3f4f50ac4b41e8067b7d395056843

IsaacDynamo avatar Oct 10 '23 21:10 IsaacDynamo
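To check whether a machine is hitting the same loader error, the kernel log can be filtered for the message quoted above. This is only a minimal sketch: the function name `find_elf_mapping_conflicts` is hypothetical, and the match strings are taken from the single dmesg line shown in this thread, so other kernels may word the message differently.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log lines on stdin and prints only the
# lines that report the ELF segment mapping conflict quoted in this thread.
# Assumption: the wording matches the 4.19 kernel message shown above.
find_elf_mapping_conflicts() {
    grep -F 'elf segment at' | grep -F 'requested but the memory is mapped already'
}

# Typical use after a segfaulting run (dmesg may require elevated privileges):
#   ./hello; dmesg | find_elf_mapping_conflicts
```

If the filter prints a line naming the Roc binary right after the segfault, the machine is most likely seeing the same issue discussed here.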

Oh good finds, I didn't think to check dmesg. I'll take a look

Anton-4 avatar Oct 11 '23 10:10 Anton-4

Can you also share your exact OS @sullyj3?

Sure:

⮞ uname -a
Linux dorian 6.5.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Sep 2023 22:55:13 +0000 x86_64 GNU/Linux

⮞ lsb_release -a
LSB Version:	n/a
Distributor ID:	Arch
Description:	Arch Linux
Release:	rolling
Codename:	n/a

sullyj3 avatar Oct 11 '23 12:10 sullyj3

Hang on, never mind, it seems to be working for me now with the latest main. I probably should have recorded the exact commit I was having the problem on, but it would have been the latest main as of when I submitted the issue. I won't close this for now since others seem to be experiencing it as well.

sullyj3 avatar Oct 11 '23 12:10 sullyj3

@bhansconnect do you know what's going on here? To summarize the issue:

  • (all?) roc executables (locally built) are segfaulting on operating systems with Linux kernel 4.19
  • the same executable does not segfault when copied to a machine with a newer Linux kernel
  • linking with legacy linker "fixes" the issue
  • This dmesg message appears to be the core of the problem:
$ dmesg | tail -1
4159 (rocLovesZigSurg): Uhuuh, elf segment at 000055e4f24ee000 requested but the memory is mapped already

(similar issue on stackoverflow)

  • These issues raised by eu-elflint may be relevant:
$ eu-elflint examples/platform-switching/rocLovesZigSurgical 
GNU_RELRO segment not contained in a loaded segment
section [ 3] '.dynsym': symbol 3: st_value out of bounds
section [ 3] '.dynsym': symbol 200: st_value out of bounds
section [ 3] '.dynsym': symbol 307: st_value out of bounds
section [ 3] '.dynsym': symbol 468: st_value out of bounds
section [ 3] '.dynsym': symbol 644: st_value out of bounds
section [ 3] '.dynsym': symbol 773: st_value out of bounds
section [ 5] '.gnu.version_r': entry 2 has invalid offset to next entry
section [11] '.rodata': merge flag set but entry size is zero
section [17] '.tbss': thread-local data sections address not zero
section [34] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 432
section [34] '.symtab': symbol 2034: st_value out of bounds
loadable segment [12] is writable but contains no writable sections
section [ 4] '.gnu.version': symbol 1: invalid version index 2
section [ 4] '.gnu.version': symbol 4: invalid version index 2
section [ 4] '.gnu.version': symbol 5: invalid version index 3
section [ 4] '.gnu.version': symbol 6: invalid version index 2
section [ 4] '.gnu.version': symbol 7: invalid version index 2
section [ 4] '.gnu.version': symbol 8: invalid version index 2
section [ 4] '.gnu.version': symbol 9: invalid version index 2
section [ 4] '.gnu.version': symbol 10: invalid version index 2
section [ 4] '.gnu.version': symbol 11: invalid version index 2
section [ 4] '.gnu.version': symbol 12: invalid version index 4
section [ 4] '.gnu.version': symbol 13: invalid version index 5
section [ 4] '.gnu.version': symbol 14: invalid version index 2
section [ 4] '.gnu.version': symbol 16: invalid version index 7
section [ 4] '.gnu.version': symbol 17: invalid version index 7
section [ 4] '.gnu.version': symbol 18: invalid version index 2
section [ 4] '.gnu.version': symbol 23: invalid version index 2
section [ 4] '.gnu.version': symbol 24: invalid version index 2
section [ 4] '.gnu.version': symbol 26: invalid version index 2
section [ 4] '.gnu.version': symbol 27: invalid version index 2
section [ 4] '.gnu.version': symbol 28: invalid version index 2
section [ 4] '.gnu.version': symbol 29: invalid version index 3
section [ 4] '.gnu.version': symbol 30: invalid version index 2
section [ 4] '.gnu.version': symbol 32: invalid version index 2
symbol 3 referenced in old hash table in [ 7] '.hash' but not in new hash table in [ 6] '.gnu.hash'

$ eu-elflint examples/platform-switching/rocLovesZigLegacy 
section [18] '.tbss': thread-local data sections address not zero
section [33] '.symtab': _DYNAMIC symbol size 0 does not match dynamic segment size 544
section [33] '.symtab': _GLOBAL_OFFSET_TABLE_ symbol size 0 does not match .got.plt section size 264
section [33] '.symtab': symbol 2149: st_value out of bounds

rocLovesZigLegacyvsSurgical.tar.gz

Anton-4 avatar Oct 11 '23 18:10 Anton-4

I'm also seeing this issue under Ubuntu in WSL2 on Windows 10 Pro version 21H2.

roc dev hangs and doesn't do anything.

I followed the setup instructions here and used the code from the tutorial.

Ubuntu details:

$ uname -a
Linux ASHDESKTOP 4.19.104-microsoft-standard #1 SMP Wed Feb 19 06:37:35 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.1 LTS
Release:        20.04
Codename:       focal

I'm running Roc version:

roc nightly pre-release, built from commit 49862da on Sa 16 Mär 2024 09:01:35 UTC

With basic-cli version:

https://github.com/roc-lang/basic-cli/releases/download/0.8.1/x8URkvfyi9I0QhmVG98roKBUs_AZRkLFwFJVJ3942YA.tar.br

ashleydavis avatar Mar 17 '24 21:03 ashleydavis

I expect that updating the Linux kernel would be an easy workaround.

Anton-4 avatar Mar 25 '24 10:03 Anton-4
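Where updating the kernel is not an option, the legacy linker workaround mentioned earlier in the thread could be applied automatically. The sketch below is illustrative only: the helper names are hypothetical, and the 5.10 threshold is an assumption, since the thread only confirms that 4.19 kernels fail and that 5.10 and 6.5 kernels work. It also relies on `sort -V`, which is available in GNU coreutils.

```shell
#!/bin/sh
# Succeeds when version $1 is strictly older than version $2 (uses GNU sort -V).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strips the distro suffix from a kernel release string,
# e.g. "4.19.128-microsoft-standard" becomes "4.19.128".
kernel_base() {
    printf '%s\n' "${1%%-*}"
}

# Prints "--linker legacy" for kernels older than the assumed 5.10 threshold
# and prints nothing otherwise. The threshold is a guess, not a confirmed fact.
choose_linker_flag() {
    if version_lt "$(kernel_base "$1")" "5.10"; then
        printf '%s\n' "--linker legacy"
    fi
}

# Example use (left unquoted on purpose so the flag splits into two words):
#   roc build $(choose_linker_flag "$(uname -r)") main.roc
```

On the 4.19 WSL kernels reported above this selects the legacy linker, while on the 5.10 and 6.5 kernels it leaves the default linker in place.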