mojo icon indicating copy to clipboard operation
mojo copied to clipboard

[BUG] `print(os.environ)` segfaults in Mojo code

Open strangemonad opened this issue 1 year ago • 1 comments

Bug description

SIGSEGV address 0x0 when trying to print the contents of `os.environ`

Steps to reproduce

run the following in a mojo cell


import os
print(os.environ)

note that the str-ified print(os.environ.__str__()) works fine



### System information

```shell
%%python
import subprocess
subprocess.Popen(["cat", "/proc/self/cgroup"])



11:hugetlb:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
10:memory:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
9:cpu,cpuacct:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
8:perf_event:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
7:devices:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
6:freezer:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
5:blkio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
4:net_cls,net_prio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
3:cpuset:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
2:pids:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
1:name=systemd:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podfa054673_2713_47b6_8007_15ab6c54e2f1.slice/cri-containerd-c5cc2ca6a9beff3089d578a7b9f79aeb77b51eb069c7a95b118e5a8c71887e3c.scope
0::/

strangemonad avatar May 30 '23 02:05 strangemonad

This is a notebook-specific problem. I can repro with two cells:

%%python
import os
print(os.environ)

The second cell is Mojo. FYI @walter-erquinigo @River707

Mogball avatar Jun 05 '23 21:06 Mogball

I couldn't repro the issue on top of tree. Let's see if this is still valid when staging is out

walter-erquinigo avatar Jun 07 '23 22:06 walter-erquinigo

@walter-erquinigo Awesome. Thanks for taking a look. If it can't be repro'd after today's release, please close this issue. @strangemonad feel free to reopen if you're still hitting it afterwards

Mogball avatar Jun 07 '23 23:06 Mogball

Just confirmed in the the release of the playground that this error doesn't happen anymore. I'll add an entry in the changelog for the next version

walter-erquinigo avatar Jun 08 '23 00:06 walter-erquinigo

@walter-erquinigo confirmed it works in this morning's release in both python and mojo cells (with the expected exception that in a mojo cell you get <module 'os' from '/opt/conda/lib/python3.8/os.py'> since there's no __str__ trait handling yet.

nit: I don't see it in the bug fixes in the release notes

strangemonad avatar Jun 08 '23 15:06 strangemonad

I'll include a fix bug entry in the next release :)

Could you send me a repro for the issue you see with <module 'os'...>?

walter-erquinigo avatar Jun 08 '23 16:06 walter-erquinigo

hmm I just hit the segfault again but it's intermittent if I restart the kernel sometimes I hit it and sometimes I don't

Forget what I said about printing <module ... I typo'ed print(os) before having my morning coffee.

Here's what I'm running, sometimes it's working and sometimes it's hitting the null pointer. I restart the kernel each time.

#%%
%%python
import os

#%%
print(os.environ)

This is the contents of os.environ

environ({'JUPYTERHUB_API_TOKEN': '62e3b01e57f649b698641688ea4aca53', 'PROXY_API_PORT': 'tcp://172.20.41.207:8001', 'PROXY_API_PORT_8001_TCP_PORT': '8001', 'JUPYTERHUB_API_URL': 'http://hub:8081/hub/api', 'HUB_SERVICE_HOST': '172.20.23.124', 'JUPYTERHUB_OAUTH_ACCESS_SCOPES': '["access:[email protected]/", "access:[email protected]"]', 'MAINTENANCE_PORT': 'tcp://172.20.207.204:80', 'MAINTENANCE_SERVICE_PORT': '80', 'JUPYTERHUB_SERVICE_PREFIX': '/user/[email protected]/', 'KUBERNETES_SERVICE_HOST': '172.20.0.1', 'HUB_PORT_8081_TCP': 'tcp://172.20.23.124:8081', 'KUBERNETES_SERVICE_PORT': '443', 'PROXY_API_PORT_8001_TCP_ADDR': '172.20.41.207', 'MAINTENANCE_PORT_80_TCP': 'tcp://172.20.207.204:80', 'DEBIAN_FRONTEND': 'noninteractive', 'CPU_GUARANTEE': '1.5', 'KUBERNETES_PORT': 'tcp://172.20.0.1:443', 'HUB_PORT': 'tcp://172.20.23.124:8081', 'LANGUAGE': 'en_US.UTF-8', 'JUPYTERHUB_OAUTH_SCOPES': '["access:[email protected]/", "access:[email protected]"]', 'XDG_CACHE_HOME': '/home/jovyan/.cache/', 'PROXY_PUBLIC_SERVICE_HOST': '172.20.78.241', 'HUB_SERVICE_PORT_HUB': '8081', 'KUBERNETES_PORT_443_TCP_ADDR': '172.20.0.1', 'JPY_SESSION_NAME': 'c82da021-b145-4d32-84dd-7ca1e258b0c8', 'JUPYTERHUB_SERVICE_URL': 'http://0.0.0.0:8888/user/[email protected]/', 'JUPYTERHUB_HOST': '', 'MAINTENANCE_PORT_80_TCP_PORT': '80', 'PATH': '/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/.modular/.derived/build/lib:/.modular/.derived/third-party/llvm-project/build/bin:/.modular/.derived/build/lib', 'PROXY_PUBLIC_PORT_8002_TCP': 'tcp://172.20.78.241:8002', 'PROXY_PUBLIC_PORT_80_TCP': 'tcp://172.20.78.241:80', 'PROXY_PUBLIC_PORT': 'tcp://172.20.78.241:80', 'PROXY_PUBLIC_SERVICE_PORT_HTTP': '80', 'AWS_REGION': 'us-east-1', 'HUB_SERVICE_PORT': '8081', 'AWS_STS_REGIONAL_ENDPOINTS': 'regional', 'JUPYTERHUB_ACTIVITY_URL': 'http://hub:8081/hub/api/users/[email protected]/activity', 'JPY_PARENT_PID': '7', 'MODULAR_PATH': '/.modular', 'CONDA_DIR': '/opt/conda', 'AWS_WEB_IDENTITY_TOKEN_FILE': '/var/run/secrets/eks.amazonaws.com/serviceaccount/token', 'HUB_PORT_8081_TCP_PROTO': 'tcp', 'MEM_LIMIT': '4294967296', 'HOSTNAME': 'jupyter-shawnmorel-40gmail-2ecom', 'AWS_ROLE_ARN': 'arn:aws:iam::466483404629:role/mojo_hub-prod-us-east-1-user-pod-role-role-d7b68cc', 'PROXY_API_SERVICE_PORT': '8001', 'JUPYTER_IMAGE_SPEC': '466483404629.dkr.ecr.us-east-1.amazonaws.com/mojo_hub-dev-us-east-1-notebook:0fa4611c2cb3083dcf9c722d3ef22328a8583e84', 'PROXY_PUBLIC_PORT_8002_TCP_ADDR': '172.20.78.241', 'MAINTENANCE_PORT_80_TCP_ADDR': '172.20.207.204', 'PROXY_API_PORT_8001_TCP': 'tcp://172.20.41.207:8001', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'SHELL': '/bin/bash', 'LANG': 'en_US.UTF-8', 'AWS_DEFAULT_REGION': 'us-east-1', 'MOJO_JUPYTER_LOG_FILE': '/mojo_logs.log', 'JUPYTER_IMAGE': '466483404629.dkr.ecr.us-east-1.amazonaws.com/mojo_hub-dev-us-east-1-notebook:0fa4611c2cb3083dcf9c722d3ef22328a8583e84', 'JUPYTERHUB_SERVER_NAME': '', 'LLVM_SYMBOLIZER_PATH': '/.modular/.derived/third-party/llvm-project/build/bin/llvm-symbolizer', 'PROXY_PUBLIC_SERVICE_PORT': '80', 'PROXY_PUBLIC_SERVICE_PORT_METRICS': '8002', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'LC_ALL': 'en_US.UTF-8', 'MAINTENANCE_SERVICE_HOST': '172.20.207.204', 'HUB_PORT_8081_TCP_ADDR': '172.20.23.124', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'KUBERNETES_PORT_443_TCP': 'tcp://172.20.0.1:443', 'PROXY_PUBLIC_PORT_80_TCP_PORT': '80', 'JUPYTERHUB_BASE_URL': '/', 'NB_USER': 'jovyan', 'HOME': '/home/jovyan', 'PROXY_API_SERVICE_HOST': '172.20.41.207', 'JUPYTERHUB_OAUTH_CALLBACK_URL': '/user/[email protected]/oauth_callback', 'MOJO_REPL_VERBOSE_LOG': '1', 'MAINTENANCE_PORT_80_TCP_PROTO': 'tcp', 'JUPYTERHUB_DEBUG': '1', 'JUPYTERHUB_CLIENT_ID': 'jupyterhub-user-shawnmorel%40gmail.com', 'MOJO_JUPYTER_JSON_LOGS': '1', 'PROXY_PUBLIC_PORT_8002_TCP_PROTO': 'tcp', 'PROXY_PUBLIC_PORT_8002_TCP_PORT': '8002', 'CPU_LIMIT': '30.0', 'NB_UID': '1000', 'PROXY_API_PORT_8001_TCP_PROTO': 'tcp', 'JUPYTERHUB_ADMIN_ACCESS': '1', 'NB_GID': '100', 'JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES': '[]', 'JPY_API_TOKEN': '62e3b01e57f649b698641688ea4aca53', 'JUPYTERHUB_USER': '[email protected]', 'JUPYTERHUB_DEFAULT_URL': '/lab/tree/HelloMojo.ipynb', 'HUB_PORT_8081_TCP_PORT': '8081', 'JUPYTERHUB_SINGLEUSER_APP': 'jupyter_server.serverapp.ServerApp', 'MEM_GUARANTEE': '1073741824', 'PROXY_PUBLIC_PORT_80_TCP_ADDR': '172.20.78.241', 'LD_LIBRARY_PATH': ':/.modular/.derived/third-party/llvm-project/build/lib:/opt/conda/lib/python3.8/site-packages:/opt/conda/lib:/.modular/.derived/build/lib', 'PROXY_PUBLIC_PORT_80_TCP_PROTO': 'tcp', 'PYTHONPATH': '/opt/conda/lib/python3.8/site-packages'})

Note that I'm only restarting the kernel, not the playground instance so I don't think it's flapping between new and old release. Seems like a legit race condition somewhere.

This is the pod that I'm on

11:hugetlb:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
10:memory:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
9:cpu,cpuacct:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
8:perf_event:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
7:devices:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
6:freezer:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
5:blkio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
4:net_cls,net_prio:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
3:cpuset:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
2:pids:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
1:name=systemd:/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb2df5ea3_cdf1_4ecb_ad7a_930b5f5d762c.slice/cri-containerd-ca0fd58a69baced8f15d7533ba0d12c9a4f73f0173302ff650d35a1cfcf892e2.scope
0::/

strangemonad avatar Jun 08 '23 16:06 strangemonad

and the plot thickens a little. I now just hit a segfault for 0x2020 (the others had been null pointers). This definitely seems to point to a memory lifetime issue with print.

strangemonad avatar Jun 08 '23 16:06 strangemonad

Thanks for the info. I'll take a look at it later

walter-erquinigo avatar Jun 08 '23 16:06 walter-erquinigo

I was able to isolate this to the mojo compiler itself, including the python interop, and excluding the REPL. This is a minimal repro main.mojo file that fails

from IO import print
from PythonInterface import Python

fn main() raises:
  let os = Python.import_module("os")
  print(os.environ)
  # works: print(os.environ.__str__())

I was able to repro this only on linux, not mac. It fails around 70% of the time.

@Mogball , could you please help retriaging this issue?

walter-erquinigo avatar Jun 19 '23 23:06 walter-erquinigo

@walter-erquinigo I might be off on this but I believe macos terminal stdout is unbuffered by default which ends up being handled by this pseudo-buffered-io code path which forces a synchronous flush while you're still in the print call https://github.com/apple-open-source/macos/blob/5f8bb48acd35f4c7dff5ec06e521d324bdd93713/Libc/stdio/FreeBSD/xprintf.c#L999

I obviously can't try running this myself on macos yet but I'd try

  • running the sample you gave and redirect stdout / stderr to a file
  • That might still trip through the line-buffered handling logic though (most stdout flush on newlines) running through unbuffered or one of the suggestions here might help https://superuser.com/questions/191080/how-to-log-stdout-of-a-background-process-w-o-buffering-on-mac

My hunch is that something like the following is happening (wild conjecture on my part since I don't have src access)

  • environ is converted to a string representation
  • some variant of libc printf is called passing the strings underlying buffer (or a llvm string ref?)
  • mojo does it's standard thing and puts a dealloc as soon as possible. In this case that would probably we right after the printf call.
  • In the async case you've now got a race between printf and dealloc.

strangemonad avatar Jun 20 '23 15:06 strangemonad

@stumpOS , you should check this out

walter-erquinigo avatar Jul 10 '23 18:07 walter-erquinigo

@stumpOS bump

stumpOS avatar Jul 17 '23 21:07 stumpOS