ipex-llm

"PalProcessExit: Returning exit code 1" when running JupyterLab in the trusted-bigdata container locally

Open · Jansper-x opened this issue 1 year ago · 7 comments

I am trying to run JupyterLab in the trusted-bigdata container locally, but the Jupyter service fails to start. I used the following commands:

export KEYS_PATH=/root/BigDL23/BigDL/ppml/keys/
export LOCAL_IP=*.*.*.*
export DOCKER_IMAGE=intelanalytics/bigdl-ppml-trusted-bigdata-gramine-reference-8g:2.4.0-SNAPSHOT

sudo docker run -itd \
    --net=host \
    --cpus=8 \
    --oom-kill-disable \
    --device=/dev/sgx/enclave \
    --device=/dev/sgx/provision \
    --name=jupyter \
    -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
    -v $KEYS_PATH:/ppml/keys \
    -e RUNTIME_DRIVER_PORT=54321 \
    -e RUNTIME_DRIVER_MEMORY=16g \
    -e LOCAL_IP=$LOCAL_IP \
    $DOCKER_IMAGE bash
  echo "export JUPYTER_RUNTIME_DIR=$JUPYTER_RUNTIME_DIR && \
    export JUPYTER_DATA_DIR=$JUPYTER_DATA_DIR && \
    usr/local/bin/jupyter-lab notebook \
    --notebook-dir=/ppml/apps \
    --ip=0.0.0.0 \
    --port=8889 \
    --no-browser \
    --allow-root" >> temp_command_file
  export sgx_command="bash temp_command_file"
  gramine-sgx bash 2>&1 | tee /ppml/jupyter-notebook.log

Gramine starts successfully, but it produces a log of more than 21,000 lines, which contains the following error:

PermissionError: [Errno 13] Permission denied: '/root/.local'

However, this folder does not exist in the container, and creating it manually does not help. The end of the log contains these messages:

[P1:libos] debug: IPC worker: received IPC message from 15: code=17 size=21 seq=2
[P1:libos] debug: clearing POSIX locks for pid 15
[P1:libos] debug: Sending ipc message to 15
[P15:libos] debug: IPC worker: received IPC message from 1: code=0 size=21 seq=2
[P15:libos] debug: Got an IPC response from 1, seq: 2
[P15:T15:python3] debug: Waiting finished: 0
[P15:T15:python3] debug: Sending ipc message to 1
[P15:T15:python3] debug: sync client shutdown: closing handles
[P15:T15:python3] debug: sync client shutdown: waiting for confirmation
[P15:T15:python3] debug: sync client shutdown: finished
[P15:T15:python3] debug: ipc_release_id_range: sending a request: [15..15]
[P15:T15:python3] debug: Sending ipc message to 1
[P15:T15:python3] debug: ipc_release_id_range: ipc_send_message: 0
[P1:libos] debug: IPC worker: received IPC message from 15: code=2 size=37 seq=0
[P15:libos] debug: IPC worker: exiting worker thread
[P1:libos] debug: IPC callback from 15: IPC_MSG_CHILDEXIT(1, 15, 1, 0)
[P1:libos] debug: Child process (pid: 15) died
[P15:T15:python3] debug: process 15 exited with status 1
debug: PalProcessExit: Returning exit code 1
[P1:T1:bash] trace: ---- return from wait4(...) = 0xf
[P1:T1:bash] trace: ---- rt_sigaction([SIGINT], 0x3c4d68ae0, 0x3c4d68b80, 0x8) = 0x0
[P1:libos] debug: IPC worker: received IPC message from 15: code=4 size=25 seq=0
[P1:libos] debug: ipc_release_id_range_callback: release_id_range(15..15)
[P1:T1:bash] trace: ---- ioctl(2, TIOCGWINSZ, 0x3c4d68d60) ...
[P1:T1:bash] trace: ---- return from ioctl(...) = -38
[P1:T1:bash] trace: ---- rt_sigprocmask(SETMASK, [], NULL, 0x8) = 0x0
[P1:T1:bash] debug: Created sigframe for sig: 17 at 0x3c4d68090 (handler: 0x3c50b3a70, restorer: 0x3c4e2eb40)
[P1:T1:bash] trace: ---- wait4(-1, 0x3c4d68020, WNOHANG, 0) ...
[P1:T1:bash] trace: ---- return from wait4(...) = -10
[P1:T1:bash] trace: ---- rt_sigreturn()
[P1:T1:bash] trace: ---- read(255, 0x3c51e2a40, 0xb97) ...
[P1:T1:bash] trace: ---- return from read(...) = 0x0
[P1:T1:bash] trace: ---- rt_sigprocmask(BLOCK, [SIGCHLD,], [], 0x8) = 0x0
[P1:T1:bash] trace: ---- rt_sigprocmask(SETMASK, [], NULL, 0x8) = 0x0
[P1:T1:bash] debug: ---- exit_group (returning 1)
[P1:T1:bash] debug: clearing POSIX locks for pid 1
[P1:T1:bash] debug: sync client shutdown: closing handles
[P1:T1:bash] debug: sync client shutdown: waiting for confirmation
[P1:T1:bash] debug: sync client shutdown: finished
[P1:libos] debug: IPC worker: exiting worker thread
[P1:T1:bash] debug: process 1 exited with status 1
debug: PalProcessExit: Returning exit code 1

Gramine then exits without starting JupyterLab. Why does this happen?

Jansper-x, Sep 15 '23 10:09
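The /root/.local path in the PermissionError is most likely Jupyter's default data directory. A minimal sketch, assuming jupyter_core's default behaviour on Linux: use JUPYTER_DATA_DIR if it is non-empty, else $XDG_DATA_HOME/jupyter, else $HOME/.local/share/jupyter; use JUPYTER_RUNTIME_DIR if it is non-empty, else the data directory plus /runtime. Because the command file is written with a double-quoted echo, variables that are unset at that moment end up as empty exports, and an empty value is treated like an unset one. The *_sketch function names are hypothetical and only mirror jupyter_core's logic; they do not call it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that mirror (they do not call) jupyter_core's default
# directory resolution on Linux. ${VAR:-fallback} treats an empty VAR the same
# as an unset one, which matches how jupyter_core ignores empty values.

jupyter_data_dir_sketch() {
  # JUPYTER_DATA_DIR, else $XDG_DATA_HOME/jupyter, else ~/.local/share/jupyter
  local xdg="${XDG_DATA_HOME:-$HOME/.local/share}"
  echo "${JUPYTER_DATA_DIR:-$xdg/jupyter}"
}

jupyter_runtime_dir_sketch() {
  # JUPYTER_RUNTIME_DIR, else <data dir>/runtime
  echo "${JUPYTER_RUNTIME_DIR:-$(jupyter_data_dir_sketch)/runtime}"
}

# Simulate the command file from the question: both exports are empty
# because the variables were unset when the double-quoted echo ran.
(
  export HOME=/root JUPYTER_DATA_DIR= JUPYTER_RUNTIME_DIR=
  unset XDG_DATA_HOME
  jupyter_data_dir_sketch     # prints /root/.local/share/jupyter
  jupyter_runtime_dir_sketch  # prints /root/.local/share/jupyter/runtime
)
```

With both variables exported to writable paths such as /ppml/jupyter/data and /ppml/jupyter/runtime, neither helper falls back to $HOME.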

It looks like you did not set JUPYTER_RUNTIME_DIR and JUPYTER_DATA_DIR, and 'usr/local/' should be '/usr/local/'. I tried the following commands and the Jupyter service started successfully locally:

  cd /ppml
  export JUPYTER_RUNTIME_DIR=/ppml/jupyter/runtime
  export JUPYTER_DATA_DIR=/ppml/jupyter/data
  bash init.sh
  echo "export JUPYTER_RUNTIME_DIR=$JUPYTER_RUNTIME_DIR && \
    export JUPYTER_DATA_DIR=$JUPYTER_DATA_DIR && \
    /usr/local/bin/jupyter-lab notebook \
    --notebook-dir=/ppml/apps \
    --ip=0.0.0.0 \
    --port=8889 \
    --no-browser \
    --allow-root" >> temp_command_file
  # bash temp_command_file   # uncomment to test without SGX first
  export sgx_command="bash temp_command_file"
  gramine-sgx bash 2>&1 | tee /ppml/jupyter-notebook.log

If you run into problems, try running it without SGX first.

hzjane, Sep 18 '23 02:09
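The two fixes in the reply above can be made harder to get wrong. Because the echo argument is double-quoted, $JUPYTER_RUNTIME_DIR and $JUPYTER_DATA_DIR are expanded when temp_command_file is written, not when Gramine runs it, so unset variables leave empty exports in the file; and >> appends, so each retry adds another launch command to the same file. Below is a minimal sketch, assuming the same paths and flags as the reply. write_jupyter_cmd_file is a hypothetical helper name, and overwriting with > instead of appending is a deliberate change from the original commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the PPML image): write the Jupyter launch
# script that `gramine-sgx bash` runs via sgx_command.
# Usage: write_jupyter_cmd_file <output-file> [jupyter-lab path]
write_jupyter_cmd_file() {
  local out="$1"
  local jupyter_bin="${2:-/usr/local/bin/jupyter-lab}"

  # The values are baked into the file now, so refuse to write empty exports.
  if [ -z "${JUPYTER_RUNTIME_DIR:-}" ] || [ -z "${JUPYTER_DATA_DIR:-}" ]; then
    echo "export JUPYTER_RUNTIME_DIR and JUPYTER_DATA_DIR first" >&2
    return 1
  fi

  # A relative path such as usr/local/bin/jupyter-lab is resolved against
  # the current directory, so require an absolute one.
  case "$jupyter_bin" in
    /*) ;;
    *) echo "jupyter-lab path must be absolute: $jupyter_bin" >&2; return 1 ;;
  esac

  # '>' overwrites the file, so re-running does not append a second command.
  # Unquoted EOF: variables expand here; '\\' writes a literal backslash.
  cat > "$out" <<EOF
export JUPYTER_RUNTIME_DIR=$JUPYTER_RUNTIME_DIR && \\
export JUPYTER_DATA_DIR=$JUPYTER_DATA_DIR && \\
$jupyter_bin notebook \\
  --notebook-dir=/ppml/apps \\
  --ip=0.0.0.0 \\
  --port=8889 \\
  --no-browser \\
  --allow-root
EOF
}
```

Following the reply: after cd /ppml, exporting both variables and running bash init.sh, call write_jupyter_cmd_file temp_command_file, check it with bash temp_command_file outside SGX, and only then export sgx_command="bash temp_command_file" and run gramine-sgx bash.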

@hzjane Hi, I have some questions about running Jupyter inside SGX.
Q1: Did your solution patch JupyterLab's source code, or does JupyterLab run in the TEE without any modifications?
Q2: If JupyterLab is not patched, does that mean that every time code is submitted from the Jupyter web UI, a subprocess is cloned to run that code inside SGX for security purposes?
Q3: Another approach would be to run the web UI outside SGX and only the JupyterLab kernel inside SGX. Have you considered this approach? Thank you!

bronzeMe, Sep 18 '23 09:09

Hi.
Q1: We didn't apply any patches to JupyterLab.
Q2: Possibly; starting a new kernel may spawn a subprocess to run it.
Q3: No, we have only tried running both the web UI and the Jupyter kernel inside SGX.

hzjane, Sep 19 '23 02:09

Thanks, I have successfully started the Jupyter service locally in the bigdl-ppml container. Now I am trying to run JupyterLab in the official Gramine 1.5 Docker image. Do I need to make any additional modifications to Gramine's Docker image?

Jansper-x, Sep 19 '23 10:09

The PPML image uses gramine-v1.3.1 as its base image, and I don't think much changes in version 1.5 as long as EDMM is not enabled. You could install the jupyter and jupyterlab packages and try it directly in the Gramine 1.5 Docker image.

hzjane, Sep 20 '23 02:09

@hzjane So your solution did not apply any patches to Gramine, right? Gramine does not seem to support netlink natively, so we had assumed your image included custom Gramine patches for netlink support.

bronzeMe, Sep 20 '23 02:09

We did patch Gramine to support netlink. https://github.com/gramineproject/gramine/compare/master...analytics-zoo:gramine:devel-v1.5.0-2023-07-19

hzjane, Sep 20 '23 02:09