rust-extensions

Change FIFO IO to pipe in shim, as the Go shim does. Resolves the raw fd problem.

Open jokemanfire opened this issue 1 year ago • 4 comments

Related: I have raised this question with containerd, but it looks like containerd will not change. So I will open a PR to change the FIFO to a pipe. I have completed the code and will submit the PR after some CI testing.

jokemanfire avatar May 29 '24 07:05 jokemanfire

I found two more problems when the FIFO is used directly.

  1. With ctr run -d busybox:latest test, the container status goes straight to stopping; with the Go shim it does not.
  2. When the containerd service is stopped, all Rust shim IO breaks, but Go shim IO does not.

Here is a way to reproduce the error. 1. Build an image from a Dockerfile like this:

FROM busybox:latest

COPY test.sh /

ENTRYPOINT ["sh","/test.sh"]

test.sh is as follows:

while true; do
    sleep 3
    echo "hello"
    # Exit status of the echo above; non-zero once stdout is broken.
    result=$?
    if [ $result -ne 0 ]; then
        date >> /log.txt
        echo "echo failed. Result: $result" >> /log.txt
    fi
done

Build the image with docker build, then import it with ctr.

  2. Run a container from this image using the Rust shim.
  3. Stop the containerd service.

The error message then appears in the container, while a container under the Go shim is not affected. So I think using a pipe in the shim is really needed. PR #278, which I have tested, resolves this problem.
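A minimal sketch of the failure mode in problem 2, assuming Linux and a `mkfifo` command on PATH. This is not the shim's code, and how the shim actually opens its FIFOs is not shown in this thread. The helper names `make_fifo` and `write_after_reader_exit` are made up for illustration. The sketch writes to a FIFO after its only other reader has gone away, once with the writer opened write-only and once opened read-write.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};
use std::process::Command;

/// Create a FIFO at a per-process temp path using the `mkfifo` command
/// (assumed to be installed; Rust's std has no mkfifo wrapper).
fn make_fifo(tag: &str) -> io::Result<PathBuf> {
    let name = format!("fifo-demo-{}-{}", tag, std::process::id());
    let path = std::env::temp_dir().join(name);
    let _ = fs::remove_file(&path);
    let status = Command::new("mkfifo").arg(&path).status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "mkfifo failed"));
    }
    Ok(path)
}

/// Write to the FIFO, let the only other reader go away, then write again.
/// `read_write` selects whether the writer is opened O_RDWR or O_WRONLY.
fn write_after_reader_exit(path: &Path, read_write: bool) -> io::Result<()> {
    // Stand-in for containerd's end of the FIFO. On Linux, opening a FIFO
    // read-write does not block, which keeps the demo single-threaded.
    let reader = OpenOptions::new().read(true).write(true).open(path)?;
    // Stand-in for the container's stdout.
    let mut writer: File = OpenOptions::new()
        .read(read_write)
        .write(true)
        .open(path)?;
    writer.write_all(b"hello\n")?; // a reader is still present
    drop(reader); // "containerd stops"
    writer.write_all(b"hello\n") // fails if no reader is left
}

fn main() -> io::Result<()> {
    let wronly = make_fifo("wronly")?;
    let rdwr = make_fifo("rdwr")?;
    let a = write_after_reader_exit(&wronly, false).map_err(|e| e.kind());
    let b = write_after_reader_exit(&rdwr, true).map_err(|e| e.kind());
    println!("write-only writer: {:?}", a);
    println!("read-write writer: {:?}", b);
    fs::remove_file(wronly)?;
    fs::remove_file(rdwr)?;
    Ok(())
}
```

On Linux the write-only writer should fail with a broken-pipe error once the reader is gone, which is what the echo in test.sh would see. The read-write writer keeps working because it counts as its own reader, at least until the pipe buffer fills.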

Friendly ping, @fuweid @mxpv @Burning1020. Looking forward to your reply.

jokemanfire avatar Aug 12 '24 01:08 jokemanfire

The tokio 1.40 pipe resolves the pipe problem perfectly. Friendly ping, @fuweid @mxpv @Burning1020.

jokemanfire avatar Sep 29 '24 07:09 jokemanfire

Hi @jokemanfire, would you please file a pull request to fix this? Thanks.

fuweid avatar Oct 11 '24 17:10 fuweid

@fuweid Please take a look at #278.

jokemanfire avatar Oct 12 '24 01:10 jokemanfire

The tokio 1.40 pipe resolves the pipe problem perfectly. Friendly ping, @fuweid @mxpv @Burning1020.

Hi @jokemanfire, can you give more detail about why the tokio 1.40 pipe resolves the pipe problem perfectly?

I have also run into the problem you found, that when the containerd service is stopped all Rust shim IO breaks while Go shim IO does not. I found another problem as well: the stdout stream of a container process started by the Rust shim is not flushed in real time. It is flushed about one page at a time, followed by a long delay, rather than line by line. I don't know whether this is related to using the FIFO directly as the process's stdout.

I am following this issue. Could you please give an update? Thanks!

zhaodiaoer avatar Nov 13 '24 03:11 zhaodiaoer

The stdout stream of a container process started by the Rust shim is not flushed in real time. It is flushed about one page at a time, followed by a long delay, rather than line by line. I don't know whether this is related to using the FIFO directly as the process's stdout.

I haven't run into this problem. Is there a way to reproduce it? Using the FIFO directly does cause some problems, which are explained in https://fuweid.com/post/2022-embedshim-kernel-is-my-sidecar/ . Thanks @fuweid. The post describes it like this:

"embedshim also relays standard input, but it hands a named pipe opened in read-write mode directly to the container as its standard output, which reduces copying of standard output. The embedshim plugin is part of the containerD process, so once containerD restarts, the input side of the container process will receive a SIGPIPE error. Personally, I think this is acceptable. In interactive mode, the user will notice that the container engine is down. Most production scenarios use headless, non-interactive mode, where the container process's input is /dev/null and the state of standard output is persisted by the named pipe, so stopping containerD does not cause a SIGPIPE error on the container's output side."

I want to change the FIFO to a pipe because I think some of these problems are unacceptable in the Rust shim. I also want to change 'pipe_os' to 'tokio_pipe', because under highly concurrent IO the current async trait implementation leaves spawned tokio_copy tasks behind (I think this is caused by the raw_fd, and there is a problem with how the asynchronous trait is implemented), and then the Rust shim cannot be deleted successfully. If you have steps to reproduce your problem, I would be happy to check whether it is caused by FIFO IO.

jokemanfire avatar Nov 13 '24 06:11 jokemanfire

The stdout stream of a container process started by the Rust shim is not flushed in real time. It is flushed about one page at a time, followed by a long delay, rather than line by line. I don't know whether this is related to using the FIFO directly as the process's stdout.

I haven't run into this problem. Is there a way to reproduce it?

I didn't do anything special before hitting this problem. I have a program that logs at high frequency, and when I follow its logs via crictl logs -f xxx there are very long delays between intermittent bursts of output. After some investigation I found that the log file produced by containerd-cri is also written intermittently. I suspect something abnormal in the new way of using the FIFO or in the Rust tokio runtime.

Simple diagram:

Go shim:   |fifo reader| <-- fifo --> |io copier| <-- pipe --> |container process|
Rust shim: |fifo reader| <-- fifo --> |container process|

The FIFO and the FIFO reader come from containerd-cri and are the same in both cases, so I guess the problem comes from the second half.
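The "io copier" hop in the Go shim line of the diagram can be sketched roughly as follows. This is an illustration under stated assumptions, not code from either shim: `relay_child_stdout` is a hypothetical helper, the container process is stood in for by `sh -c`, and the FIFO is stood in for by any `Write` sink. The choice to keep draining the child after the sink fails is an assumption of this sketch, made so that the child never sees a broken pipe.

```rust
use std::io::{self, Read, Write};
use std::process::{Command, Stdio};

/// Run `script` under `sh -c` with its stdout on an anonymous pipe and relay
/// everything it prints into `sink`. Returns the bytes read from the child.
fn relay_child_stdout<W: Write>(script: &str, sink: &mut W) -> io::Result<u64> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdout(Stdio::piped()) // the child writes to a pipe, not to the sink
        .spawn()?;
    let mut from_child = child.stdout.take().expect("stdout was piped");

    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    let mut sink_ok = true;
    loop {
        let n = match from_child.read(&mut buf) {
            Ok(0) => break, // child closed its stdout
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total += n as u64;
        // If the sink breaks (for example its reader went away), stop
        // forwarding but keep reading so the child can keep writing.
        if sink_ok && sink.write_all(&buf[..n]).is_err() {
            sink_ok = false;
        }
    }
    child.wait()?;
    Ok(total)
}

fn main() -> io::Result<()> {
    let mut sink = Vec::new();
    let n = relay_child_stdout("echo hello; echo world", &mut sink)?;
    print!("{}", String::from_utf8_lossy(&sink));
    eprintln!("relayed {} bytes", n);
    Ok(())
}
```

With this layout, a failure on the FIFO side is seen only by the copier. The trade-off is one extra copy of every byte of output, which is the cost the direct-FIFO design avoids.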

zhaodiaoer avatar Nov 13 '24 07:11 zhaodiaoer

The stdout stream of a container process started by the Rust shim is not flushed in real time. It is flushed about one page at a time, followed by a long delay, rather than line by line. I don't know whether this is related to using the FIFO directly as the process's stdout.

I think maybe I've found the cause. I'll try to file a PR about it later.

zhaodiaoer avatar Nov 13 '24 09:11 zhaodiaoer

Seeing level=error msg="copy io failed Input/output error (os error 5)" when running this, could this be related?

analytically avatar Nov 15 '24 21:11 analytically

copy io failed Input/output

Have you applied the patch from #278? If so, could you provide a more detailed description or some logs, so I can check whether my patch is the problem? PS: binary IO is not implemented, so nerdctl -t -d will fail.

jokemanfire avatar Nov 16 '24 03:11 jokemanfire

Not patched. I will patch and try again.

analytically avatar Nov 16 '24 07:11 analytically

Patched, same error, so not fixed with #278

analytically avatar Nov 16 '24 09:11 analytically

Patched, same error, so not fixed with #278

Could you provide the debug log? It may be caused by copy_console (tty), but without more information that cannot be determined.

jokemanfire avatar Nov 16 '24 14:11 jokemanfire

Image

This is what I could see already, any idea? I'll look at it more closely on Monday

analytically avatar Nov 16 '24 15:11 analytically

Image

This is what I could see already, any idea? I'll look at it more closely on Monday

I think this may be printed from spawn_copy when the read or write side closes suddenly. You can check it; it should occur in tokio_copy.
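If the error does come from a copy loop whose peer closed, one possible way to handle it is to treat `os error 5` (EIO on Linux) as end of stream instead of logging it as a failure. On Linux, reading a pty master after the slave side has been closed returns EIO, which would fit the copy_console (tty) guess. The sketch below is an assumption-laden illustration, not the shim's spawn_copy or tokio_copy: `copy_until_closed` and `ClosingReader` are hypothetical names, and the code is synchronous for brevity.

```rust
use std::io::{self, Read, Write};

/// Linux errno for "Input/output error", the `os error 5` in the log line.
const EIO: i32 = 5;

/// Copy from `reader` to `writer` until EOF, treating EIO from the reader
/// as a normal end of stream. Returns the number of bytes copied.
fn copy_until_closed<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(total),
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            // Peer closed: report what was copied instead of an error.
            Err(e) if e.raw_os_error() == Some(EIO) => return Ok(total),
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
}

/// Hypothetical stand-in for a pty master: yields one chunk, then EIO.
struct ClosingReader {
    chunk: Option<Vec<u8>>,
}

impl Read for ClosingReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.chunk.take() {
            Some(data) => {
                let n = data.len().min(buf.len());
                buf[..n].copy_from_slice(&data[..n]);
                Ok(n)
            }
            None => Err(io::Error::from_raw_os_error(EIO)),
        }
    }
}

fn main() -> io::Result<()> {
    let mut reader = ClosingReader { chunk: Some(b"hello\n".to_vec()) };
    let mut out = Vec::new();
    let copied = copy_until_closed(&mut reader, &mut out)?;
    println!("copied {} bytes without reporting an error", copied);
    Ok(())
}
```

Whether swallowing EIO is appropriate depends on where the error is raised; the debug log would still be needed to confirm that.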

jokemanfire avatar Nov 17 '24 01:11 jokemanfire

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 7 days unless new comments are made or the stale label is removed.

github-actions[bot] avatar Feb 16 '25 00:02 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Feb 24 '25 00:02 github-actions[bot]