bug:fix runc delete run before delete exec.fifo
if runc delete run before delete exec.fifo runc delete will container root dir "delete /runc/nmx003"
func handleFifoResult(result openResult) error {
if result.err != nil {
return result.err
}
f := result.file
defer f.Close()
if err := readFromExecFifo(f); err != nil {
return err
}
return os.Remove(f.Name())
}
os.remove will reurn a error ERRO[0030] remove /run/runc/nmx003/exec.fifo: no such file or directory
@ningmingxiao can you provide a reproducer?
I add some sleep at
func handleFifoResult(result openResult) error {
if result.err != nil {
return result.err
}
f := result.file
defer f.Close()
if err := readFromExecFifo(f); err != nil {
return err
}
time.Sleep(time.Second*30)
return os.Remove(f.Name())
}
runc create nmx001 runc start nmx001 when it is running at time.Sleep(time.Second*30) runc delete nmx001 it will happen. @kolyshkin It's not easy to reproduce unless a delay is added
I think RemoveAll is better
I think RemoveAll is better
In its current implementation, and if path points to a non-directory (or an empty directory) -- yes, it's essentially does the same as I suggested above, i.e.
err := os.Remove(path)
if err == nil || os.IsNotExist(err) {
return nil
}
but in a (theoretically possible) case when path is a non-empty directory, it will go ahead and remove it recursively, which is probably not what we really want here. I would prefer an error in such a case (even if it doesn't look like it's practically possible).
done, thanks @kolyshkin
ping @kolyshkin can this pr be merged ?
Pull Request Overview
This PR fixes a bug related to the improper deletion of exec.fifo, which causes issues when runc delete is executed before deleting exec.fifo.
- Updated error handling in handleFifoResult to ignore os.IsNotExist errors during file removal.
- Prevents an error from being returned when exec.fifo is already removed, ensuring smooth deletion of the container's root directory.
@Copilot Do you think this is really a bug.
If the container is deleted by other process, I think runc start / run should return the exact error and have a non-zero exit code, but not to ignore some errors. Or else the users may wrongly think that we start/run the container successfully, but the result is opposite.
Do you think this is really a bug. If the container is deleted by other process, I think
runc start / runshould return the exact error and have a non-zero exit code, but not to ignore some errors. Or else the users may wrongly think that we start/run the container successfully, but the result is opposite.
I think Copilot can't talk with us here.
@ningmingxiao @kolyshkin WDYT
@ningmingxiao can you explain the race in more detail? Is this code executed on a runc exec into a container, that while that is executing another process runs runc delete and in the runc exec we return an error to delete the fifo? Or what is it exactly?
we use docker 17.03,I find sometimes if runc start and then container exited shim will call runc delete before delete exec.fifo when run a short term process like ls or date
Ohh, makes sense. Thanks!
@lifubang friendly ping? I intend to merge if your concern has been solved