
[BUG] watch crashes when deleting file

Open perosb opened this issue 1 year ago • 14 comments

Description

When a file is deleted (I'm unsure whether it existed in the container, but it should have, since it had been synced), the watch command crashes and cannot be restarted.

develop:
  watch:
    - action: sync
      path: ${LOCAL_DEPLOY_PATH}\platform
      target: c:/inetpub/wwwroot/
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\Web.config.xdt
container 061ae4ad9ec878e7a259e45aa1d7b4bd0dc56468b05497fa18cd71dd5f1c0cbe encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

Then when trying to restart it is locked:

> docker compose watch --no-up
cannot take exclusive lock for project "kermit": process with PID 20836 is still running

Killing the 20836 process still errors out the same.

Steps To Reproduce

It seems to be reproducible:

watching [C:\t\docker\deploy\platform]
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\images.jpg
tar: Removing leading drive letter from member names
x inetpub/wwwroot/images.jpg
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\images.jpg
container 7b26f6516e4c0ed000d0d71b1f01250411af2ddab0b17cd3f7f3b391a4ee97a0 encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

Compose Version

Docker Compose version v2.22.0

Docker Environment

Client:
 Version:    24.0.6
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.3
    Path:     C:\Users\Administrator\.docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0
    Path:     C:\ProgramData\Docker\cli-plugins\docker-compose.exe
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  0.17.1
    Path:     C:\Users\Administrator\.docker\cli-plugins\docker-scout.exe

Server:
 Containers: 14
  Running: 0
  Paused: 0
  Stopped: 14
 Images: 861
 Server Version: 24.0.4
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: process
 Kernel Version: 10.0 20348 (20348.1.amd64fre.fe_release.210507-1500)
 Operating System: Microsoft Windows Server Version 21H2 (OS Build 20348.1970)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.86GiB
 Name: kermit-dev
 ID: YP3Q:GBSN:NFJ3:QQMM:DB3Z:CD3V:7RBI:V445:473L:3WU3:VOP3:5DVB
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Username: kermit
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Anything else?

The file is not removed from container. No idea of how/where the lock is kept. A restart of docker-engine removed the lock.

perosb avatar Oct 05 '23 13:10 perosb

Same thing happens when copying many files into a synced folder like this:

    develop:
      watch:
        - action: sync
          path: ./web
          target: /inetpub/wwwroot

The only way to recover from "cannot take exclusive lock for project" is to restart the host; not even restarting Docker Desktop or dockerd helps.

pbering avatar Oct 07 '23 11:10 pbering

@pbering and @perosb a possible workaround so you don't have to restart your host can be found in issue #11069:

https://github.com/docker/compose/issues/11069#issuecomment-1769694535

mrbiggred avatar Oct 23 '23 18:10 mrbiggred

The lock is managed by https://github.com/moby/moby/blob/master/pkg/pidfile/pidfile.go#L29. According to the "process with PID 20836 is still running" message, the compose process is still reported by the system as "alive". If you can reproduce this issue, could you please inspect the referenced process?
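
For readers unfamiliar with PID-file locking, the general idea can be sketched as follows. This is a minimal illustration in Python, not the moby implementation: the function name `acquire_pidfile_lock` and the injectable `is_alive` callback are hypothetical, chosen so the liveness check (the part suspected in this issue) stays separate from the locking logic.

```python
import os
from pathlib import Path
from typing import Callable


def acquire_pidfile_lock(pidfile: Path, is_alive: Callable[[int], bool]) -> None:
    """Record our PID in pidfile unless it already names another live process.

    is_alive is a caller-supplied (hypothetical) liveness check; if it wrongly
    reports a dead PID as alive, the lock can never be taken again until the
    file is removed, which matches the symptom described in this thread.
    """
    try:
        recorded = int(pidfile.read_text().strip())
    except (FileNotFoundError, ValueError):
        recorded = None  # no file, or unreadable content: nothing holds the lock

    if recorded is not None and recorded != os.getpid() and is_alive(recorded):
        raise RuntimeError(f"process with PID {recorded} is still running")

    pidfile.parent.mkdir(parents=True, exist_ok=True)
    pidfile.write_text(str(os.getpid()))
```

Note that in this scheme the file is never deleted on exit; a leftover file is harmless as long as the liveness check is accurate.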

ndeloof avatar Oct 30 '23 17:10 ndeloof

When the issue happens and I see that message, then there is no process with that PID.

pbering avatar Oct 30 '23 17:10 pbering

which OS are you running on ?

ndeloof avatar Oct 30 '23 18:10 ndeloof

which OS are you running on ?

Kernel Version: 10.0 20348 (20348.1.amd64fre.fe_release.210507-1500)
Operating System: Microsoft Windows Server Version 21H2 (OS Build 20348.1970)
OSType: windows
Architecture: x86_64

perosb avatar Oct 30 '23 18:10 perosb

Same on

OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.19045 N/A Build 19045

AlexeyPlodenko avatar Dec 19 '23 07:12 AlexeyPlodenko

@pbering and @perosb a possible workaround so you don't have to restart your host can be found in issue #11069:

#11069 (comment)

Another workaround (Linux) is to stop the containers before exiting watch:

  1. Ctrl+Z to suspend watch
  2. docker-compose down to stop and remove the containers
  3. fg to bring the watch process back into the foreground
  4. Ctrl+C to exit watch

mac-hel avatar Dec 20 '23 17:12 mac-hel

According to "process with PID 20836 is still running" message, the compose process is still reported by system as "alive". If you can reproduce this issue, could you please inspect the referred process ?

I've been implementing a small script that watches docker-compose.yaml and restarts compose watch when it changes (to handle the actual state), and I ran into this issue.

The question is: why does it report that the process with PID XXXX is still running when no process with this PID is running? I could not follow its logic. What does it check when compose watch starts? Does it check the process?

wclr avatar Jan 01 '24 11:01 wclr

@wclr process detection is implemented by https://github.com/moby/moby/blob/master/pkg/process/process_windows.go

ndeloof avatar Jan 01 '24 11:01 ndeloof

@wclr process detection is implemented by https://github.com/moby/moby/blob/master/pkg/process/process_windows.go

Well, I believe it does check the process there. But the fact is that when launching docker compose watch, it reports that the process with PID XXXX is still running while there is no XXXX in the tasklist (in my case, for example, this PID was killed by the aforementioned script that spawned docker compose watch). People above mention this too, so you probably need to review the logic behind this report and make sure this cannot happen.

wclr avatar Jan 02 '24 01:01 wclr

When the issue happens and I see that message, then there is no process with that PID.

Same. There is no such process on the Windows host itself, nor in the container I watch (an Ubuntu-based app image). It seems to exist somewhere within the "docker engine space".

UPD. Managed to resolve this by updating to latest docker version. The installation itself said there's an assistance service process running and suggested to kill it. watch command works again after update, no host reboot needed.

At4m4n avatar Jan 10 '24 15:01 At4m4n

UPD. Managed to resolve this by updating to latest docker version. The installation itself said there's an assistance service process running and suggested to kill it. watch command works again after update, no host reboot needed.

It is not fixed; I was using the latest version when I ran into this. And you don't need to reboot: you can just delete %LOCALAPPDATA%/docker-compose.[YOUR_COMPOSE_PROJECT_NAME].pid.
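
The manual deletion can be automated. The following is only a sketch under stated assumptions: `remove_stale_pidfile` and `is_alive` are hypothetical names, the caller supplies the PID file path (its exact location is reported slightly differently by different commenters in this thread), and the liveness check must be an independent one, such as tasklist, rather than the check Compose itself uses.

```python
from pathlib import Path
from typing import Callable


def remove_stale_pidfile(pidfile: Path, is_alive: Callable[[int], bool]) -> bool:
    """Delete pidfile if the PID it records is not running.

    Returns True if the file was deleted, False if it was kept or did not exist.
    is_alive is a caller-supplied (hypothetical) liveness check.
    """
    try:
        recorded = int(pidfile.read_text().strip())
    except FileNotFoundError:
        return False  # nothing to clean up
    except ValueError:
        recorded = None  # unreadable content: treat the file as stale

    if recorded is not None and is_alive(recorded):
        return False  # a live process still owns the lock; leave it alone

    pidfile.unlink()
    return True
```

Running this before each docker compose watch would only remove the file when the recorded process is really gone, so a watch that is still running is not disturbed.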

I eventually ended up writing my own custom watch script to fully replace the docker compose watch functionality (for my case). It watches for the relevant file changes in the project and, inside the container, copies from the attached host volume to the Docker volumes to keep them in sync. The script also runs rsync at startup to perform the initial sync with the host.

The problem mentioned in this issue is not the only one with the current compose watch implementation. For example, it also ignores/skips some file changes if they are made in a batch, so a custom solution can solve all of this.
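
One way a custom watcher can avoid missing batched changes is to compare whole-directory snapshots instead of reacting to individual events. The sketch below illustrates that idea only; it is not the script described in this comment, and the names `snapshot` and `diff_snapshots` are hypothetical.

```python
import os
from pathlib import Path


def snapshot(root: Path) -> dict[str, float]:
    """Map each file under root (as a POSIX-style relative path) to its mtime."""
    result: dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                result[full.relative_to(root).as_posix()] = full.stat().st_mtime
            except FileNotFoundError:
                pass  # file vanished between listing and stat; skip it
    return result


def diff_snapshots(
    before: dict[str, float], after: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Return (added_or_modified, removed) paths between two snapshots."""
    changed = sorted(p for p, mtime in after.items() if before.get(p) != mtime)
    removed = sorted(p for p in before if p not in after)
    return changed, removed
```

A polling loop would take a snapshot every second or so, diff it against the previous one, and sync the changed and removed paths in one pass, however many files changed in between.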

wclr avatar Jan 12 '24 12:01 wclr

I delete the image and the container and then run docker-compose watch, all in one PowerShell script, to sync the files before the containers are started:

Powershell.exe -noexit -command "cd ../..;  docker rm --force backoffice-php; docker rmi $(docker images --format '{{.Repository}}:{{.Tag}}'|findstr 'backoffice-php'); docker-compose watch"

AlexeyPlodenko avatar Jan 12 '24 13:01 AlexeyPlodenko

Can confirm that I'm seeing this issue on Docker version 24.0.7, and a temporary workaround is to delete the .pid file in: C:\Users\<your-user-name>\AppData\Local\docker-compose\<your-project-name>.pid

torrinworx avatar Jan 26 '24 23:01 torrinworx

Encountered this on Desktop 4.27.1 (136059), Engine: 25.0.2, Compose: v2.24.3-desktop.1

princemaple avatar Feb 07 '24 10:02 princemaple

Hey @perosb I'm not able to reproduce with the latest Docker Desktop release 4.27.2, can you give it a try? If you still have the issue, can you give us a minimal & complete reproduction case?

For everyone else, if you don't have the same issue as the original one, please:

  • Check the latest version of Docker Desktop
  • If you can reproduce your issue, please open a new one with a full repro case

Thanks all 🙏

glours avatar Feb 09 '24 14:02 glours

Errors are still occurring (I've updated to the latest Docker version 4.27.2) :

$ docker compose watch
cannot take exclusive lock for project "": process with PID 43396 is still running

My OS :

OS Name: Microsoft Windows 11 Famille
OS Version: 10.0.22631 N/A build 22631

If necessary, I can open a repository.

Piglow19 avatar Feb 10 '24 13:02 Piglow19

For me, manually deleting the *.pid file in AppData\Local\docker-compose\ solves the problem, but it is very annoying.

docker version
Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:02 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.27.2 (137060)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435e5f6216828dec57958c490c4f8bae4f98
  Built:            Wed Feb  7 00:39:16 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

borgez avatar Feb 10 '24 13:02 borgez

Hey @perosb I'm not able to reproduce with the latest Docker Desktop release 4.27.2, can you give it a try? If you still have the issue, can you give us a minimal & complete reproduction case?

For everyone else, if you don't have the same issue as the original one, please:

  • Check the latest version of Docker Desktop
  • If you can reproduce your issue, please open a new one with a full repro case

Thanks all 🙏

@glours I'm actually facing this issue with my repo here: https://github.com/torrinworx/Bitorch

To reproduce:

  1. $ docker compose -f .\dev.docker-compose.yml build --no-cache
  2. $ docker compose -f .\dev.docker-compose.yml watch
  3. Press Ctrl+C, then delete/remove the running containers/compose stacks
  4. $ docker compose -f .\dev.docker-compose.yml watch

Result:

PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> docker compose -f .\dev.docker-compose.yml watch

cannot take exclusive lock for project "bitorch-development": process with PID 47156 is still running
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> tasklist /fi "PID eq 47156" 
INFO: No tasks are running which match the specified criteria.
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch>

This is with docker desktop 4.27.2.

torrinworx avatar Feb 10 '24 13:02 torrinworx

Maybe this is related, but for me it looks like the %LOCALAPPDATA%\docker-compose\PROJECTNAME.pid file is not getting removed properly. When exiting via Ctrl+C, the exit code is 130 ($LastExitCode when using PowerShell); maybe that's the reason the watch command is not working as intended?

I always need to delete that file, no other problems so far (but haven't played around with this new feature yet).

Running Docker Desktop 4.27.2 on Windows 10 Pro 22H2 using HyperV.

FibreFoX avatar Feb 13 '24 18:02 FibreFoX

This file is not expected to be removed after the command completes. Instead, when a compose command is executed, it checks whether the registered PID is alive (see https://github.com/moby/moby/blob/master/pkg/process/process_windows.go).

ndeloof avatar Feb 13 '24 18:02 ndeloof

@torrinworx 👋 I used your repository but wasn't able to reproduce on my side. To be honest, I don't know what is happening. Can you share a recording so I can check whether I'm missing a step? Are you using WSL2 or Hyper-V as the Docker Desktop virtual machine?

glours avatar Feb 15 '24 16:02 glours

@torrinworx 👋 I used your repository but wasn't able to reproduce on my side. To be honest, I don't know what is happening. Can you share a recording so I can check whether I'm missing a step? Are you using WSL2 or Hyper-V as the Docker Desktop virtual machine?

@glours I'm using WSL2 on Windows 11 Home 22H2 22621.3155. Here is a video with this issue happening with the Bitorch repository I linked above:

https://www.youtube.com/watch?v=PzrfWC825Rc

torrinworx avatar Feb 15 '24 17:02 torrinworx

@torrinworx thank you very much! Can I ask you another question? I want to check that you don't have an old version of Compose in your path that could take priority over the version embedded in Desktop. Can you share the result of docker compose version, please?

glours avatar Feb 15 '24 17:02 glours

np!

Huh yeah it looks like it's still taking the old desktop version: Docker Compose version v2.24.5-desktop.1

Even though my docker desktop client is saying v4.27.2

torrinworx avatar Feb 15 '24 17:02 torrinworx

Unfortunately no 😞 , Compose v2.24.5-desktop.1 is the version shipped in Docker Desktop 4.27.2

glours avatar Feb 15 '24 17:02 glours

@torrinworx can you try something else please, instead of doing docker compose watch directly can you try the following steps:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • docker compose -f .\dev.docker-compose.yml watch again

And a 2nd test:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • Don't remove the containers in Docker Desktop
  • docker compose -f .\dev.docker-compose.yml watch again

glours avatar Feb 15 '24 17:02 glours

Can you please check the status of the process listed in the lock file ?

Get-Process -Id 146328

ndeloof avatar Feb 15 '24 17:02 ndeloof

@torrinworx can you try something else please, instead of doing docker compose watch directly can you try the following steps:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • docker compose -f .\dev.docker-compose.yml watch again

And a 2nd test:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • Don't remove the containers in Docker Desktop
  • docker compose -f .\dev.docker-compose.yml watch again

So both tests result in the same error:

PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> docker compose -f .\dev.docker-compose.yml watch
cannot take exclusive lock for project "bitorch-development": process with PID 165984 is still running
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch>

However when I delete the .pid file from the directory they both work just fine.

@ndeloof After I CTRL+C the watch command and delete the containers this is the result:

C:\Users\torri>tasklist /FI "PID eq 162632"
INFO: No tasks are running which match the specified criteria.

C:\Users\torri>

It only shows "No tasks are running..." after you Ctrl+C the watch command. While watch is still running, it shows this, even when the containers have been deleted:

C:\Users\torri>tasklist /FI "PID eq 162632"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
docker-compose.exe          162632 Console                    1     47,308 K

C:\Users\torri>
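
The manual tasklist check above can also be scripted, for example to confirm that a PID recorded in the lock file is really gone. This is a sketch with hypothetical helper names (`csv_output_has_pid`, `pid_is_running`); it assumes a Windows host and uses tasklist's `/FO CSV` and `/NH` options so that image names containing spaces do not break the parsing.

```python
import csv
import io
import subprocess


def csv_output_has_pid(output: str, pid: int) -> bool:
    """Return True if tasklist CSV output contains a row whose PID column is pid.

    Data rows look like: "docker-compose.exe","162632","Console","1","47,308 K"
    The "INFO: No tasks are running..." line parses as a single field and is skipped.
    """
    for row in csv.reader(io.StringIO(output)):
        if len(row) >= 2 and row[1] == str(pid):
            return True
    return False


def pid_is_running(pid: int) -> bool:
    """Ask tasklist (Windows only) whether a process with this PID exists."""
    result = subprocess.run(
        ["tasklist", "/FI", f"PID eq {pid}", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=False,
    )
    return csv_output_has_pid(result.stdout, pid)
```

A wrapper script could call `pid_is_running` on the PID named in the error message before deciding whether the lock file is stale.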

torrinworx avatar Feb 15 '24 17:02 torrinworx