cloudflared
Updates on Windows with `cloudflared.exe update` can fail
Describe the bug
On Windows, cloudflared.exe sometimes fails to update when running the command `cloudflared.exe update`.
To Reproduce Steps to reproduce the behavior:
- Download cloudflared.exe from GitHub to C:\Cloudflared\bin\cloudflared.exe
- In a cmd terminal, run `cd C:\Cloudflared\bin` and then `cloudflared.exe update`
- Sometimes it works, sometimes it fails
 
When it fails, cloudflared.exe no longer exists, but there are 3 "new" files:
- cloudflared.exe.old
 - cloudflared.exe.new
 - cfd_update.bat
 
Expected behavior Cloudflared is updated and service is up
Environment and versions
- OS: Windows 2019 Standard edition (1809)
 - Architecture: Intel x86_64
 - Version: 2022.5.3 (built 2022-05-30-1513 UTC) from github
 
Logs and errors No errors, it says :
2022-06-16T13:48:52Z INF cloudflared has been updated version=2022.6.1
cloudflared has been updated to version 2022.6.1
And return code is 11
but cloudflared.exe no longer exists and the service is down
This is in my backlog. I'm not able to give you an end date right now but I'm going to get to this.
CC: @abelinkinbio
This is dragging on. Can we please get it fixed?
This is occurring for me as well. I run the update command and it shuts down the service and downloads the new .exe file. Something seems to fail when trying to replace the old .exe though, and I end up with two files:
- cloudflared.exe (old version, untouched)
 - cloudflared.exe.new (newly downloaded)
 
The service does not restart.
Could the issue be that the old .exe is locked and can't be renamed / removed? Maybe the service isn't shut down in time for the rename / remove action?
I see the same issue. It looks like the service doesn't fully shut down and ends up in a STOP_PENDING state, then never replaces the old EXE file with the updated EXE.
This happens when the tunnel is being actively or heavily used. In the examples I've seen it's usually due to streaming data over HTTPS via the tunnel. If you're doing standard GET/POST HTTP over the tunnel this doesn't occur.
@joshlemon that theory would make sense for my use-case. For a test, I manually shut down the service first, waited a short while, then ran the update command. It was able to download the new exe, replace the old one, and start the service. So, it does seem that the root cause is the script not waiting long enough for the service to shut down before moving to the next step and failing.
@sudarshan-reddy or @abelinkinbio is a fix still in the works for this?
Any follow up?
Same issue for me. Neither `cloudflared update` nor running the .exe from the documentation page seems to work, even with the "Cloudflared agent" service stopped. Both were run as administrator. When I run the .exe it just hangs with an empty console window.
`cloudflared update` kept saying I had been updated to 2023.3.1, but `cloudflared -v` said I was running v2023.3.0. c:\program files (x86)\cloudflared contained "cloudflared.exe" and "cloudflared.exe.new".
Based on the comments of @joshlemon and @some-guy-23 I realized running the update in an RDP session using the tunnel in question might be problematic.
I was able to complete the update using this command, with a brief interruption of my RDP session (although I notice that 'appwiz.cpl' still reports the old version number for cloudflared):
net stop cloudflared && cloudflared update && net start cloudflared
@MajorLettuce - [caveat: i have lots of windows experience and about a month of cloudflare experience. Apologies in advance if these thoughts are naive...]
- Check the service config using `sc qc cloudflared` for anything odd.
- Compare the path of the service executable to the path where Windows is looking for 'cloudflared.exe' (when you run "cloudflared update"). Where is Windows finding cloudflared (`where cloudflared`)? What does your path look like (`echo "%path:;="&echo "%"`)?
- Maybe you're running 32-bit cloudflared on a 64-bit system? Or running the 64-bit updater on a system currently running the 32-bit version?
- Maybe an antivirus or Windows policy update has restricted write capabilities in `C:\Program Files (x86)` (or whatever folder cloudflared is running from)?
- Uninstall and reinstall cloudflared (maybe there's a problem based on the version you're trying to update from)?
- The Cloudflare docs include instructions on creating multiple tunnel services on a single Windows system.
- You could run different versions of cloudflared from different paths on the same system, or run a tunnel from a non-standard folder that is not in your Windows PATH.
- See if some other version of cloudflared is still running when your service is stopped (`tasklist /fi "IMAGENAME EQ cloudflared*"`).
- Check the status of your Cloudflare tunnels while you have the service stopped. Is the tunnel actually down?
- Check (slowly) for any other versions of cloudflared.exe on your system (`c: && cd \ && dir /s/b cloudflared.exe`).
 
Update: https://github.com/cloudflare/cloudflared/commit/5dbf76a7aa58675a990194b86161c5ae15f84e24 has the fix if you are curious. This will be out in the next cloudflared release.
Note that once you have 2023.4.1 or above, this should work.
The latest cloudflared should fix this problem. You'd obviously only be able to test it in the release after the one that contains the fix. Let's keep this issue open to track that and report.
@sudarshan-reddy not sure if this was a fluke with my system, but after the update the service will:
- Shut down correctly
 - Properly download / replace / rename both the new and old executables
 - Not start-up the cloudflare service
 
So, progress, but didn't quite work for me.
@some-guy-23 - It's hard for us to know if this is related. Can you provide event logs so we can understand why the service didn't restart?
@obezuk for the latest update (2023.4.2->2023.5.0), it actually seems to be back to the original issue, where I end up with the service shut down and a .new file with the update, but the original exe not renamed / replaced. Basically it looks like a timing issue where the original exe is still locked when the update tries to replace it?
Oddly, the Event Viewer logs make it appear like everything went fine. I see the following -
- cloudflared starting graceful shutdown
 - cloudflared terminated without error
 - Cloudflared service stopped
 - Cloudflared service starting
 - Cloudflared service arguments: [C:\Program Files (x86)\cloudflared.\cloudflared.exe tunnel run --token <token_value>]
 
If there are other logs I can provide, please let me know!
EDIT: I confused the running cloudflared service process with the cloudflared update process in a few locations. This has been fixed. Improved wording/formatting.
I'm currently experiencing this problem while upgrading from 2024.2.1 to 2024.3.0. After spending some time in procmon and reviewing the update logic, I think I know what's going on. Unfortunately, https://github.com/cloudflare/cloudflared/commit/5dbf76a7aa58675a990194b86161c5ae15f84e24 (version 2023.4.1) seems to have introduced a few more problems, and didn't address the original issue.
Issue 1: Not Waiting for Service Shutdown
As @joshlemon stated, this issue was originally observed on systems with heavy utilization. The logic responsible for the service restart is located in cfd_update.bat. Here are its contents:
sc stop cloudflared >nul 2>&1
rename "C:\PROGRA~2\cloudflared\cloudflared.exe" cloudflared.exe.old
rename "C:\PROGRA~2\cloudflared\cloudflared.exe.new" cloudflared.exe
del "C:\PROGRA~2\cloudflared\cloudflared.exe.old"
sc start cloudflared >nul 2>&1
exit /b 0
The problem is that sc stop merely sends a stop control request to cloudflared. We need to wait for the service to stop, because cloudflared.exe is being used by the service. We can't rename cloudflared.exe until AFTER the service is stopped. So if cloudflared.exe takes more than a few milliseconds to stop, we're in trouble.
Solution: Replace sc stop with net stop which will wait for the service to stop, before resuming execution of the batch script. We can leave sc start alone, because nothing depends on service startup.
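If the wait were done from Go rather than from the batch file, the same idea could be sketched roughly as below. This is only an illustration under assumptions: `waitForStopped` and the `queryState` hook are hypothetical names, not part of cloudflared, and on Windows the hook would have to wrap something like `sc query cloudflared` or the service control manager API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStopTimeout is returned when the service never reports "STOPPED".
var errStopTimeout = errors.New("service did not stop before the timeout")

// waitForStopped polls queryState until it reports "STOPPED" or the timeout
// elapses. queryState is a hypothetical hook supplied by the caller.
func waitForStopped(queryState func() (string, error), timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		state, err := queryState()
		if err != nil {
			return err
		}
		if state == "STOPPED" {
			return nil
		}
		if time.Now().After(deadline) {
			return errStopTimeout
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated service that reports STOP_PENDING twice before stopping.
	states := []string{"STOP_PENDING", "STOP_PENDING", "STOPPED"}
	i := 0
	fake := func() (string, error) {
		s := states[i]
		if i < len(states)-1 {
			i++
		}
		return s, nil
	}
	err := waitForStopped(fake, time.Second, 10*time.Millisecond)
	fmt.Println("stopped cleanly:", err == nil)
}
```

The point is only that the rename must not be attempted until the stopped state has actually been observed; `net stop` gives the batch file that behavior without extra code.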
Issue 2: Waiting for cfd_update.bat
Prior to https://github.com/cloudflare/cloudflared/commit/5dbf76a7aa58675a990194b86161c5ae15f84e24, runWindowsBatch() used cmd.Start() to run cfd_update.bat. However, this was changed to cmd.Output() which collects stdout and waits for the batch file to finish running. This is a big problem. How can cfd_update.bat rename cloudflared.exe when the process running from that file is waiting on cfd_update.bat to exit? This causes almost all the update logic in cfd_update.bat to fail. I've confirmed this behavior via procmon as well.
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/workers_update.go#L246
Solution: Revert back to using cmd.Start() which will not wait for cfd_update.bat to finish. After starting cfd_update.bat, the cloudflared update process needs to exit asap. To be extra cautious, have cfd_update.bat sleep for 1 second to give the cloudflared update process time to exit. This 1 second sleep is super overkill, especially with fixes for Issue 1 and Issue 3 in place, but could be an added level of safety.
net stop cloudflared >nul 2>&1
timeout 1
rename "C:\PROGRA~2\cloudflared\cloudflared.exe" cloudflared.exe.old
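To make the difference between the two calls concrete, here is a minimal Go sketch. It is not the cloudflared source: `batchCommand` and `handOff` are hypothetical helpers, and running the script through `cmd /C` is an assumption. What it relies on is standard `os/exec` behavior: `cmd.Start()` returns as soon as the child is launched, while `cmd.Output()` and `cmd.Run()` block until the child exits.

```go
package main

import (
	"fmt"
	"os/exec"
)

// batchCommand builds the command that would run the update script.
// Running the file through "cmd /C" is an assumption made for this sketch.
func batchCommand(batchFile string) *exec.Cmd {
	return exec.Command("cmd", "/C", batchFile)
}

// handOff starts the script and returns without waiting for it, so the
// caller is free to exit and release its hold on cloudflared.exe.
func handOff(cmd *exec.Cmd) error {
	if err := cmd.Start(); err != nil {
		return fmt.Errorf("starting update script: %w", err)
	}
	// Release frees the resources tied to the child without waiting on it.
	return cmd.Process.Release()
}

func main() {
	cmd := batchCommand(`C:\Program Files (x86)\cloudflared\cfd_update.bat`)
	// Only print the command here; nothing is executed in this demo.
	fmt.Println(cmd.Args)
	// On Windows the updater would call handOff(cmd) and then exit at once,
	// leaving the rest of the update to the batch file.
}
```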
Issue 3: Service attempts to clean up cfd_update.bat
Also introduced in https://github.com/cloudflare/cloudflared/commit/5dbf76a7aa58675a990194b86161c5ae15f84e24 is logic to remove cfd_update.bat after it finishes running. The line defer os.Remove(batchFile) causes the cloudflared update process to remove cfd_update.bat when the function runWindowsBatch() exits. This is not something that should be done by a process mapped to a file (cloudflared.exe) that cfd_update.bat is trying to rename. As soon as cmd.Start() is hit, the cloudflared update process needs to exit asap.
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/workers_update.go#L244
I've further confirmed this is happening with procmon. I can see the cloudflared update pid spawning cfd_update.bat. Next, the batch file fails to rename/delete files. Depending on how fast the service was able to shutdown, I'll see a new service startup for the non-upgraded cloudflared.exe. And then out of nowhere, I see the old cloudflared update pid coming in to delete cfd_update.bat. Having all this extra logic after cmd.Start() is preventing cloudflared from exiting quickly, making it impossible to rename cloudflared.exe to cloudflared.exe.old.
Solution: Perform any upgrade cleanup logic during cloudflared startup and initialization. This needs to be handled by the newly upgraded cloudflared process coming online, not the cloudflared update process that's being shut down.
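A startup cleanup along those lines might look like the following Go sketch. The function name `cleanupLeftovers` and the injected `remove` hook are hypothetical; the file names are the ones reported in this thread. Missing files are treated as "nothing to do" so that a normal start stays silent.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// leftoverNames lists files an update may leave behind (names from this thread).
var leftoverNames = []string{"cfd_update.bat", "cloudflared.exe.old"}

// cleanupLeftovers tries to remove each leftover in dir and returns the paths
// it removed plus the first unexpected error. remove is injected so the logic
// can be exercised without touching real files; pass os.Remove in real use.
func cleanupLeftovers(dir string, remove func(string) error) ([]string, error) {
	var removed []string
	var firstErr error
	for _, name := range leftoverNames {
		p := filepath.Join(dir, name)
		err := remove(p)
		if err == nil {
			removed = append(removed, p)
		} else if !errors.Is(err, fs.ErrNotExist) && firstErr == nil {
			firstErr = err // a missing file is fine, anything else is reported
		}
	}
	return removed, firstErr
}

func main() {
	dir, err := os.MkdirTemp("", "cfd-demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)
	// Simulate a script left over from a previous update.
	_ = os.WriteFile(filepath.Join(dir, "cfd_update.bat"), []byte("exit /b 0\r\n"), 0o600)
	removed, err := cleanupLeftovers(dir, os.Remove)
	fmt.Println("removed:", len(removed), "error:", err)
}
```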
Issue 4: Duplicate Rename & Cleanup Logic
cfd_update.bat is responsible for renaming cloudflared.exe to .old during the upgrade process. This is for good reason. A running process can't rename itself. Yet, we see in the Apply() function there is logic to rename current -> old, new -> current, and delete old. Having all this extra logic after calling runWindowsBatch() is preventing cloudflared from exiting quickly, making it impossible to rename cloudflared.exe to cloudflared.exe.old.
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/workers_update.go#L111
Solution: Let cfd_update.bat handle all the upgrade logic for Windows. If this code is for Linux/macOS, then it should be encapsulated in an else statement starting on Line 108, so it's not run during a Windows upgrade.
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/workers_update.go#L108
The new service instance that's started at the end of cfd_update.bat can look for the existence of cfd_update.bat during service startup. If found, the new service instance can remove it. If you want to get fancy, you could drop a cleanup.json or .yaml containing items that the new service needs to clean up. But watch out, this has security implications, since cloudflared runs as local system. If unprivileged processes can modify that .json file, cloudflared could be abused to delete protected files.
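The branching suggested here could be sketched as below. This is not the actual Apply() function: `applyUpdate`, `runBatch` and `swapInPlace` are hypothetical names, and the OS is passed in as a parameter only so that both branches can be exercised on any platform.

```go
package main

import (
	"fmt"
	"runtime"
)

// applyUpdate picks exactly one strategy for swapping the binary.
// runBatch and swapInPlace are hypothetical hooks, not cloudflared functions.
func applyUpdate(goos string, runBatch, swapInPlace func() error) error {
	if goos == "windows" {
		// A running .exe cannot rename itself: hand off and do nothing else.
		return runBatch()
	}
	// On other platforms the process can rename its own file directly.
	return swapInPlace()
}

func main() {
	err := applyUpdate(runtime.GOOS,
		func() error {
			fmt.Println("would start cfd_update.bat and exit")
			return nil
		},
		func() error {
			fmt.Println("would rename current -> old, new -> current, delete old")
			return nil
		},
	)
	fmt.Println("error:", err)
}
```

Keeping the two paths mutually exclusive means no rename or delete runs in the old process after the batch file has been started.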
Issue 5: Console and Error Message After cfd_update.bat Starts
There are various places in update.go and workers_update.go where the update's progress is reported after cfd_update.bat is called. Normally I'm all about console logging and messages, but not after cfd_update.bat is started. Once that batch file is called, this cloudflared process needs to cease all update logic and exit. The update is now in the hands of cfd_update.bat.
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/workers_update.go#L251
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/update.go#L125
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/update.go#L128
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/update.go#L188
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/update.go#L171
Solution: All logging done by the old cloudflared update process should be done BEFORE calling cfd_update.bat. I recommend writing a console notification like "Update successfully staged. Preparing for service restart to finalize update." just before starting cfd_update.bat. After that, it's all up to cfd_update.bat. If you want to capture logs of what happens in cfd_update.bat, a log file will need to be implemented, which cfd_update.bat can write to. For example:
@echo off
SET log="C:\Program Files (x86)\cloudflared\cfd_update.log"
echo %date% %time% - Update Log Started >> %log%
echo %date% %time% - Restarting cloudflared service >> %log%
net stop cloudflared >> %log% 2>&1
Issue 6: Failure to Delete cloudflared.exe.old
This issue is non-critical, and I'm only 75% sure what's causing it, so take this with a grain of salt. When manually running cfd_update.bat through its paces, I once observed it fail to delete cloudflared.exe.old. It returned an Access Denied error. I tried to replicate this issue by restoring all the same files and re-running the same batch file, but no luck. Everything worked fine on subsequent runs. Only the first run failed.
I'm a security engineer by trade, and this reeks of anti-virus/anti-malware. AV loves to target executable files. Whenever an exe is modified, you'll routinely see AV scan it. The problem is that AV locks the file for changes during the scan, so you won't be able to modify it. On some systems, this scan can take a few seconds. After it's been scanned once, AV tends to cache the result, so subsequent scans aren't performed.
I wasn't able to prove it, but I'm 75% sure that line 2 of cfd_update.bat is triggering an AV scan, which prevents the file from being deleted on line 4.
Solution: Before deleting the file, check if it's locked for writing. This can be difficult to do in a batch script. I recommend moving this logic to the new cloudflared service instance that's started at the end of cfd_update.bat. When that service is initialized, it can check if cloudflared.exe.old exists, and can implement more advanced logic to check if the file is locked before attempting to delete it. Or it can just set a flag if the file exists, so the service tries to delete it every few seconds until it finally succeeds and clears the flag.
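The "keep trying until it succeeds" variant is simple enough to sketch in Go. Everything here is illustrative: `removeWithRetry` and `errLocked` are hypothetical names, and the locked file is simulated rather than produced by a real scanner. In real use the hook would be something like `func() error { return os.Remove(oldPath) }`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLocked stands in for the Access Denied error seen while a scanner
// holds the file; it exists only for this demonstration.
var errLocked = errors.New("access denied: file is locked")

// removeWithRetry calls remove up to attempts times, sleeping delay between
// tries, and returns nil as soon as one call succeeds.
func removeWithRetry(remove func() error, attempts int, delay time.Duration) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = remove(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	locked := 2 // pretend a scanner holds the file for the first two tries
	fakeRemove := func() error {
		if locked > 0 {
			locked--
			return errLocked
		}
		return nil
	}
	err := removeWithRetry(fakeRemove, 5, 10*time.Millisecond)
	fmt.Println("deleted:", err == nil)
}
```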
I found a few more issues, the first of which could be a significant problem.
Issue 7: Use of 8.3 (Short) Filenames
https://github.com/cloudflare/cloudflared/commit/5dbf76a7aa58675a990194b86161c5ae15f84e24 introduced 8.3 filenames within cfd_update.bat for the rename and delete operations. The author indicates this was done because "The batch file doesn't play well with spaces." I suspect something else was going on (see Issue 8 below) that caused the unwanted behavior, because file path spaces are fine when they're surrounded by double quotes (which they are).
The big problem here is that 8.3 filenames are not universally supported on Windows. This legacy feature is commonly disabled on servers for both performance and security reasons.
Solution - Option 1: Change PROGRA~1 back to Program Files, and change PROGRA~2 back to Program Files (x86). As long as the filepath is surrounded by double quotes, this shouldn't be a problem.
Solution - Option 2: Use environment variables instead. This also adds support for non-standard system drive letters. The variables still expand to paths that contain spaces, so keep the double quotes. For example:
rename "%ProgramFiles(x86)%\cloudflared\cloudflared.exe" cloudflared.exe.old
rename "%ProgramFiles(x86)%\cloudflared\cloudflared.exe.new" cloudflared.exe
del "%ProgramFiles(x86)%\cloudflared\cloudflared.exe.old"
Issue 8: Unix style line endings in a Batch script
When I first captured cfd_update.bat, I was a little surprised to find that it had Unix-style line endings. Usually this isn't a big problem, but the batch interpreter does struggle with it sometimes. As explained on serverfault, there are known issues with GOTOs/labels and standalone colons when combined with Unix line endings. Technically, none of these statements are in use right now. But I can't help but wonder if the weird behavior that prompted the use of 8.3 filenames was related to this.
Solution: To give cfd_update.bat the best chance of success, I recommend writing it with Windows-style (CR LF) line endings.
https://github.com/cloudflare/cloudflared/blob/bb29a0e19437c3baa6a6e64f44b5de769206ed18/cmd/cloudflared/updater/workers_update.go#L215
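As a rough illustration of Issues 7 and 8 together, a generator for the batch file could quote the full long path and join lines with CR LF explicitly. This is a sketch, not the code at the link above: `renderUpdateScript` is a hypothetical name, and the command list simply mirrors the script quoted earlier in this comment with `net stop` substituted for `sc stop`.

```go
package main

import (
	"fmt"
	"strings"
)

// renderUpdateScript returns batch script text that swaps exePath with
// exePath+".new". Full paths are double-quoted (no 8.3 short names) and
// every line ends in CR LF.
func renderUpdateScript(exePath, serviceName string) string {
	// Take the file name by hand so backslashes are handled on any host OS.
	base := exePath[strings.LastIndexAny(exePath, `\/`)+1:]
	lines := []string{
		"@echo off",
		fmt.Sprintf("net stop %s >nul 2>&1", serviceName),
		fmt.Sprintf(`rename "%s" %s.old`, exePath, base),
		fmt.Sprintf(`rename "%s.new" %s`, exePath, base),
		fmt.Sprintf(`del "%s.old"`, exePath),
		fmt.Sprintf("sc start %s >nul 2>&1", serviceName),
		"exit /b 0",
	}
	return strings.Join(lines, "\r\n") + "\r\n"
}

func main() {
	script := renderUpdateScript(`C:\Program Files (x86)\cloudflared\cloudflared.exe`, "cloudflared")
	// %q makes the \r\n line endings visible in the output.
	fmt.Printf("%q\n", script)
}
```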