Update doesn't work on edge channel

Open manixx opened this issue 8 years ago • 17 comments

I want to upgrade to Docker 17.05.0-ce but the upgrade.sh script fails.

Expected behavior

upgrade.sh https://download.docker.com/azure/edge/Docker.tmpl

Actual behavior

upgrade.sh https://download.docker.com/azure/edge/Docker.tmpl

executing upgrade on d12eb6d1e505
  File "/usr/bin/azupgrade.py", line 402
    subprocess.check_output(["docker", "node", "demote", node_id])
                                                                 ^
IndentationError: unindent does not match any outer indentation level

Information

Client:
 Version: 17.04.0-ce
 API version: 1.28
 Go version: go1.7.5
 Git commit: 4845c56
 Built: Tue Apr 4 00:37:25 2017
 OS/Arch: linux/amd64

Server:
 Version: 17.04.0-ce
 API version: 1.28 (minimum version 1.12)
 Go version: go1.7.5
 Git commit: 4845c56
 Built: Tue Apr 4 00:37:25 2017
 OS/Arch: linux/amd64
 Experimental: false

We have a swarm cluster with 3 masters and 2 workers. I called the script on one manager node (not the master).
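
For anyone hitting the same traceback: the IndentationError above is raised when Python parses the script, before any upgrade step runs. A minimal sketch of how a script can be checked for this class of error without executing it, using the built-in compile(). The helper name check_syntax and the sample sources are hypothetical and are not taken from azupgrade.py.

```python
from typing import Optional

# Hypothetical sample: the last line dedents to a level (4 spaces) that
# matches neither the outer level (0) nor the block level (8).
BAD_SOURCE = "def demote():\n        node_id = 'abc'\n    print(node_id)\n"
GOOD_SOURCE = "def demote():\n    node_id = 'abc'\n    print(node_id)\n"


def check_syntax(source: str, filename: str = "<script>") -> Optional[str]:
    """Return None if the source parses, else a short error description."""
    try:
        # compile() only parses the source; nothing is executed.
        compile(source, filename, "exec")
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


if __name__ == "__main__":
    print(check_syntax(GOOD_SOURCE))
    print(check_syntax(BAD_SOURCE))
```

Running the same check against the real file inside the container (for example with python -m py_compile) would confirm whether the shipped script parses at all.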

manixx avatar May 15 '17 14:05 manixx

Not working at all. Where's the azupgrade.py in this project?

vovimayhem avatar May 21 '17 00:05 vovimayhem

Thanks for reporting the issue. We will publish a fix/workaround shortly. @vovimayhem the script is in the guide container running in the manager nodes and needs to be patched up.

ddebroy avatar May 30 '17 17:05 ddebroy

Are there any updates on this? :)

manixx avatar Jun 13 '17 09:06 manixx

Yes! @manixx You can now kick off an upgrade of Docker to 17.05 (from 17.04) using the following command/container (rather than the earlier upgrade.sh):

docker run -v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker -ti docker4x/upgrade-azure:17.05.0-ce-azure2

ddebroy avatar Jun 13 '17 21:06 ddebroy

We will be updating the docs with the new mechanism once 17.06 is out.

ddebroy avatar Jun 13 '17 21:06 ddebroy

Sorry to reopen this, the update didn't work on our cluster. :/ We have a 5 node cluster (3 masters, 2 workers).

The update procedure seemed fine (each node was updated one after another), but after the update, the first manager (the one where I started the update) didn't reconnect to the cluster by itself. The other two masters connected, but I was unable to query the status (docker node ls resulted in a "context deadline exceeded" error) and the Docker version was still 17.04 on all machines. I have now reimaged all machines to their initial state, started Swarm mode on the first master node manually, and restarted all other machines.

manixx avatar Jun 19 '17 12:06 manixx

@manixx If you navigate to the Deployment under the resource group in the Azure portal, do you see any errors? From your description of symptoms it seems very likely the initial deployment update (before the node updates) could not be correctly applied.

Also, what was the environment you were upgrading from? Did you deploy your initial swarm using the link from docs.docker.com (i.e. not through cloud)?

ddebroy avatar Jun 20 '17 01:06 ddebroy

In the Deployments tab there were no errors, but I found a "Write Deployments" error inside the Activity Log of the manager scale set.

Deployment template validation failed: 'The template parameters 'registryLocation, registryName, adminUserEnabled, registrySku, storageAccountSku, storageAccountName, registryApiVersion' in the parameters file are not valid; they are not present in the original template and can therefore not be provided at deployment time. The only supported parameters for this template are 'adServicePrincipalAppID, adServicePrincipalAppSecret, enableSystemPrune, managerCount, managerVMSize, sshPublicKey, swarmName, workerCount, workerVMSize'. Please see https://aka.ms/arm-deploy/#parameter-file for usage details.'.

This is the only error in the log.

Initially I installed the Edge Channel from the official Docker page. If I can help you just let me know! :)
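
To make the validation error above easier to read: it says the parameters file contains names that the target template does not declare. A minimal sketch of that comparison as a set difference, using only the parameter names quoted in the error message. The function name unsupported_parameters is hypothetical, and this is not how Azure Resource Manager implements the check.

```python
from typing import Iterable, List

# Names reported as invalid in the error message.
PROVIDED = [
    "registryLocation", "registryName", "adminUserEnabled", "registrySku",
    "storageAccountSku", "storageAccountName", "registryApiVersion",
]

# Names the error message lists as supported by the template.
SUPPORTED = [
    "adServicePrincipalAppID", "adServicePrincipalAppSecret",
    "enableSystemPrune", "managerCount", "managerVMSize", "sshPublicKey",
    "swarmName", "workerCount", "workerVMSize",
]


def unsupported_parameters(provided: Iterable[str],
                           supported: Iterable[str]) -> List[str]:
    """Return the provided names that the template does not declare."""
    return sorted(set(provided) - set(supported))


if __name__ == "__main__":
    for name in unsupported_parameters(PROVIDED, SUPPORTED):
        print("not in template:", name)
```

None of the rejected names overlap with the supported ones, which is consistent with the parameters and the template belonging to two different deployments.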

manixx avatar Jun 20 '17 10:06 manixx

Well that's a very interesting error. It seems to indicate that somehow you tried to deploy the default template associated with the independent Docker4Azure VMs rather than the Docker4Azure swarm template. How does the timestamp of the above correlate with when you tried the upgrade?

For the Manager VMSS resources, do you see anything in the Activity Log where the Operation Name is "Manual Upgrade"?

Also just to confirm, your Deployment status for the resource group in the Azure portal says "Succeeded" even after the failed upgrade attempt, correct?

ddebroy avatar Jun 20 '17 20:06 ddebroy

Initially (for the first deployment of the resource group) we used this template to set up the cluster.

I do not see any Manual Upgrade in the Activity log sadly :/

Yes, exactly. There is only the initial deployment there.

manixx avatar Jun 21 '17 14:06 manixx

If you do not see any ManualUpgrade events, I would guess the upgrade script issued the overall deployment upgrade API call and Azure returned success but then failed a bit later, a condition I have noticed very occasionally. If you get a chance, can you retry the upgrade and keep an eye on the initial logs from the upgrade as well as the Deployments tab? You can get the update logs anytime by issuing docker logs editions_guide after running the upgrade container.

ddebroy avatar Jun 21 '17 18:06 ddebroy

I will re-run the update in two weeks (around the 4th of July) and I'll give you the exact logs of the container! :) I must be careful, because the last time I had to re-install all running containers. I'll give you feedback then!

manixx avatar Jun 22 '17 13:06 manixx

Thank you @manixx and appreciate your help and feedback with investigating the issue.

ddebroy avatar Jun 22 '17 15:06 ddebroy

I did find a bug where VM enumeration from the Azure side seems to have changed slightly leading to none of the upgrades actually taking place - the upgrade script/container would go through very quickly and exit. This has been fixed in docker4x/upgrade-azure:17.06.0-ce-aws1

ddebroy avatar Jun 29 '17 04:06 ddebroy

In my case I had additional Deployments attached to the resource group and it used the wrong one:

Unable to find image 'docker4x/upgrade-azure:17.06.0-ce-azure1' locally
17.06.0-ce-azure1: Pulling from docker4x/upgrade-azure
019300c8a437: Pull complete 
4d77251a915d: Pull complete 
99cc8ec5e0d8: Pull complete 
30591f8a8c96: Pull complete 
0238be837d6e: Pull complete 
aa3b9c543797: Pull complete 
5120ca55ece4: Pull complete 
22ca435c3db0: Pull complete 
78a6e8adcb99: Pull complete 
dad4f7185291: Pull complete 
Digest: sha256:c9dd5a6416388e1cdf840cf5af010523a76df4d22ececf0c328e6fc8ad3ca108
Status: Downloaded newer image for docker4x/upgrade-azure:17.06.0-ce-azure1
Copying upgrade script ...
Kicking off upgrade to https://download.docker.com/azure/stable/17.06.0/Docker.tmpl ...
INFO: Validate Template URL to upgrade to
INFO: Initiating upgrade. Create queue to prevent another simultaneous upgrade.
INFO: Updating Resource Group template. This will take several minutes. You can follow the status of the upgrade below or from the Azure console using the URL below:
INFO: https://portal.azure.com/#resource/subscriptions/77d0f8b4-ec1d-4515-8f90-ff225cd87243/resourceGroups/docker_swarm/overview
INFO: Updating Resource Group: docker_swarm
INFO: Inspecting deployment: Microsoft.VirtualNetworkGateway-20170630102333 at state Succeeded
INFO: Inspecting deployment: Microsoft.VirtualNetworkGateway-20170630092102 at state Succeeded
INFO: Inspecting deployment: Microsoft.Template at state Succeeded
INFO: Found deployment: Microsoft.VirtualNetworkGateway-20170630102333 deployed at 2017-06-30 08:56:16.330641+00:00

The upgrade went through, but I was on the same docker version as before.

I deleted those deployments and after that the upgrade worked as expected. So maybe a check should be added to get the deployment with the correct name/type or something.
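
A minimal sketch of the kind of check suggested above: filter the deployments by name before taking the most recent one, instead of taking the most recent deployment of any kind. The deployment names and the one timestamp shown in the log are reused; the other timestamps, the Deployment class, the pick_deployment helper, and the assumption that the swarm deployment is the one named Microsoft.Template are all hypothetical. The real upgrade script may select deployments differently.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Deployment:
    name: str
    state: str
    timestamp: datetime


def pick_deployment(deployments: List[Deployment],
                    expected_name: str = "Microsoft.Template",
                    ) -> Optional[Deployment]:
    """Return the newest succeeded deployment with the expected name."""
    candidates = [d for d in deployments
                  if d.name == expected_name and d.state == "Succeeded"]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d.timestamp)


# Names come from the log above; only the first timestamp appears there,
# the other two are illustrative placeholders.
DEPLOYMENTS = [
    Deployment("Microsoft.VirtualNetworkGateway-20170630102333", "Succeeded",
               datetime(2017, 6, 30, 8, 56, 16, 330641, tzinfo=timezone.utc)),
    Deployment("Microsoft.VirtualNetworkGateway-20170630092102", "Succeeded",
               datetime(2017, 6, 30, 7, 30, 0, tzinfo=timezone.utc)),
    Deployment("Microsoft.Template", "Succeeded",
               datetime(2017, 6, 1, 12, 0, 0, tzinfo=timezone.utc)),
]

if __name__ == "__main__":
    chosen = pick_deployment(DEPLOYMENTS)
    print(chosen.name if chosen else "no matching deployment")
```

With a name filter in place, the gateway deployments are ignored even though they are newer, and the helper returns None rather than guessing when nothing matches.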

tisoft avatar Jun 30 '17 10:06 tisoft

@tisoft thanks for reporting the multi-deployment scenario. I will look at providing a way for the user to select the deployment they want upgraded.

ddebroy avatar Jul 07 '17 17:07 ddebroy

@ddebroy Sadly I wasn't able to re-run the update script again. We already updated to the Stable channel. Sorry.

manixx avatar Jul 18 '17 14:07 manixx