swarmkit
                                
                                 swarmkit copied to clipboard
                                
                                    swarmkit copied to clipboard
                            
                            
                            
                        Be mindful of image architectures when scheduling tasks
Based on conversation with @wsong: our real desire here is to run a service only on windows nodes with named pipe support rather than image architecture; what we really need here is scheduling based on named pipe support.
Thanks for the clarification @anshulpundir
what we really need here is scheduling based on named pipe support.
If we implement it this way, the solution becomes far less general.
Can we implement this based on platform support in the multi-arch image manifest? If we do that then this scheduling smarts will be generally useful for anyone using multi-arch, and not unique to just this named pipe usecase.
@dhiltgen @anshulpundir @wsong : do I understand correctly that the goal here is to ensure that we only schedules tasks on nodes with the right architecture & OS according to the image's manifest?
If so, it looks like this is already the case: when creating a service, by default (and except if the --no-resolve-image flag was set, or if platforms were explicitly set) the client already does retrieve the manifest from the registry and passes that to swarm (https://github.com/moby/moby/blob/12bba16306dd618883fe2e85ef1730efc572294f/client/service_create.go#L45 and https://github.com/moby/moby/blob/12bba16306dd618883fe2e85ef1730efc572294f/client/service_create.go#L66-L71).
(and we do the same on updates, https://github.com/moby/moby/blob/12bba16306dd618883fe2e85ef1730efc572294f/client/service_update.go#L44-L78)
This is then used by swarm when filtering which nodes are eligible to run the task (https://github.com/docker/swarmkit/blob/bc032e24784ea618044ee438fedec3458abb2ef9/manager/scheduler/filter.go#L253-L320).
So I'm not sure what the problem/goal here is. Could you please clarify?
Thanks!
@wk8 The problem today is if Swarm can't get the manifest from the registry (e.g. if it's a private image or if we don't have Internet connectivity). In that case, Swarmkit won't be able to tell what platforms the image supports, and therefore it won't be able to filter the nodes.
One way of solving this would be to add support for loading image manifests from an offline bundle (i.e. a .tar file); that isn't possible today. See https://github.com/docker/docker-core-backlog/issues/24
@wsong : thanks for the quick reply! :)
Just to make sure I understand correctly: currently, the client does query the registry for an image's manifest if there's a registry, but in the case of local multi-arch images loaded with docker load doesn't actually inspect those to extract the supported platforms (the same way docker run does to decide which bundled image to actually run); and that's what we'd want to change - correct?
My understanding is that docker save doesn't actually save the manifest data for an image at all; it just saves the image contents. Thus, you can use docker load to load the image data for your platform, but you won't know what other platforms the image is available for.
@wsong this can be stupid question but why you guys don't just focus to get docker manifest command marked as non-experimental so users can store manifest to local registries?
@olljanat That's basically what we're talking about here. What we need is a user-friendly way of shipping manifests to users without Internet connectivity.
@wsong : tried this morning, it seems that you can docker save/load a manifest, seems to work just fine. If that's indeed the case, and we add support within swarm to be able to use a local multi-arch image, then we're good, right?
(@wsong : the small caveat is that it seems that you do need to push the images in the manifest, and the manifest itself, to a registry before you can save it - once you've saved it you don't need access to the registry on whichever node you then load it on. Not sure why, but I see no reason why this couldn't be changed too)
Ugh please ignore my previous comment https://github.com/docker/swarmkit/issues/2770#issuecomment-436415198. Manifests are indeed not saved at all. So what we need is a way to save manifests, I guess, i.e. bundle a manifest and its images in a single archive. Would we want a docker manifest save command? Where load then only imports the images that can run on the host?
Actually, what we want is to just save the manifest (i.e. the metadata about which platforms a multi-arch image supports). For our purposes, we want to run a service on Windows nodes that are Windows Server 1709 or newer, but we want to create the service on a Linux node. That means that the Swarmkit manager (which will run on Linux) needs to have the manifest image metadata for the Windows images, but it doesn't need to load the actual image layers.
@wsong have you seen this one? https://medium.com/@mauridb/docker-multi-architecture-images-365a44c26be6
As far I understand this you should be able to store manifest to private registry using commands:
export DOCKER_CLI_EXPERIMENTAL=enabled
docker manifest push registry-host:5000/org/image:tag
That's fine, but we still need an offline solution for air-gapped clusters.
@wsong sorry but now I didn't understood. You need anyway some method to distribute images and manifest for swarm nodes so why you are not just running private registry inside of that air-gapped cluster?
Not all of our airgapped users have a private registry.
Not all of our airgapped users have a private registry.
On that case how correct nodes gets image? From some external process? Why they are not then checking during import process that only nodes with correct OS version gets that image?
For our purposes, we want to run a service on Windows nodes that are Windows Server 1709 or newer, but we want to create the service on a Linux node.
Btw. One option which we are using for similar use cases are add engine/node labels to nodes and use constraints based on them.
PS. Let me know if I'm bothering your working too much. This just sounds interesting use case so I'm trying to understand it.
We just use docker load to get images onto airgapped nodes. But, as discussed above, that only loads image data, not manifests.
(@wsong : the small caveat is that it seems that you do need to push the images in the manifest, and the manifest itself, to a registry before you can
saveit - once you've saved it you don't need access to the registry on whichever node you thenloadit on. Not sure why, but I see no reason why this couldn't be changed too)
It looks to be that docker manifest create command creates manifest locally and then docker manifest push is used to push it to Docker Hub/private registry.
So adding two new commands docker manifest save and docker manifest load would be probably best option.
@wsong would it work on your use case that you would use docker manifest load on swarm managers and docker load on worker nodes?
And just as reference. Current manifest commands are implemented on https://github.com/docker/cli/pull/138 and when all known issues on them have been fixed we can mark it non-experimental by merging https://github.com/docker/cli/pull/1355
@olljanat Yeah, that's what I was thinking; some sort of docker manifest save/load command.
Some more info which can be useful:
- When you create manifest with command docker manifest createit will be stored to folder .docker/manifests/under user's profile. 
- You can also download manifests without images from Docker Hub/private registry using this script https://github.com/TomasTomecek/download-manifest-from-dockerhub
But that is fully client side implementation so to be usable on here it would need to be modified as engine side solution.
Other thing which I noticed that swarmkit does not currently care about OS version even if that is specified on manifest, only CPU architecture and OS: https://github.com/docker/swarmkit/blob/master/api/types.proto#L82-L88 On process isolation mode Windows only support to run images which have same OS version than on node so if that information exists on manifest is should be used.
I also tested to save Windows image to tar and import back with docker load and looks that manifest with needed information is included to it:
"Architecture": "amd64",
"Os": "windows",
"OsVersion": "10.0.17134.345",
One options is also add support for node.platform.os_version constraint so user can use it together with node.platform.os to make sure that services are scheduled only valid hosts. Nice thing on that option is that it would work on airgapped environment without need to import images/manifests to swarm managers.
After more discussion, going to add a OS version constraint.
The manifest approach, while fitting the UCP use case, was also deemed clunky and half-baked as a generic feature, mainly because in the offline (ie, no registry) case it would have required all the nodes in the clusters to locally have the same version of all the manifests, as well as all the workers to locally have all the images relevant to their arch/OS listed in all the manifests, while providing no mechanism to actually handle this synchronisation nor the deployment of new manifest/image versions.
On the other hand, adding OS version constraints is easy for users to understand, and also perfectly addresses the UCP use case.
@wk8 Was this ever merged in Moby? it looks like this is a dependency for Kube on Windows.
Looks to be still pending https://github.com/moby/moby/pull/38349
Will do next week, thanks :)
@wk8 Should we close this issue since it looks like the work got merged?
@david-yu : the needed part in moby/moby got merged, but I still need to do the swarmkit partn sorry. Will do shortly.