yet-another-docker-plugin

Support for swarm mode in 1.12?

Open padyx opened this issue 9 years ago • 38 comments

Do you think it is possible, and likely, that this plugin will support the swarm mode coming with Docker 1.12?

Checking the remote api 1.24 for services, I'd say that it would be entirely possible to create a service with a single task.

It would be great to offer possible features of swarm in the cloud / image configuration. For example:

  • Resource reservation, resource limit per service/image
  • Placement constraints per service/image, maybe even on a per-job basis (I have no clue if this is possible in Jenkins)
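
For illustration, a single-task service with these settings could be described by a request body like the one built below. This is a minimal sketch, assuming the `ServiceSpec` field names of the Docker Remote API 1.24 (`TaskTemplate.Resources`, `TaskTemplate.Placement`, `Mode.Replicated`). The helper `build_service_spec`, the service name, the image name, and the node label are hypothetical and not part of the plugin.

```python
import json


def build_service_spec(name, image, cpu_limit=None, mem_limit_bytes=None,
                       cpu_reservation=None, mem_reservation_bytes=None,
                       constraints=()):
    """Build a body for POST /services/create that runs exactly one task."""

    def resources(cpus, mem_bytes):
        block = {}
        if cpus is not None:
            # The API expresses CPU as an integer number of 1e-9 CPUs.
            block["NanoCPUs"] = int(cpus * 1_000_000_000)
        if mem_bytes is not None:
            block["MemoryBytes"] = int(mem_bytes)
        return block

    return {
        "Name": name,
        "TaskTemplate": {
            "ContainerSpec": {"Image": image},
            "Resources": {
                "Limits": resources(cpu_limit, mem_limit_bytes),
                "Reservations": resources(cpu_reservation, mem_reservation_bytes),
            },
            "Placement": {"Constraints": list(constraints)},
            # Assumption: a finished build agent should not be restarted.
            "RestartPolicy": {"Condition": "none"},
        },
        # One replica: the service is only a wrapper around a single task.
        "Mode": {"Replicated": {"Replicas": 1}},
    }


# Hypothetical values, for illustration only.
spec = build_service_spec(
    "jenkins-agent-42",
    "example/jenkins-agent:latest",
    cpu_limit=2,
    mem_limit_bytes=1024 ** 3,
    cpu_reservation=0.5,
    mem_reservation_bytes=512 * 1024 ** 2,
    constraints=["node.labels.role == build"],
)
print(json.dumps(spec, indent=2))
```

The reservation and limit values map naturally onto per-image fields, while the constraint list is the part that might be overridden per job.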

padyx avatar Jul 21 '16 20:07 padyx

Yes, anything! But I'm stuck in docker-java upstream with integration tests :(

KostyaSha avatar Jul 21 '16 20:07 KostyaSha

My usual process is to update docker-java with the APIs, then define how it should work in the plugin and implement it.

Could you provide ideas for how the configuration should look/work in Jenkins?

Having configuration on a per-job basis would be very useful; that should be very simple, like in https://github.com/jenkinsci/docker-plugin/pull/383

KostyaSha avatar Jul 21 '16 21:07 KostyaSha

I skimmed through the documentation of the remote API. Most of the current configuration won't need to be changed. The changes that would likely be needed:

Cloud configuration:

  • Mode setting (swarm mode 1.12 or single host mode [or swarm standalone])
  • Master: Possibly: Specify more than one master for a single cloud? (Not sure if needed in the plugin itself)

Image configuration

  • Privileged flag is not supported yet and needs to be disabled (see also this comment)
  • Add fields for cpu, memory reservation
  • Add fields for cpu, memory limit
  • Add field for placement constraints

I could imagine the following override settings for the job-based configuration, but I'd think it would be good to think this over first:

  • Override cpu, memory reservation/limit
  • Override argument, and possibly command to execute

Also, from what I've read in the documentation, exposing ports might be more difficult than in the current version: from what I see, they need to be exposed explicitly (there is no nice "Publish All" flag). So this would require some random port generator, and possibly a retry if the chosen port conflicts with an existing service.

As an inspiration, the Kubernetes Plugin looks very similar to the likely solution: [image]

padyx avatar Jul 25 '16 07:07 padyx

> As an inspiration, the Kubernetes Plugin looks very similar to the likely solution:

That looks the same :/

> Master: Possibly: Specify more than one master for a single cloud? (Not sure if needed in the plugin itself)

It may make sense for the case where DockerClient throws an exception, but the net-split issue would then be an open question.

> Image configuration

Are they the same as for standard docker? I can sync the API to the latest create/stop/remove features.

> Override cpu, memory reservation/limit
> Override argument, and possibly command to execute

It could be possible to extend JobProperty #72 in the future and require constraints.

> As an inspiration, the Kubernetes Plugin looks very similar to the likely solution:

They only have reservation/limit settings, and that will be solved by syncing the create command to the latest features as soon as https://github.com/docker-java/docker-java/pull/673 is merged into docker-java.

KostyaSha avatar Aug 21 '16 22:08 KostyaSha

@padyx can the standard docker client work with a docker engine that is in swarm mode?

KostyaSha avatar Aug 21 '16 22:08 KostyaSha

> can the standard docker client work with a docker engine that is in swarm mode?

Yes, but any containers started via the regular container API (/containers/create) will be created on that specific host - and not in the swarm. So we do need to call a different API if swarm mode is selected.

> Image configuration

Not all of the features that we can currently configure for regular containers are available in swarm mode. Privileged and SHM-Size are two of the features that don't work with Docker 1.12 in swarm mode. I haven't made a full comparison yet.

padyx avatar Aug 22 '16 06:08 padyx

I've seen a few related and promising looking pull requests over at docker-java (docker-java/docker-java#686, docker-java/docker-java#678, docker-java/docker-java#673). How long - if at all - do you think it will take to take advantage of those in this plugin? Is it on anyone's priority list?

avandorp avatar Sep 01 '16 15:09 avandorp

@avandorp unfortunately I do both projects in my free time, and in docker-java things are stuck because of integration tests. In docker-plugin it is a bit unclear how best to design the classes (I can add an additional checkbox in Cloud and Template, or subclass the classes, depending on the architecture design).

KostyaSha avatar Sep 01 '16 15:09 KostyaSha

@KostyaSha Can we assist you in some way to get this moving? Help fix integration tests in docker-java, help with architecture sketches here, or something else?

padyx avatar Sep 02 '16 06:09 padyx

Yes, sure. But swarm mode is not needed for this plugin as jenkins should have exact mapping. I think swarm cli is the only useful thing for orchestration.

KostyaSha avatar Sep 02 '16 06:09 KostyaSha

But I may be mistaken... open for discussion.

KostyaSha avatar Sep 02 '16 09:09 KostyaSha

> But swarm mode is not needed for this plugin as jenkins should have exact mapping. I think swarm cli is the only useful thing for orchestration.

Could you elaborate on what you mean by this? I don't quite follow.

padyx avatar Sep 02 '16 10:09 padyx

One of the swarm-mode features is scaling, but with Jenkins you can't use it without pre-creating Cloud objects on the Jenkins side. So it would be like creating a lot of single services, one for every job, which looks weird.

KostyaSha avatar Sep 02 '16 10:09 KostyaSha

I see - my assumption for a possible solution was to use the Docker Remote API for services and to adapt the plugin to:

  • For each starting job (cloud node provision): Spawn a swarm service with 1 task (the jenkins slave task)
  • For each terminating job (cloud node unprovision): Stop and remove the swarm service

This would lead to creating and destroying services without taking advantage of scaling.

Is this what you thought, or do you see another option to support running jenkins jobs on Docker Swarms (with Swarm mode)? Or would you have increased the scaling of the Swarm service and connected to the "free" task created by the scaling?

padyx avatar Sep 02 '16 10:09 padyx

So you will have a lot of similar services?

> Or would you have increased the scaling of the Swarm service and connected to the "free" task created by the scaling?

It may be possible by listening to docker events, but I think it would be too difficult. In any case we can create experimental provisioning implementations and try!

KostyaSha avatar Sep 02 '16 10:09 KostyaSha

> So you will have a lot of similar services?

Yes, we'd have a lot of similar services if implemented that way, because we'd not care about scaling.

We checked the documentation and experimented with the remote API and our conclusions are:

  • Using a single service per Docker image would not work:
    • The swarm does not return the id of the created task from the /services/.../update endpoint, leaving us with no option to identify which task was just created. If only a single Jenkins were to control the swarm, we could theoretically compare the task list before and after the operation to identify the new task, but that would be a very unstable implementation
  • Using a single service per job run seems to work:
    • Use a GET request to /services to list all services, and identify already reserved ports
    • Generate random ports in the ephemeral range for all ports that need bindings
    • Use POST request to /services/create to start a single service (replicas=1)
    • (If necessary and the port got taken in the meantime, repeat the steps above)
    • Use a GET request to /services/<serviceid>/ to identify the created task and connect to it
    • After job completes: Use a DELETE request to /services/<serviceid> to remove the service
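
The "single service per job run" steps above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: `request` is a hypothetical callable `(method, path, body=None)` that performs the HTTP call and returns the decoded JSON, and `PortConflict` is a hypothetical exception that this transport raises when the swarm rejects a port that is already published. The list above looks up the task through /services/<serviceid>; this sketch instead queries the /tasks endpoint with a `service` filter, which is the endpoint the Remote API uses to list tasks.

```python
import json
import random
from urllib.parse import quote

# Assumption: IANA dynamic port range, used as "the ephemeral range".
EPHEMERAL_PORTS = range(49152, 65536)


class PortConflict(Exception):
    """Hypothetical: raised by the transport when a published port is taken."""


def reserved_ports(services):
    """Collect every published port from a GET /services response body."""
    ports = set()
    for service in services:
        for port in service.get("Endpoint", {}).get("Ports", []):
            if "PublishedPort" in port:
                ports.add(port["PublishedPort"])
    return ports


def run_job_service(request, spec, target_port, retries=5, rng=random):
    """Create a one-replica service; return (service_id, published_port, tasks)."""
    for _ in range(retries):
        # Step 1: list all services and find the ports already in use.
        taken = reserved_ports(request("GET", "/services"))
        # Step 2: pick a random free port for the binding.
        published = rng.choice([p for p in EPHEMERAL_PORTS if p not in taken])
        body = dict(spec)
        body["EndpointSpec"] = {"Ports": [{
            "Protocol": "tcp",
            "TargetPort": target_port,
            "PublishedPort": published,
        }]}
        # Step 3: create the service; step 4: start over if the port was taken.
        try:
            service_id = request("POST", "/services/create", body)["ID"]
        except PortConflict:
            continue
        # Step 5: find the task(s) that belong to the new service.
        filters = quote(json.dumps({"service": [spec["Name"]]}))
        tasks = request("GET", "/tasks?filters=" + filters)
        return service_id, published, tasks
    raise RuntimeError("could not publish a port after %d attempts" % retries)


def remove_job_service(request, service_id):
    """Step 6: remove the service once the job has completed."""
    request("DELETE", "/services/" + service_id)
```

Because the transport is injected, the flow can be exercised against a stub before it is wired to a real Docker client.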

We'd suggest the "single service per job" implementation. What is your opinion?

padyx avatar Sep 07 '16 08:09 padyx

> We'd suggest the "single service per job" implementation. What is your opinion?

Looks similar to the existing logic. Now the question is how the code could be refactored... and how generic swarm could fit in...

KostyaSha avatar Sep 07 '16 21:09 KostyaSha

Without knowing at all how the plugin is structured today - that sounds like a strategy pattern. There'd be one strategy for normal use and one strategy for the swarm, depending which configuration was chosen.
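
As a rough illustration of that idea, the sketch below selects between two strategies based on a cloud-level mode setting. All class names, the mode strings, and `strategy_for` are hypothetical and say nothing about how the plugin is actually structured. The only facts used are the Remote API endpoints: /containers/create for a single host and /services/create for swarm mode.

```python
from abc import ABC, abstractmethod


class ProvisioningStrategy(ABC):
    """Describes the API requests needed to start and stop one build agent."""

    @abstractmethod
    def create_request(self, image):
        """Return (method, path, body) for starting an agent."""

    @abstractmethod
    def remove_request(self, resource_id):
        """Return (method, path, body) for removing an agent."""


class SingleHostStrategy(ProvisioningStrategy):
    """Regular container API: the container runs on the contacted host."""

    def create_request(self, image):
        return ("POST", "/containers/create", {"Image": image})

    def remove_request(self, resource_id):
        return ("DELETE", "/containers/" + resource_id, None)


class SwarmModeStrategy(ProvisioningStrategy):
    """Service API: the swarm schedules a single task on some node."""

    def create_request(self, image):
        body = {
            "TaskTemplate": {"ContainerSpec": {"Image": image}},
            "Mode": {"Replicated": {"Replicas": 1}},
        }
        return ("POST", "/services/create", body)

    def remove_request(self, resource_id):
        return ("DELETE", "/services/" + resource_id, None)


# Hypothetical values of the "mode" setting in the cloud configuration.
STRATEGIES = {
    "single-host": SingleHostStrategy,
    "swarm-mode": SwarmModeStrategy,
}


def strategy_for(mode):
    """Pick the strategy that matches the configured mode."""
    return STRATEGIES[mode]()


method, path, body = strategy_for("swarm-mode").create_request("example/agent")
```

The rest of the provisioning code would then depend only on `ProvisioningStrategy`, so a further mode could be added without touching the callers.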

padyx avatar Sep 09 '16 07:09 padyx

+1 supporting swarm mode would be great!

skahlhoefer avatar Sep 26 '16 14:09 skahlhoefer

Small note related to this topic: I'm thinking about how best to implement two-level provisioning in Jenkins.

KostyaSha avatar Jan 08 '17 22:01 KostyaSha

Swarm mode itself is not suitable for Jenkins. Classical swarm is the best choice. It exposes an API that can be used for balanced runs of slave containers and for building images. Swarm mode is mostly for running apps: run X containers, restart them. None of that is possible for Jenkins builds.

KostyaSha avatar Feb 20 '17 22:02 KostyaSha

@KostyaSha From the Jenkins build perspective, we can always spin up a new container for a build with replica = "1". Swarm mode will then load balance internally and start the container on some host. People who already use swarm mode would otherwise have to set up another classical swarm just for Jenkins builds, which means the overhead of maintaining two clusters. Supporting swarm mode would be very helpful.

adityacs avatar Mar 01 '17 11:03 adityacs

+1

dsahithi9 avatar Jul 24 '17 16:07 dsahithi9

I would also like to see swarm mode support. I have raised a separate issue about how connecting the Cloud URL to a load balancer fails catastrophically, apparently because the launched container cannot be located on calls that follow the create (the load balancer redirects the requests to different nodes). My next thought was that maybe I could connect the Cloud URL to a swarm master, since its internal service discovery knows where all the containers that relate to a service exist (in this case there would only ever be one). But of course, I suspect YADP would need to support the API calls that create swarm services rather than simple docker containers.

In an enterprise setting not being able to scale to use multiple hosts associated to a single YADP Cloud is a significant problem. Sure we could have multiple Clouds but that doesn't really equal scalability and you are still left with a single point of failure of your singleton host.

goffinf avatar Aug 16 '17 20:08 goffinf

@goffinf In the meantime you can switch to docker swarm, which keeps the docker API (non-service based) while still giving you a clustered docker installation. There wouldn't be any need for load balancers, as docker swarm already does that.

witokondoria avatar Aug 17 '17 08:08 witokondoria

@witokondoria, thx for your comment. I might try that, although I am somewhat reluctant to use what is essentially a deprecated product.

I would probably keep the ELB since it allows the use of a CName (R53 recordset alias) and would abstract the physical IP of the swarm master.

goffinf avatar Aug 19 '17 20:08 goffinf

@padyx @KostyaSha @adityacs What do you think the prospects are for supporting swarm mode in YADP (in the constrained way outlined in this issue: a single service per job) in the near term?

Certainly in the corporate space, everyone I come across is using a scheduler of one type or another and therefore needs to leverage the service abstraction (nothing says a service can't be a single container stack). So whilst scalability (and resilience) won't necessarily be achieved by starting multiple containers, being able to schedule individual Jenkins slave containers across a cluster of managed nodes still represents a significant improvement from the single point of failure that is the current situation.

This isn't a criticism of the work to-date which I'm sure we all appreciate very much, but I am certainly having a tough time persuading architects and solution designers where I work of the elegance of ephemeral slaves when they discover this limitation.

Like @padyx, I am more than happy to contribute in any way I can. Maintaining multiple projects with lots of people asking for change, and trying to separate the high priorities from the nice-to-haves, can be a lonely place :-)

Kind Regards

Fraser.

goffinf avatar Aug 24 '17 08:08 goffinf

YADP uses the https://github.com/docker-java/docker-java client for all docker operations. From the changelog (https://github.com/docker-java/docker-java/blob/master/CHANGELOG.md) I see that swarm mode is not yet officially supported in the docker-java client.

adityacs avatar Aug 24 '17 08:08 adityacs

@goffinf The comment from adityacs is correct: first there would need to be an implementation of the Swarm APIs in docker-java. An alternative would be https://github.com/spotify/docker-client, which also offers a Java API and already supports the Swarm APIs.

The changes themselves are likely not that big - refactoring the plugin to use different strategies would be required though. I have a very rough proof-of-concept Jenkins plugin using the Spotify APIs that successfully launches a service, executes a job and kills the service. (Currently not open sourced)

The major question for me is for @KostyaSha : Since this is your repository (and plugin), would you consider such a Swarm mode at all? If not, we'd probably have to create another plugin.

padyx avatar Aug 24 '17 09:08 padyx

@padyx I am also in a situation that requires ephemeral build slaves launched across a Docker Swarm Cluster via short lived services. Ultimately, if I cannot find a solution to allow this functionality then I was going to roll my own plugin.

To restate the sentiment that has been expressed here multiple times, I absolutely appreciate what @KostyaSha has done with this plugin. I also understand that the time involved in maintaining and adding features can be quite challenging, especially with other commitments such as career and family taking their toll. So I am by no means complaining and absolutely understand the situation.

Rather, I'd like to figure out a plan like everyone else so that the future state allows for Jenkins to use modern Docker Swarm for Ephemeral Slaves. I know many of us would be more than eager to contribute directly to this project to allow for this capability.

I think it's important to get a definitive answer for when or if this plugin will ultimately support what we are after. If it's not in the cards or may be much longer down the road than we desire, then we either get approval to contribute to this plugin or perhaps band together to create a fork or a new project.

Since this feature is so important to many I could see it adding so much value that it'd be very popular. We all win as a community if we can band together and make this happen. Setting a plan in motion is the next step and I'd be happy to get involved.

danieleagle avatar Aug 24 '17 16:08 danieleagle