[WIP] Support nomad
Nomad API
We can use the Nomad API to place Docker containers on nodes. I chose to use a Python library to make things simpler: https://github.com/jrxFive/python-nomad
How to address containers
To address those containers by IP and port, we have two options:
- Either we use a load balancer. Nomad environments are very flexible, though, and there are many options on the market, such as Fabio, Nginx, HAProxy ...
- Or we use the DNS server to gather the IPs and ports of containers through SRV DNS requests. We have to do DNS caching, but the approach is quite standardized.
I chose option (2), DNS, because it is more standard across environments. Option (1), a load balancer, can be supported but will require specific code.
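As a rough illustration of option (2), the sketch below turns SRV answers into `(host, port)` pairs and caches them for a short time. It assumes the `dnspython` package for the actual lookup (imported lazily); `to_endpoints`, `lookup_srv`, `SrvCache`, and the 5-second TTL are hypothetical names and values, not code from this PR.

```python
import time
from typing import Callable, Dict, List, Tuple

Endpoint = Tuple[str, int]


def to_endpoints(records) -> List[Endpoint]:
    """Convert SRV records (objects with priority, target, port) to (host, port) pairs."""
    ordered = sorted(records, key=lambda r: r.priority)
    return [(str(r.target).rstrip("."), int(r.port)) for r in ordered]


def lookup_srv(name: str) -> List[Endpoint]:
    """Query DNS for SRV records. Assumption: dnspython >= 2.0 is installed."""
    import dns.resolver  # lazy import so the rest of the sketch has no dependency

    try:
        return to_endpoints(dns.resolver.resolve(name, "SRV"))
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # the service is not registered (yet)


class SrvCache:
    """Tiny TTL cache so repeated lookups do not hit the DNS server every time."""

    def __init__(self, lookup: Callable[[str], List[Endpoint]] = lookup_srv,
                 ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[Endpoint]]] = {}

    def get(self, name: str) -> List[Endpoint]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: reuse the cached answer
        endpoints = self._lookup(name)
        self._entries[name] = (now, endpoints)
        return endpoints
```

Note that the SRV target is a host name; depending on the setup it may still need an A lookup to obtain the IP.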
Workflow when connecting
When the Clipper Admin connects, it tries to determine the IP and port of each service in order to know whether it needs to submit a new job or can use what already exists.
Example: for Redis, a DNS request for redis.service.consul is sent. If it returns at least one ip:port, that one is used; otherwise a Redis instance job is submitted. The admin then keeps sending SRV requests until the service is up; if the service does not come up, the process stops.
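The workflow above could be sketched roughly as follows. The function name `ensure_service`, the injected `resolve` and `submit_job` callables, and the timeout and polling interval are all assumptions made for illustration; the PR itself may structure this differently.

```python
import time
from typing import Callable, List, Tuple

Endpoint = Tuple[str, int]


def ensure_service(
    dns_name: str,
    resolve: Callable[[str], List[Endpoint]],
    submit_job: Callable[[], None],
    timeout: float = 60.0,   # assumption: give up after one minute
    interval: float = 1.0,   # assumption: poll once per second
    sleep: Callable[[float], None] = time.sleep,
) -> Endpoint:
    """Reuse a running service if DNS knows it, otherwise submit a job and wait."""
    endpoints = resolve(dns_name)
    if endpoints:
        return endpoints[0]  # something is already registered: reuse it

    submit_job()  # nothing registered yet: start a new instance

    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        endpoints = resolve(dns_name)
        if endpoints:
            return endpoints[0]
    raise TimeoutError(f"{dns_name} did not come up within {timeout} seconds")
```

Passing `resolve` and `submit_job` in as callables keeps the sketch independent of any particular DNS or Nomad client.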
Selecting containers
Nomad does not have the notion of selectors. I propose to use conventions to solve this problem. Jobs are prefixed with clipper-{cluster-name}. This allows us to select them based on their name (when we want to stop a container, for instance).
This is what it looks like in the Consul UI:
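A minimal sketch of the naming convention, assuming jobs are available as dictionaries with a `Name` key (the shape of entries in Nomad's job listing). The helper names and the `-{component}` suffix are hypothetical.

```python
from typing import Dict, Iterable, List


def job_prefix(cluster_name: str) -> str:
    """Prefix shared by every job of one Clipper cluster."""
    return f"clipper-{cluster_name}"


def job_name(cluster_name: str, component: str) -> str:
    """Full job name; the component suffix is an assumption for illustration."""
    return f"{job_prefix(cluster_name)}-{component}"


def select_jobs(jobs: Iterable[Dict], cluster_name: str) -> List[Dict]:
    """Keep only the jobs whose name starts with the cluster prefix."""
    # The trailing dash stops "clipper-prod" from matching "clipper-production-...".
    prefix = job_prefix(cluster_name) + "-"
    return [job for job in jobs if job.get("Name", "").startswith(prefix)]
```

Prefix matching stays ambiguous when one cluster name is itself a dash-separated prefix of another (for example `prod` and `prod-eu`), so cluster names would need to avoid that.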
Managing the connection between Models and Query Frontend
This one is tricky. The problem is that both the IP and port of the Query Frontend are going to change over time, meaning that we would have to submit a new job every time they change.
The only way I could solve this was to use a load balancer (namely Fabio, one of those previously mentioned) and do TCP forwarding. This leaves Fabio responsible for routing to the correct IP and port. But this implementation is specific to Fabio.
That means we are booting the Model containers with CLIPPER_IP='fabio.service.consul' and CLIPPER_PORT='7000'. This part needs to be improved though.
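For illustration, a model job carrying those two environment variables might be built like this. It is a sketch of a Nomad JSON job payload, not the job description used in this PR: the function name, the task and group names, and the `dc1` datacenter are assumptions.

```python
from typing import Any, Dict


def model_job(job_id: str, image: str,
              clipper_ip: str = "fabio.service.consul",
              clipper_port: str = "7000") -> Dict[str, Any]:
    """Build a Nomad JSON job payload that runs one Docker model container."""
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "service",
            "Datacenters": ["dc1"],  # assumption: default datacenter name
            "TaskGroups": [{
                "Name": "model",
                "Count": 1,
                "Tasks": [{
                    "Name": "model",
                    "Driver": "docker",
                    "Config": {"image": image},
                    # The model connects to Fabio, which forwards TCP traffic
                    # to wherever the Query Frontend currently runs.
                    "Env": {
                        "CLIPPER_IP": clipper_ip,
                        "CLIPPER_PORT": clipper_port,
                    },
                }],
            }],
        }
    }
```

Keeping the address and port as parameters means a different load balancer, or a different forwarding port, only changes the arguments and not the job template.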
If you have any questions, don't hesitate to ask. I know this is quite a long description.
Can one of the admins verify this patch?
@antoinesauray Thanks for making this PR. This will be a really helpful feature. Would you be able to
- Tag issues you created? (I remember you wrote about this in an issue page)
- Write a little bit more description in design / implementation?
- Write a way to QA this PR (because it is a big change, we want to QA on top of the tests)?
- Add some basic integration tests at least?
Thank you!
https://github.com/ucbrise/clipper/issues/749
Do you think it is possible to expose resource allocation & port number to containerManager? It seems like you hardcode those values in jobdescription.
I was thinking about it. It can be done in the instantiation of NomadContainerManager.
I need support here. I'm still stuck because of https://github.com/ucbrise/clipper/issues/751
@antoinesauray I haven't had enough bandwidth to handle this recently. For the testing stuff, I will try to resolve it by next Tuesday. Please leave one more message if I don't come back by next Tuesday.