container-service-extension

Question - Running on Multiple vCD Cells

Open JonathanThorpe opened this issue 5 years ago • 14 comments

Hi,

We are in the process of deploying CSE - it seems lightweight enough that we have decided to run this on one of the vCD Cells.

My understanding is that CSE simply listens for messages on the AMQP bus. Does this imply that we could run multiple instances of CSE (i.e. one on each cell)?

Thanks, Jonathan

JonathanThorpe, Oct 04 '18 23:10

I can't answer your question about multiple instances (I'm curious about that too), but vCloud Director 9.5 was just released and has an appliance deployment option. Given VMware's track record on appliances, one could guess that self-installed vCD instances will be gone soon. My suggestion would be to start off on the right track with a standalone CSE instance. It's totally up to you, but I don't like double-handling services like that.

TheNewStellW, Oct 05 '18 03:10

What benefit are you expecting from running a separate CSE instance on each vCD cell?

vCD cells in a multi-cell setup are all part of the same vCD cluster. They share the same database and have the same view of the cloud.

If you run multiple copies of the CSE server, one of them (chosen at random) will pick up a message off the AMQP queue and work on it, e.g. create a Kubernetes cluster. So in theory we can achieve some sort of performance boost (parallelism?), but at the same time we open ourselves up to a plethora of race conditions, e.g. updating a cluster that is already being updated by a different CSE server.
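To make the race condition concrete, here is a minimal sketch of a lost update. It is a simulation only: the `cluster` dictionary and the helper functions are hypothetical stand-ins and are not taken from the CSE code base. The interleaving is written out by hand so the result is the same on every run.

```python
# Illustrative simulation only; none of these names come from CSE.
cluster = {"name": "demo-cluster", "nodes": 2}  # shared state both servers can see


def read_nodes(state):
    """Return the node count currently recorded for the cluster."""
    return state["nodes"]


def write_nodes(state, value):
    """Record a new node count for the cluster."""
    state["nodes"] = value


# Two hypothetical CSE servers each handle an "add one node" request
# for the same cluster at about the same time.
seen_by_a = read_nodes(cluster)       # server A reads 2
seen_by_b = read_nodes(cluster)       # server B also reads 2, before A has written
write_nodes(cluster, seen_by_a + 1)   # A records 3
write_nodes(cluster, seen_by_b + 1)   # B also records 3, overwriting A's update

lost_update_result = cluster["nodes"]
print(lost_update_result)  # 3, although two "add node" requests were processed
```

Avoiding this would need some form of coordination between the servers (for example a per-cluster lock), which is an assumption on my part and not something the thread says CSE provides.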

rocknes, Oct 08 '18 18:10

We are interested in this feature as well. Performance isn't really relevant for us, but high availability is. We need to be able to ensure that if one of the VMs running CSE goes down, we can continue to provide services until the issue is corrected.

mgaruccio, Oct 16 '18 15:10

We have a similar problem here.

We thought that we could run multiple vCloud Director servers, each of them running a CSE server and a RabbitMQ server but sharing the same vCloud database:

[Image: what_we_thought]

But since both CSE servers use the same database, and the database can only contain one RabbitMQ hostname for the AMQP exchange, we encountered strange behaviour.

[Image: what_happened]

We weren't able to properly create new Kubernetes clusters with CSE, and we also weren't able to properly update the CSE VM templates for photon-v2 and ubuntu-16.04.

After removing vCloud Director VM 2, everything worked fine again.

So basically we'd like to know: what does a correct architecture look like when we want to use multiple vCloud Director servers and multiple CSE servers (for HA reasons)?

Do we need a separate RabbitMQ server (or cluster, if HA is needed here), as in this image? [Image: alternative_1]

Or do we also need a separate CSE server (or cluster), as in this image? [Image: alternative_2]

p-klassen, Mar 06 '19 10:03

The diagram with the RabbitMQ server in the middle of the vCD VMs shows the correct architecture for a multi-cell vCD deployment. (The CSE servers are misplaced in the diagram, though.)

I will update this thread tomorrow with a diagram for a setup with multiple vCD, RabbitMQ and CSE.

rocknes, Mar 08 '19 01:03

As far as vCD->CSE interaction is concerned, CSE is just a listener on the AMQP bus.

A user triggers a CSE operation via vcd-cli, and this operation is internally translated into a REST request to vCD. vCD receives the request, recognizes from the request URL that it is meant for CSE, and pushes a message onto the AMQP bus. Now, if we have multiple copies of the CSE server running, only one of them will be able to consume the message and act on it (which is fine, because it serves our load-balancing requirement).

However, there is no guarantee that two successive command invocations from vcd-cli will go to the same CSE server. We should also keep in mind that the CSE architecture does not officially support multiple copies of the CSE server, so expect the system info command to report an incorrect number of running tasks: it will report only the number of tasks running on one of the CSE servers.
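The "only one of them consumes each message" behaviour can be sketched with a small simulation. This is not CSE or AMQP code: a thread-safe `queue.Queue` stands in for the AMQP queue, and the server names `cse-1` and `cse-2` are made up for illustration.

```python
import queue
import threading


def run_competing_consumers(num_messages=20, server_names=("cse-1", "cse-2")):
    """Simulate several servers consuming from one shared queue."""
    bus = queue.Queue()       # stands in for the AMQP queue
    handled = []              # (server, message) pairs, in completion order
    lock = threading.Lock()   # protects the shared 'handled' list

    for i in range(num_messages):
        bus.put(f"request-{i}")

    def serve(name):
        # Each server keeps taking messages until the queue is empty.
        while True:
            try:
                message = bus.get_nowait()
            except queue.Empty:
                return
            with lock:
                handled.append((name, message))

    threads = [threading.Thread(target=serve, args=(n,)) for n in server_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


handled = run_competing_consumers()
messages = [message for _, message in handled]
# Every message is handled exactly once, but which server handled
# a given message can differ from run to run.
print(len(messages), len(set(messages)))
```

Each request ends up with exactly one server, yet nothing ties a given request to a given server, which is the load-balancing effect described above.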
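The incomplete task count can be illustrated with a toy model. `FakeCseServer` and its methods are hypothetical and only mimic the idea that each server tracks its own tasks; round-robin dispatch is used here purely to keep the example deterministic, since the real delivery order is not guaranteed.

```python
from itertools import cycle


class FakeCseServer:
    """Hypothetical stand-in for a CSE server; it only knows its own tasks."""

    def __init__(self, name):
        self.name = name
        self.running_tasks = []

    def handle(self, request):
        # Pretend the request started a long-running task on this server.
        self.running_tasks.append(request)

    def system_info(self):
        # Reports only what this one server is doing.
        return {"server": self.name, "running_tasks": len(self.running_tasks)}


servers = [FakeCseServer("cse-1"), FakeCseServer("cse-2")]
dispatch = cycle(servers)  # round-robin stand-in for "whoever gets the message"

for i in range(5):
    next(dispatch).handle(f"create-cluster-{i}")

info = next(dispatch).system_info()  # the query is also answered by just one server
total = sum(len(s.running_tasks) for s in servers)
print(info, "but", total, "tasks are running overall")
```

In this run the query lands on `cse-2`, which reports 2 running tasks even though 5 are in flight across both servers.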

[Image: Untitled Diagram]

I have tested this configuration on my dev setup (by pure accident, and with a hilarious outcome). I left a CSE server running on my dev box at work. After work hours, I was testing some changes and started up a second CSE server on my personal laptop, with both pointing to the same vCD installation (and AMQP server). All CSE command invocations returned the standard response from the CSE server and did not reflect the changes I was testing. It took me some time to figure out that the CSE server at work was stealing all the messages off the AMQP bus. As soon as I turned off the server at work, the second server started to process my requests normally. Hope this helps.

Regards, Aritra Sen

rocknes, Mar 09 '19 01:03

This indeed helps a lot, thank you very much for your help! I think it would help other people a lot if someone could put this into the CSE documentation at https://vmware.github.io/container-service-extension/

p-klassen, Mar 11 '19 08:03

Is there any plan to officially support HA CSE servers? We would really prefer to have an HA config but not if it means we will be unsupported in the event of issues with the software.

mgaruccio, Mar 11 '19 13:03

Hi There

We haven't prioritized HA support for CSE Servers at the moment. It's not a tested and supported setup today. I don't have timelines on when we will get to it.

This is a place where we could take your contributions. Would you be willing to test this configuration in your setup?

Aashima

goelaashima, Mar 11 '19 22:03

VCD cells behind load balancer having public address entry of load balancer in VCD portal. In this scenario, if we integrate CSE with VCD load balancer, it shows an error, and we do not get the reply we expect when CSE is integrated directly with VCD. Can someone suggest a solution for this scenario of CSE integration with VCD cells behind an NSX load balancer?

ghost, May 30 '19 04:05

Hi there -

Can you clarify what you mean by "VCD cells behind load balancer having public address entry of load balancer in VCD portal"? What steps did you take to "integrate CSE with VCD load balancer" and what did you get? Please clarify what was done along with outcomes of errors etc. so we can guide.

Aashima

goelaashima, Jun 04 '19 18:06

Is there any progress on this subject? We are looking for a fully redundant CSE server deployment. Our VCD cells are behind a load balancer.

Vijendrasi, Oct 04 '19 03:10

We have not tested HA of CSE Servers.

goelaashima, Oct 14 '19 20:10

Hello Aashima, CSE HA is very important from a business continuity point of view. As a VCPP provider, we are already using CSE at large scale on our platform. Could you please add this feature request to an upcoming version? We will provide support in case any information or action is required from our side.

Vijendrasi, Oct 15 '19 05:10