cass-operator [Discuss] Simplify CassDC services

What is missing?

The current services in the cassDC are slightly confusing to users at times. We have CLUSTERNAME-DCNAME-all-pods-service as well as CLUSTERNAME-DCNAME-service, which differ only in that all-pods-service has additional ports and also exposes pods which are not yet passing readiness probes.

We also have the additional-seeds-service which is empty for all single DC deployments (most deployments).

Why do we need it?

Services are usually subject to monitoring and having excess services clutters the operator experience. For developers, it is currently confusing which service should be used to talk to the Cassandra API on 9042. While experienced users will be able to infer this from which ports are open, it makes things less intuitive.

Proposed solution

This is just a suggestion; there may be additional factors to consider given commentary in k8ssandra-operator issue #67.

I suggest we rename:

CLUSTERNAME-DCNAME-service -> CLUSTERNAME-DCNAME-client-apis

CLUSTERNAME-DCNAME-all-pods-service -> CLUSTERNAME-DCNAME-internal-monitoring

(No need for the type of the resource at the end of the name as this is obvious when inspecting it.)

I also suggest we find a way to avoid creating the additional-seeds-service unless additional seeds are actually required. Testing will be needed to ensure we continue to avoid an additional STS rolling restart, which is not desirable.

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: CASS-55

Oct 07 '21 05:10 Miles-Garnsey

This would break ALL existing cluster installations of cass-operator. Not the cass-operator installation itself, but all the applications that connect to Cassandra. The service names can be mentioned in the documentation on how to connect to Cassandra inside Kubernetes, but renaming them makes very little sense. I don't actually remember any tickets about anyone running into issues due to service naming.

Should such a rename occur, it should happen in a version where we have to break all applications in any case. I don't foresee such a version at this point in time, however.

Oct 07 '21 06:10 burmanm

> This would break ALL existing cluster installations of cass-operator.

So firstly, let's separate the topic of renaming the services from the topic of not instantiating the additional-seeds service. The latter doesn't break anything.

For the renaming, we just need to think through the migration pathway. We can do some combination of having both sets of services in play for a couple of versions, coupled with deprecation warnings. We can also consider having a setting in the CassDC for those users who really like the old services and want to continue using them.

It isn't impossible to do, we just need to do it in several steps.

Oct 07 '21 06:10 Miles-Garnsey

I really fail to see the value. Your proposal would add even more services in the meantime, and how long would an acceptable transition period be? 2-3 years? We would still need to explain to users in the documentation which services to connect to.

The current additional-seeds-service allows modifying external seeds without causing an STS restart, and even without modifying the CassandraDatacenter object at all. The path to multicluster-services allows the service to be modified to become THE external path to seeds, without cass-operator needing to do anything. That also seems to be the direction of the Kubernetes multicluster-services SIG: keeping single-DC targets detached from the actual replication. cass-operator can read new IPs from that service without itself modifying it.

Maybe one could make that feature optional by changing "additionalSeeds" to *[]string or something similar: if the field is set, even to an empty list, create the service, and if it is not set, don't create the service. But this is still quite a minor benefit.
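A minimal Go sketch of that idea, to make the nil-versus-empty distinction concrete. The DatacenterSpec type and the needsAdditionalSeedsService helper are hypothetical names for illustration only and are not taken from cass-operator's code; the only assumption carried over from the comment is that "additionalSeeds" becomes a pointer to a string slice.

```go
package main

import "fmt"

// DatacenterSpec is a hypothetical, trimmed-down stand-in for the
// CassandraDatacenter spec. Only the field under discussion is modelled.
type DatacenterSpec struct {
	// AdditionalSeeds is a pointer so that "not set" (nil) can be told
	// apart from "set, but currently empty" (pointer to an empty slice).
	AdditionalSeeds *[]string `json:"additionalSeeds,omitempty"`
}

// needsAdditionalSeedsService is a hypothetical helper: the service is
// wanted whenever the field is present, even if the list is empty, so seeds
// could still be added later without changing whether the service exists.
func needsAdditionalSeedsService(spec DatacenterSpec) bool {
	return spec.AdditionalSeeds != nil
}

func main() {
	unset := DatacenterSpec{}
	empty := DatacenterSpec{AdditionalSeeds: &[]string{}}

	fmt.Println("unset:", needsAdditionalSeedsService(unset))
	fmt.Println("empty:", needsAdditionalSeedsService(empty))
}
```

With a plain []string the two cases above would be indistinguishable after deserialization, which is why the pointer is needed for this kind of opt-in.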

Oct 07 '21 07:10 burmanm

> Your proposal would add even more services in the meantime, and how long would an acceptable transition period be? 2-3 years? We would still need to explain to users in the documentation which services to connect to.

More documentation is always a good idea!

If there is a concern about having more services while we transition then we can put an additional field in the cassDC to turn off the old services. useNewServiceNames: true would work. Once we have finished the deprecation process we can remove the field. I'd propose that we align with the rest of the k8s ecosystem with a 1 year deprecation window.
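As a rough sketch of how such a switch could behave during the deprecation window (in Go, since that is what cass-operator is written in): by default both the old and the proposed names exist, and setting the flag drops the old ones. ServiceNameOptions and serviceNames are hypothetical names for illustration; the service name suffixes are the ones from the proposal at the top of this issue.

```go
package main

import "fmt"

// ServiceNameOptions is a hypothetical options struct. UseNewServiceNames
// mirrors the useNewServiceNames field suggested in this thread.
type ServiceNameOptions struct {
	UseNewServiceNames bool
}

// serviceNames returns the services that would exist for one datacenter.
// During the transition both naming schemes are created; once the flag is
// set, only the proposed names remain.
func serviceNames(cluster, dc string, opts ServiceNameOptions) []string {
	prefix := cluster + "-" + dc
	newNames := []string{
		prefix + "-client-apis",
		prefix + "-internal-monitoring",
	}
	if opts.UseNewServiceNames {
		return newNames
	}
	oldNames := []string{
		prefix + "-service",
		prefix + "-all-pods-service",
	}
	return append(oldNames, newNames...)
}

func main() {
	fmt.Println(serviceNames("demo", "dc1", ServiceNameOptions{}))
	fmt.Println(serviceNames("demo", "dc1", ServiceNameOptions{UseNewServiceNames: true}))
}
```

Removing the field at the end of the deprecation window would then amount to always taking the first branch.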

We need to get comfortable with deprecations and these kinds of processes so that we can remain adaptable to the k8s ecosystem, which moves very fast.

> cass-operator can read new IPs from that service without itself modifying it.

I hear you on not wanting cass-operator to modify the service, but cass-operator is already creating the service, right? If we moved the creation into k8ssandra-operator, then I guess that might be the happy compromise? Would that be feasible?

I think asking "what's the benefit" is always a worthwhile question. There is user feedback to suggest that the current arrangements are confusing. Not all of that feedback ends up in our GH tickets, but it is coming in through other channels which is why I've raised one myself.

Customers coming in through DataStax support and services often don't expect to have to come to GH to create their own tickets as well.

Oct 07 '21 08:10 Miles-Garnsey

➤ Erick Ramirez commented:

+1 to this idea. I realise that initially it won’t be easy to implement but we should aim to achieve this level of intuitiveness for our users in the medium to long term. 🍻

Oct 08 '21 11:10 sync-by-unito[bot]

Can we agree to turn this ticket into a documentation one for the time being? Giving better explanations of how and why the various services should be used would be a great first step, with no breaking changes. I've created a follow-up ticket to deal with creating the additionalSeed service only if it's necessary: https://github.com/k8ssandra/cass-operator/issues/347

Jun 13 '22 13:06 adejanovski