baseport for content cluster nodes doesn't work well for all node types
Describe the bug
When we set baseport for content cluster nodes, it works for storagenode but doesn't work well for distributor/searchnode. distributor-base-port also doesn't work at all.
To Reproduce
Steps to reproduce the behavior:
- Deploy one content cluster with baseport (a minimal sketch of the matching hosts.xml follows the log output below):
  <content id="test2" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="test2" mode="index" />
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0" baseport="20000"/>
    </nodes>
  </content>
- Add another content cluster before the existing cluster:
  <content id="test1" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="test1" mode="index" />
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0" baseport="10000"/>
    </nodes>
  </content>
  <content id="test2" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="test2" mode="index" />
    </documents>
    <nodes>
      <node hostalias="node1" distribution-key="0" baseport="20000"/>
    </nodes>
  </content>
- See warnings and restart messages:
WARNING distributor cannot reserve port 10000 on vespa-container: Already reserved for storagenode. Using default port range from 19109
WARNING distributor cannot reserve port 10001 on vespa-container: Already reserved for storagenode. Using default port range from 19110
WARNING distributor cannot reserve port 10002 on vespa-container: Already reserved for storagenode. Using default port range from 19111
WARNING distributor2 cannot reserve port 20000 on vespa-container: Already reserved for storagenode2. Using default port range from 19120
WARNING distributor2 cannot reserve port 20001 on vespa-container: Already reserved for storagenode2. Using default port range from 19121
WARNING distributor2 cannot reserve port 20002 on vespa-container: Already reserved for storagenode2. Using default port range from 19122
WARNING Change(s) between active and new application that require restart:
In cluster 'test2' of type 'content':
Restart services of type 'distributor' because:
1) stor-communicationmanager.mbusport has changed from 19109 to 19120
stor-communicationmanager.rpcport has changed from 19110 to 19121
stor-status.httpport has changed from 19111 to 19122
In cluster 'test2' of type 'search':
Restart services of type 'searchnode' because:
1) # Port to use for the rpcserver.
proton.rpcport has changed from 19103 to 19114
# Port to use for the web server
proton.httpport has changed from 19107 to 19118
# Connect spec for transactionlog server.
# TODO Consider not using RPC at all
proton.tlsspec has changed from "tcp/vespa-container:19108" to "tcp/vespa-container:19119"
# Port number to use for listening.
translogserver.listenport has changed from 19108 to 19119
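For reference, a minimal hosts.xml along these lines is enough to reproduce the node1 alias used above (the host name vespa-container matches the one in the warnings; this is a sketch, adjust to your environment):
  <?xml version="1.0" encoding="utf-8" ?>
  <hosts>
    <!-- a single host carrying both content clusters via the node1 alias -->
    <host name="vespa-container">
      <alias>node1</alias>
    </host>
  </hosts>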
Expected behavior
The baseport setting should work properly for distributor/searchnode, and we should be able to avoid restarting services.
Environment (please complete the following information):
- OS: Ubuntu 18.04
- Infrastructure: self-hosted (observed the same behavior on Kubernetes)
- Versions: Vespa 8.339.15
Additional context
We want to avoid restarting unrelated distributor/storagenode/searchnode services when adding/removing/updating content clusters because it causes an outage for a while. The baseport setting works for storagenode but doesn't work well for distributor (distributor-base-port also doesn't work) and doesn't affect searchnode.
After reproducing this locally I can confirm your observations that distributor-base-port simply doesn't work at all. I believe this particular property is a remnant from an older version of the content cluster model, which probably shouldn't have been documented at all. It belongs in a museum! 🦖
You are also correct that the searchnode service does not appear to get its ports assigned as expected relative to the specified base port. I'm not sure of the reason behind this, but a lot of the port assignment logic has been refactored over the years. Since we run our nodes in separate containers (without specifying base ports) instead of co-located, we probably haven't noticed any regressions popping up here.
I'd strongly suggest using containers to avoid this problem altogether. This is where our development focus is concentrated and deployment friction can be expected to be minimal. Unless there are technical reasons that preclude using containers for this use case?
Thank you for taking care of this. We basically followed the sample setup (e.g. https://github.com/vespa-engine/sample-apps/blob/master/examples/operations/multinode-HA/services.xml). What do you mean by containers, and how can we set them up?
What do you mean by containers, and how can we set them up?
In this context it would usually mean Docker/Podman containers or perhaps Kubernetes pods, depending on how you provision compute and storage resources.
You can observe that in the multinode HA example app, distinct content nodes also have distinct node aliases (node8 and node9 in this specific case) and run in distinct Docker containers. The example app runs these on the same host; a production deployment would generally not do this (for availability reasons), but for testing that's perfectly fine.
For your use case that would mean that instead of having both the test1 and test2 clusters use node1, you could instead have test1 use node1 and test2 use node2 (as an example), where these two would be running in separate Docker/Podman containers. If node1 and node2 are running on the same physical host, this would avoid port conflicts as well as help enforce privilege and resource usage separation between the two logical nodes.
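To make that concrete, here is a sketch of what the host and content definitions could look like with that split; the host names vespa-content-1 and vespa-content-2 are hypothetical container hostnames, not something taken from your setup.
hosts.xml:
  <?xml version="1.0" encoding="utf-8" ?>
  <hosts>
    <!-- each alias maps to its own container, so default port assignments don't collide -->
    <host name="vespa-content-1">
      <alias>node1</alias>
    </host>
    <host name="vespa-content-2">
      <alias>node2</alias>
    </host>
  </hosts>
services.xml (content sections only):
  <content id="test1" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="test1" mode="index" />
    </documents>
    <nodes>
      <!-- no baseport needed: node1 resolves to its own container -->
      <node hostalias="node1" distribution-key="0" />
    </nodes>
  </content>
  <content id="test2" version="1.0">
    <redundancy>1</redundancy>
    <documents>
      <document type="test2" mode="index" />
    </documents>
    <nodes>
      <!-- likewise, node2 resolves to the second container -->
      <node hostalias="node2" distribution-key="0" />
    </nodes>
  </content>
With each cluster on its own container, the default port assignments no longer conflict, so the baseport attribute can simply be dropped.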
I see, we are already using Kubernetes pods. We have many content clusters, and each of them uses grouped distribution with multiple nodes. If we assign different nodes to each of them, we need to manage hundreds of nodes, and that's not realistic. So we want to use the same nodes in different content clusters.
we need to manage hundreds of nodes
Really, the right solution here is to have Vespa Cloud manage it, which it can do even though the nodes are in your account. I'm forgetting why you said this wasn't an option for you, but if it would help to speak to the right people on your side, we can try to arrange that for you. You can mail me, bratseth at vespa.ai.
Another member of our team is already talking about Vespa Cloud. Ultimately we want to use it, but we cannot do so soon. It would be great if there were a solution to handle this issue through configuration.
We see that solution as inferior to using multiple containers, since containers provide uniform management of nodes and resource isolation between clusters, so we are unlikely to spend any time on this.