
HTTP proxy to access nodes within the cluster

Open · pfrybar opened this issue 1 year ago · 0 comments

Is your feature request related to a problem? Please describe.

When running a Vespa cluster on an internal network, it is difficult to access node-specific HTTP APIs.

For example, consider the following configuration, where a Vespa cluster runs on an internal network with load balancers set up to access the config nodes (for deploying applications) and the container nodes (for feeding and querying):

[Screenshot: a Vespa cluster on an internal network, with load balancers in front of the config nodes and the container nodes]

To access an API on a specific node, like the Custom Component State API, one would need to expose the node/port for each content node outside of the internal network, or create a proxy of some sort. Note that a load balancer wouldn't work well here because the data on each content node needs to be explored independently.

Describe the solution you'd like

It would be useful to have an HTTP proxy running as part of the Vespa cluster to easily access the various APIs from a single point. For example, this could run on the config nodes and could use the host aliases to identify the hosts to proxy.

With this host in hosts.xml:

  <host name="vespa-content-0.vespa.cluster.local">
    <alias>content-0</alias>
  </host>

One could proxy a request through the config node to the custom component state API as follows:

curl https://vespa-config-lb:19071/proxy/v1/content-0:19107/state/v1/custom/component/
  • /proxy/v1 here is a new config API which does the proxying
  • content-0:19107/state/v1/custom/component/ is the <host>:<port>/<endpoint> where the request should be sent

This is just an example of how it could work.
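A minimal Python sketch of the lookup step such a proxy would need, assuming the path layout from the curl example: map each alias in hosts.xml to its host name, then turn `/proxy/v1/<alias>:<port>/<endpoint>` into the internal URL to forward to. `load_aliases`, `resolve_target` and `PROXY_PREFIX` are hypothetical names, not an existing Vespa API, and the sketch assumes the node-side APIs are reached over plain `http://`.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix of the proposed proxy API (taken from the example URL).
PROXY_PREFIX = "/proxy/v1/"

# Example: the hosts.xml entry from the issue, wrapped in its <hosts> root.
EXAMPLE_HOSTS_XML = """<hosts>
  <host name="vespa-content-0.vespa.cluster.local">
    <alias>content-0</alias>
  </host>
</hosts>"""


def load_aliases(hosts_xml: str) -> dict[str, str]:
    """Map each <alias> in hosts.xml to the name of the <host> it belongs to."""
    root = ET.fromstring(hosts_xml)
    aliases: dict[str, str] = {}
    for host in root.iter("host"):
        name = host.get("name")
        for alias in host.findall("alias"):
            text = (alias.text or "").strip()
            if text and name:
                aliases[text] = name
    return aliases


def resolve_target(path: str, aliases: dict[str, str]) -> str:
    """Turn /proxy/v1/<alias>:<port>/<endpoint> into the internal URL to forward to."""
    if not path.startswith(PROXY_PREFIX):
        raise ValueError(f"not a proxy path: {path}")
    rest = path[len(PROXY_PREFIX):]
    host_port, _, endpoint = rest.partition("/")
    alias, colon, port = host_port.partition(":")
    if not colon or not port.isdigit():
        raise ValueError(f"expected <host>:<port>, got {host_port!r}")
    # Only forward to hosts declared in hosts.xml, so the proxy cannot be
    # used to reach arbitrary machines on the internal network.
    if alias not in aliases:
        raise LookupError(f"unknown host alias: {alias}")
    # Assumption: node-side APIs are served over plain HTTP inside the network.
    return f"http://{aliases[alias]}:{port}/{endpoint}"


aliases = load_aliases(EXAMPLE_HOSTS_XML)
print(resolve_target("/proxy/v1/content-0:19107/state/v1/custom/component/", aliases))
# → http://vespa-content-0.vespa.cluster.local:19107/state/v1/custom/component/
```

Restricting targets to aliases declared in hosts.xml keeps the proxy from acting as an open relay into the internal network. The forwarding itself (copying the method, headers and body to the resolved URL and streaming the response back) is left out of this sketch.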

Describe alternatives you've considered

The user could create a proxy manually or expose the ports outside of the internal network.

Additional context

This would be incredibly useful for Vispana, a web UI for viewing the status of a Vespa cluster. With this proxy in place, it would be easy to add support for Vespa clusters deployed on an internal network. The problem now is that when hosts are discovered from the Vespa configuration on an internal network, only the internal hostnames are returned. If Vispana runs outside that network, these hostnames are useless and there is no easy way to query individual nodes. If the proxy were implemented, Vispana could use it to query, e.g., the custom component state API and expose a simple exploratory view of the data.

pfrybar, Dec 14 '23 11:12