ksqldb headless in-depth behavior documentation

Open ccuz opened this issue 3 years ago • 0 comments

Our use case for ksqlDB: We use ksqlDB in headless mode to achieve rolling upgrades of our pipeline. We run many ksqlDB server apps (i.e. a different service-id per processor/service) in our Kafka/K8s pipeline of processors. A processor can be a python-kafka client/producer, a Kafka Streams app, or a ksqlDB headless app. Each app is deployed as a K8s service/container and implements part of the overall data-analytics pipeline, all reading from and writing to Kafka topics (including merged and split streams).

input-topic-1 --> processor/app-a --> topic-2 --> processor/app-b --> sink-topic-3

With ksqlDB in headless mode, because no execution plan is written to an internal topic (i.e. the command topic), we can deploy a new version of a ksqlDB server app on K8s with a rolling deployment while the old one is still running (i.e. no need for ksql-migrations or global state management). Because the new deployment shares the same 'service-id', both versions belong to the same Kafka consumer group. During the rollout, some messages therefore take the legacy ksql paths while others already take the new ones. Once the old deployment is stopped, all messages take the new paths. We also don't get duplicate messages: since the same consumer group id is used, the old and new versions commit to the same Kafka consumer offsets. Rollback also works, but it may leave behind some newly created topics that still contain messages and need manual intervention.
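To make the rollout behavior above concrete, here is a minimal toy model in Python. It does not use Kafka or ksqlDB at all; `ToyGroup`, `rolling_upgrade`, and the instance names `old-v1` / `new-v2` are hypothetical names made up for illustration. It only sketches the idea that two versions sharing one committed offset each handle a disjoint subset of the messages, and it ignores real-world details such as rebalance timing and uncommitted in-flight messages.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGroup:
    """Toy stand-in for one consumer group reading a single partition."""
    log: list                      # messages in the partition, in order
    committed: int = 0             # one committed offset, shared by the whole group
    processed: list = field(default_factory=list)

    def poll_and_commit(self, instance: str) -> None:
        """Let `instance` handle the next message, then advance the shared offset."""
        if self.committed < len(self.log):
            msg = self.log[self.committed]
            self.processed.append((instance, msg))
            self.committed += 1


def rolling_upgrade(messages):
    """Simulate a rollout: old only, then old and new mixed, then new only."""
    group = ToyGroup(log=list(messages))
    # Assumed ownership schedule; in a real cluster this is decided by
    # consumer-group rebalances, not by a fixed list.
    schedule = ["old-v1", "old-v1", "new-v2", "old-v1", "new-v2", "new-v2"]
    for instance in schedule:
        group.poll_and_commit(instance)
    return group.processed


if __name__ == "__main__":
    for instance, msg in rolling_upgrade(["m0", "m1", "m2", "m3", "m4", "m5"]):
        print(f"{msg} handled by {instance}")
```

In this sketch every message is handled exactly once, some by the old version and some by the new one, and after the old instance drops out of the schedule only the new version processes messages.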

Problem/Risk: The proposal [by rodesai](https://github.com/confluentinc/ksql/issues/6282), 'Instead of rebuilding queries from scratch, we should save the execution plan to the internal topic and rebuild from the saved plan', would make headless mode useless for us. One would have to manage ksqlDB query upgrades globally using ksql-migrations from outside the K8s cluster, and either first stop all instances of all the apps or have a way to lock the streams for maintenance (i.e. similar to SQL database migrations performed with Flyway on a shared DB schema).

Documentation issue: For headless mode to stay simple and easy to operate, the following would be helpful:

  • Documentation of the upgrade behavior in headless mode, contrasting it with interactive mode (i.e. the need for the ksql-migrations tool), maybe in https://docs.ksqldb.io/en/latest/operate-and-deploy/how-it-works/
  • Support for a teardown.sql/cleanup.sql script, so that deployment version+1 of an app can automatically clean up topics left over from deployment version-1 of the same app (see https://github.com/confluentinc/ksql/issues/1115).
  • Documentation of the architectural differences between the two modes in https://docs.ksqldb.io/en/latest/operate-and-deploy/. Ideally it would cover which topics and RocksDB state remain after stopping a ksql-server cluster in each mode, REST API access management vs. no management, and script lifecycle management, so that production operation can be fully automated with zero downtime.

ccuz, Jun 09 '22 15:06