
Cluster rollout strategy

Open · novoj opened this issue 2 years ago · 1 comment

We need to think more carefully about how applications using evitaDB are rolled out incrementally across separate nodes of a cluster. The most common scenario will be that a new schema needs to be applied when a node launches the new version of the application, while dozens of other nodes may still be running the old version. We should help developers as much as possible to simplify this complex scenario.

Problematic spots

REST / GraphQL query containers with hashes

Certain query containers in the Swagger/GraphQL schema are generated with a hash in their name. The hash is generated from a key that changes when:

  1. key parts of the schema change
    1. the name of the entity collection - e.g. Product to Products
    2. the name of a reference - e.g. categories to assignedCategories
  2. key aspects of how the condition is constructed change
    1. the specific interface of the conditions that can be used as children of the condition
    2. a change in the forbidden types of child conditions
    3. a change in the explicit permission of child conditions (explicit permission is exceptional, used only for one or two types of conditions)

Adding a new non-specific condition that can be used anywhere does not change the hash.

However, there are situations where the hash does change. When it does, clients will no longer compile and will have to be regenerated against the new stubs. The runtime may also fail if the hashed type is used as a [variable](https://graphql.org/learn/queries/#variables).
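To illustrate, the hashed name leaks both into generated client stubs and into the query text itself. The sketch below is purely illustrative - the class, field, and hash names are made up, not actual evitaDB generator output:

```java
// Hypothetical generated client stub: the container type name embeds the schema-derived hash.
// Renaming the collection (Product -> Products) changes the hash, so the generated class name
// changes too and this code stops compiling until the client regenerates its stubs.
ProductFilterContainer_a1b2c3 filter = new ProductFilterContainer_a1b2c3();
filter.setCodeEquals("example-product");

// The hashed name is also baked into the query text as the GraphQL variable type, so even a
// dynamically built query fails at runtime once the server no longer exposes the old type.
String query = """
    query ($filter: ProductFilterContainer_a1b2c3) {
      listProduct(filterBy: $filter) { primaryKey }
    }
    """;
```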

Renaming / Removing Named Entities

If an entity collection/attribute/reference is renamed or removed, old clients will start to fail because they'll keep referencing the old names until they're updated. Changes to attribute data types and other attribute properties also affect clients and may prevent them from filtering the data.
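For example, an old client might still be filtering through the original reference name after the schema has renamed it. The constraint names below follow the evitaQL Java DSL as documented, but treat the exact query as an illustrative sketch:

```java
import static io.evitadb.api.query.Query.query;
import static io.evitadb.api.query.QueryConstraints.*;

// An old client compiled against the previous schema still uses the reference name "categories";
// once the schema renames it to "assignedCategories", this query is rejected by the server.
var staleQuery = query(
    collection("Product"),
    filterBy(
        referenceHaving(
            "categories",
            entityPrimaryKeyInSet(42)
        )
    )
);
```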

Upgrading evitaDB engine version

We also need to upgrade the evitaDB engine itself, and the next version may have a backward-incompatible binary format or changes in query construction.

Possible mitigation of the problems

Schema evolution, engine stays the same

We may introduce a new feature to evitaDB called preservation points. Creating a preservation point will be possible via EvitaSessionContract / CatalogContract and will mark a specific version of the schema as "preserved". This version of the schema and associated GraphQL / REST schema will be maintained until the preservation point is removed (via the API or automatically). The preservation points will be accessible on separate endpoints:

https://server.com:5555/gql/evita/preserved_schema_version/
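A possible shape of the API, assuming it hangs off the existing session contract; the method name createPreservationPoint is invented here to illustrate the proposal and does not exist yet:

```java
// Hypothetical sketch - preservation points are a proposed feature, not an existing API.
evita.updateCatalog(
    "evita",
    session -> {
        // mark the currently active schema version as "preserved"; it would stay reachable on
        // its own endpoint (e.g. https://server.com:5555/gql/evita/<preservedVersion>/) until
        // the preservation point is removed via the API or expires automatically
        return session.createPreservationPoint("before-2024-rollout");
    }
);
```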

The schema version can be compiled into the application at build time, so the new and old application versions will each use the schema version that was valid when they were built.

evitaDB will need to have a special translation policy that allows queries/mutations to be translated from the preserved version to the current one. It would also need to preserve and keep up to date old indexes that would otherwise have been removed because an attribute's properties changed or the attribute was removed. There will also be hard issues to resolve around how the system eventually becomes "consistent" when multiple clients use different preservation points. A rough shape of such a translator is sketched below.
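Everything in this sketch is hypothetical, not an existing evitaDB interface; the parameter types only loosely reference evitaDB's query and mutation classes:

```java
// Hypothetical sketch of a translation policy between a preserved schema version and the
// current one - e.g. rewriting the renamed reference "categories" to "assignedCategories".
interface PreservedSchemaTranslator {
    // rewrite a query expressed against the preserved schema into the current schema
    Query translateQuery(Query preservedQuery);

    // rewrite an incoming mutation from the preserved schema into the current schema
    EntityMutation translateMutation(EntityMutation preservedMutation);
}
```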

Hard scenarios

  1. a non-null attribute is removed - new clients don't see the attribute and don't send it, old clients see it and require it, but data created by new clients will lack this attribute - this may cause old clients to fail
    • the same applies to a reference or an attribute on a reference
    • the same applies to changing the attribute to a different data type
  2. a non-null attribute is created - old clients will not provide it
  3. an attribute is changed from filterable to non-filterable - we still need to keep the old index up to date to support the old clients

... and more. This is a deep rabbit hole. Some of the problems can be solved with clever "default values" for attributes unknown to new/old clients, and by refusing schema changes that would be backward incompatible; one possible shape of the default-values approach is sketched below.
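Purely as an illustration of that idea: defineEntitySchema follows evitaDB's documented schema builder, while preservingAttributeWithDefault is an invented method that does not exist:

```java
// Hypothetical sketch - the preservation layer keeps serving a synthetic default for an
// attribute that the new schema version removed, so old clients that still require the
// non-null attribute keep working against data written by new clients.
session.defineEntitySchema("Product")
    .withoutAttribute("stockQuantity")                      // removed for new clients
    .preservingAttributeWithDefault("stockQuantity", 0)     // invented: value served to old clients
    .updateVia(session);
```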

evitaDB engine upgrade

When upgrading evitaDB itself, there is no other solution than to run the old and new evitaDB versions at the same time. It should be possible to put the old evitaDB MASTER node into a mode that forwards all mutations to the MASTER node of the new evitaDB engine and waits for confirmation; once the confirmation arrives, the old node applies the same mutation to its local database. The mutation will have to undergo a transformation that allows it to be processed by the new engine version. A conceptual sketch of this loop follows.
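All the types and methods in this sketch are hypothetical; it only illustrates the proposed forward-then-apply protocol:

```java
// Hypothetical sketch - none of these types exist in evitaDB today.
void onIncomingMutation(Mutation mutation) {
    // 1. transform the mutation into the format the new engine version understands
    Mutation upgraded = upgradeChain.transform(mutation);

    // 2. forward it to the new MASTER and wait for confirmation
    Acknowledgement ack = newMasterClient.sendAndAwait(upgraded);

    // 3. only after the new engine accepted the mutation, apply the original locally
    if (ack.accepted()) {
        localWal.append(mutation);
    } else {
        throw new IllegalStateException("New MASTER rejected the mutation: " + ack.reason());
    }
}
```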

The upgrade procedure would look like this:

  1. a new evitaDB server is deployed and restored from a backup of the old evitaDB
  2. the new evitaDB server is instructed to fetch from the old evitaDB all WAL changes since the last transaction id contained in the backup, so that it becomes "in sync"
  3. the old evitaDB server is instructed to forward all WAL mutations to the new server before accepting them locally - the new instance is now the MASTER
  4. when all clients are upgraded and start communicating with the new version of evitaDB, the old evitaDB is stopped and deleted

novoj · Jun 08 '23 10:06

See also the proxy example for the Armeria server: https://github.com/line/armeria/blob/main/examples/proxy-server/src/main/java/example/armeria/proxy/ProxyService.java
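A minimal pass-through built on Armeria, loosely following the linked example, could look like the sketch below; the host name and ports are placeholders:

```java
import com.linecorp.armeria.client.WebClient;
import com.linecorp.armeria.server.Server;

public class EvitaForwardingProxy {
    public static void main(String[] args) {
        // client pointing at the new evitaDB MASTER node (placeholder address)
        WebClient newMaster = WebClient.of("http://new-evitadb:5555");

        // expose the old endpoint and forward every incoming request to the new engine
        Server server = Server.builder()
            .http(5555)
            .serviceUnder("/", (ctx, req) -> newMaster.execute(req))
            .build();

        server.start().join();
    }
}
```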

novoj · Sep 05 '24 13:09