
Add Level 2 Caching

Open jodydonetti opened this issue 10 months ago • 24 comments

Why make this change?

Work for #2543 (Add Level 2 Caching).

What is this change?

First code pushed on this branch.

Summary of the changes:

  • updated package reference for FusionCache from 1.0.0 to 2.1.0
  • added package references to ZiggyCreatures.FusionCache.Serialization.SystemTextJson, Microsoft.Extensions.Caching.StackExchangeRedis and ZiggyCreatures.FusionCache.Backplane.StackExchangeRedis to support L2+backplane on Redis
  • added support for L2+backplane in the initial config (Program.cs); see the registration sketch after this list
  • added RuntimeCacheOptions on top of EntityCacheOptions: EntityCacheOptions was, I suppose, previously reused to avoid having two similar classes with the same shape, but that is no longer the case
  • added initial version of RuntimeCacheLevel2Options to contain the related config values
  • aligned some tests that broke because they were directly instantiating and passing EntityCacheOptions instead of the new RuntimeCacheOptions
  • added new support classes (eg: RuntimeCacheOptionsConverterFactory)
  • added a CACHE_ENTRY_TOO_LARGE message to replace the previously used CACHE_KEY_TOO_LARGE one, which was being emitted when the entry (not the key) was too large
  • changed the generic type parameter in DabCacheService.GetOrSetAsync from JsonElement (same name as, but a totally different thing than, the JsonElement class) to TResult (which was already used in the other GetOrSetAsync overload), to avoid confusion
  • fixed some typos (eg: "ommitted" instead of "omitted" etc)
  • added some temporary // TODO comments to get back to them later
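
For reference, the L2 + backplane wiring with FusionCache 2.x and the packages above typically looks roughly like the sketch below; the connection-string lookup is a placeholder, not the actual DAB Program.cs code:

using Microsoft.Extensions.Caching.StackExchangeRedis;
using ZiggyCreatures.Caching.Fusion;
using ZiggyCreatures.Caching.Fusion.Backplane.StackExchangeRedis;
using ZiggyCreatures.Caching.Fusion.Serialization.SystemTextJson;

var builder = WebApplication.CreateBuilder(args);

// Placeholder: in DAB this value would come from the runtime cache config.
var redisConnectionString = builder.Configuration["runtime:cache:level-2:connection-string"];

builder.Services.AddFusionCache()
    // A serializer is required to save entries to the distributed (L2) cache.
    .WithSerializer(new FusionCacheSystemTextJsonSerializer())
    // L2: Redis as the distributed cache.
    .WithDistributedCache(new RedisCache(new RedisCacheOptions { Configuration = redisConnectionString }))
    // Backplane: notifies the other nodes so each one can update/evict its own L1.
    .WithBackplane(new RedisBackplane(new RedisBackplaneOptions { Configuration = redisConnectionString }));

var app = builder.Build();
app.Run();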

Still missing:

  • although I already worked on the config generation side of things (eg: not just reading the config, but also generating it via the CLI), I think I haven't finished it yet
  • there are also some points where I'm not sure the convention I extrapolated is fully correct (eg: should a config sub-object be generated only if it's not null, or should we compare it with the inner default values to decide whether to skip it as a whole?)
  • since DAB is using FusionCache, it's also using Auto-Recovery: because of this, I'd like to tweak some of the default settings (eg: allow background distributed operations to make things faster, etc; see the sketch after this list). But we'd have to first check whether we want to expose some options to drive this or go full-auto-mode
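
To make that kind of tweak concrete, it would look roughly like the fragment below (a sketch only, plugged into the same AddFusionCache() call; whether these become config-driven options or hard-coded defaults is exactly the open question):

builder.Services.AddFusionCache()
    .WithDefaultEntryOptions(options =>
    {
        // Run distributed cache (L2) and backplane operations in the background,
        // so user-facing calls don't have to wait for the Redis round-trips.
        options.AllowBackgroundDistributedCacheOperations = true;
        options.AllowBackgroundBackplaneOperations = true;
    });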

Not (necessarily) part of this effort, but already mentioned and to keep in mind:

  • avoid calculating each cache entry size, since SizeLimit is not currently used, nor is it possible to enable it via config (so, for now, it's useless)
  • start thinking about enabling some resiliency features like Fail-Safe, Soft Timeouts, etc
  • in the future it would be nice to introduce Tagging to automatically evict all the cached entries containing data about an entity after an update to said entity (see the sketch after this list)
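
To make the Tagging idea concrete, with FusionCache v2 it could look roughly like this; the entity type, key, and loader are hypothetical, not DAB code:

// Cache a query result and tag it with the entity it contains data about.
var book = await cache.GetOrSetAsync<Book>(
    $"book:{id}",
    async (ctx, ct) => await LoadBookFromDatabaseAsync(id, ct),
    tags: new[] { "entity:book" });

// Later, after a mutation on the Book entity, evict everything tagged with it.
await cache.RemoveByTagAsync("entity:book");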

How was this tested?

Tests run locally:

  • [x] Unit Tests

Sample Request(s)

N/A

jodydonetti avatar Mar 14 '25 22:03 jodydonetti

@microsoft-github-policy-service agree

jodydonetti avatar Mar 14 '25 22:03 jodydonetti

I'll add that both L2 and the backplane are working, but I'd like to see the backplane in action on a real multi-instance DAB test: any advice on the best way to do this?

Also, I still have to add the specific config part for the backplane, but I'm still thinking about whether it's really necessary: the "it just works" way would be to auto-use the backplane if the underlying L2 provider supports it (in our case: "redis"). But I'm weighing ease of use of the default experience against total control...

jodydonetti avatar Mar 14 '25 22:03 jodydonetti

/azp run

Aniruddh25 avatar Mar 17 '25 00:03 Aniruddh25

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Mar 17 '25 00:03 azure-pipelines[bot]

/azp run

jodydonetti avatar Mar 18 '25 21:03 jodydonetti

Commenter does not have sufficient privileges for PR 2619 in repo Azure/data-api-builder

azure-pipelines[bot] avatar Mar 18 '25 21:03 azure-pipelines[bot]

After a couple of minor commits I think this is ready for a review.

Tests

If someone with the needed privileges (maybe @Aniruddh25 ?) can do an /azp run it should now pass all the tests.

Backplane Experience: It Just Works

About the backplane: my opinion is that in the future we may add the ability to specify a different provider for the backplane than for L2 (maybe with different options, like connection string, etc), but the default experience should be as much as possible an "it just works" one, based on the specific provider.

The only supported provider currently, Redis, has both abilities (L2 and backplane): because of this, I think we can be good like this for the first release, without any additional explicit option (and btw, I'm already sharing the same IConnectionMultiplexer instance between L2 and backplane, to use fewer connections).
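
For illustration, the auto-enable idea could look roughly like the helper below; the method name, the provider/connection-string parameters, and the ConnectionMultiplexerFactory wiring are assumptions for this sketch, not the final DAB code:

using Microsoft.Extensions.Caching.StackExchangeRedis;
using Microsoft.Extensions.DependencyInjection;
using StackExchange.Redis;
using ZiggyCreatures.Caching.Fusion;
using ZiggyCreatures.Caching.Fusion.Backplane.StackExchangeRedis;

// Hypothetical helper: enable L2 (and the backplane, when the provider supports it)
// while sharing a single Redis connection between the two.
static async Task ConfigureLevel2Async(IFusionCacheBuilder cacheBuilder, string provider, string connectionString)
{
    if (!string.Equals(provider, "redis", StringComparison.OrdinalIgnoreCase))
        return; // currently Redis is the only provider with both L2 and backplane support

    // One multiplexer shared by L2 and the backplane keeps the connection count low.
    var multiplexer = await ConnectionMultiplexer.ConnectAsync(connectionString);

    cacheBuilder
        .WithDistributedCache(new RedisCache(new RedisCacheOptions
        {
            ConnectionMultiplexerFactory = () => Task.FromResult<IConnectionMultiplexer>(multiplexer)
        }))
        .WithBackplane(new RedisBackplane(new RedisBackplaneOptions
        {
            ConnectionMultiplexerFactory = () => Task.FromResult<IConnectionMultiplexer>(multiplexer)
        }));
}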

Some Notes

A couple of additional notes:

  • although I already worked on the config generation side of things (eg: not just reading the config, but also generating it via the CLI), I'm not 100% sure it is already completely correct. Are there best practices for this?
  • there are also some points where I'm not sure the convention I extrapolated is fully correct (eg: should a config sub-object be generated only if it's not null, or should we compare it with the inner default values to decide whether to skip it as a whole?). Are there hints on this?

Pending

Still pending (but not related to this PR):

  • we should discuss not calculating each cache entry size, since SizeLimit is not currently used, nor is it possible to enable it via config (so, for now, it's useless). Any opinion?

Please let me know, thanks!

jodydonetti avatar Mar 18 '25 22:03 jodydonetti

/azp run

Aniruddh25 avatar Mar 19 '25 21:03 Aniruddh25

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Mar 19 '25 21:03 azure-pipelines[bot]

PS: although with this PR there would be full support for L2+backplane, as mentioned elsewhere it would be good to have some sort of "app name" or "app id", to use as a prefix or similar to support multiple DAB apps running over the same Redis instance. If it's complicated to come up with such a new global config/concept, an alternative would be to add a new setting in the top-level cache config node, something like "prefix" here:

{
	"$schema": "https://github.com/Azure/data-api-builder/releases/download/v0.10.23/dab.draft.schema.json",
	"data-source": {
		// ...
	},
	"runtime": {
		"rest": {
			// ...
		},
		"graphql": {
			// ...
		},
		"cache": {
			"enabled": true,
			"ttl-seconds": 10,
			"prefix": "blahblah", // HERE
			"level-2": {
				// ...
			}
		},
		// ...
	}
}

This would be cache-specific, and quite easy to add.
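
Wiring that value into FusionCache would then be straightforward, since FusionCacheOptions already exposes a CacheKeyPrefix; a minimal sketch, with the literal value standing in for whatever is read from runtime.cache.prefix:

builder.Services.AddFusionCache()
    .WithOptions(options =>
    {
        // Prepended to every cache key, so multiple DAB apps can share
        // the same Redis instance without their entries colliding.
        options.CacheKeyPrefix = "blahblah:";
    });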

After the merge of this first PR (if all goes well) I can add this pretty fast.

jodydonetti avatar Mar 19 '25 21:03 jodydonetti

/azp run

aaronburtle avatar Mar 24 '25 18:03 aaronburtle

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Mar 24 '25 18:03 azure-pipelines[bot]

Hi all, I'm back from the Summit: after our last chat I'm doing the necessary changes, will update later.

jodydonetti avatar Mar 30 '25 14:03 jodydonetti

Proposal updated with the latest notes here.

jodydonetti avatar Mar 30 '25 23:03 jodydonetti

Can someone /azp run please?

jodydonetti avatar Apr 03 '25 20:04 jodydonetti

If someone can please /azp run I can check that everything is ok, then mark it as ready for review. Thanks.

jodydonetti avatar Apr 05 '25 09:04 jodydonetti

/azp run

aaronburtle avatar Apr 09 '25 15:04 aaronburtle

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

azure-pipelines[bot] avatar Apr 09 '25 15:04 azure-pipelines[bot]

/azp run

aaronburtle avatar Apr 14 '25 17:04 aaronburtle

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Apr 14 '25 17:04 azure-pipelines[bot]

/azp run

aaronburtle avatar Apr 15 '25 22:04 aaronburtle

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Apr 15 '25 22:04 azure-pipelines[bot]

/azp run

aaronburtle avatar Apr 16 '25 15:04 aaronburtle

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar Apr 16 '25 15:04 azure-pipelines[bot]

config sub-object should be generated only if it's not null or should we do a comparison with the inner default values to see if we should skip it as a whole?

We should generate the sub-object only if it's not null; otherwise use its default value at runtime.
Write it to the file system only when it is provided as one of the options for the CLI commands. This avoids inflating the config with default values unnecessarily. The UserProvidedTtlOptions property is an example that serves this purpose.

You can have any more changes needed for this in a subsequent PR where you implement the CLI commands to specify L2 options.
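
A minimal sketch of that convention, modeled on the existing UserProvidedTtlOptions idea; apart from the type names already mentioned in this PR, the property shapes below are illustrative, not the actual DAB classes:

using System.Text.Json.Serialization;

// Placeholder stub for the RuntimeCacheLevel2Options type introduced in this PR.
public record RuntimeCacheLevel2Options;

public record RuntimeCacheOptions
{
    [JsonPropertyName("enabled")]
    public bool Enabled { get; init; }

    [JsonPropertyName("ttl-seconds")]
    public int TtlSeconds { get; init; } = 5; // default value used in this sketch only

    // Tracks whether the user explicitly supplied ttl-seconds, so config generation
    // can decide whether to write it out or fall back to the runtime default.
    [JsonIgnore]
    public bool UserProvidedTtlOptions { get; init; }

    // The level-2 sub-object is written only when it is not null,
    // i.e. only when the user actually asked for it via the CLI.
    [JsonPropertyName("level-2")]
    [JsonIgnore(Condition = JsonIgnoreCondition.WhenWritingNull)]
    public RuntimeCacheLevel2Options? Level2 { get; init; }
}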

Aniruddh25 avatar May 10 '25 00:05 Aniruddh25

/azp run

Aniruddh25 avatar May 10 '25 00:05 Aniruddh25

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines[bot] avatar May 10 '25 00:05 azure-pipelines[bot]

Hi all, a question about the workflow to follow. I see some pending commits like this:

(screenshot: pending suggested changes shown in the GitHub review UI)

Am I the one who should commit them or will you do that? Just asking to avoid me waiting for you and you waiting for me 😅

jodydonetti avatar May 10 '25 09:05 jodydonetti

Am I the one who should commit them or will you do that?

Hi @jodydonetti, good clarifying question. Those have been left as suggestions so it's easier for "you" to accept the suggestion and commit. Although I took the liberty of resolving some merge conflicts, to let the pipelines run. Looks like a few Unit Tests are failing on snapshots. You just need to update the checked-in snapshots with the new config values that mention the level 2 cache properties. If you have trouble fixing those, let us know and we can fix them. But for now, I will wait for you to fix the Unit Tests.

Some MS_SQL integration tests failed in HotReloadValidation as well; if they are not related to level 2 caching, please ignore them. I will rerun the pipeline. We are tracking some flaky failing tests in this issue: https://github.com/Azure/data-api-builder/issues/2010

Aniruddh25 avatar May 11 '25 20:05 Aniruddh25

/azp run

aaronburtle avatar May 31 '25 02:05 aaronburtle