gaffer-docker

Allow cache to be distributed / better persisted

Open d47853 opened this issue 3 years ago • 5 comments

The Gaffer cache containing named operations / federated store graphs should be distributed when using multiple Gaffer REST API instances, and persisted so that you don't lose your graphs when updating the Federated Store.

Therefore we should consider either configuring the JCS cache or using the Hazelcast cache service.

Alternatively (and probably preferably), Gaffer should consider storing this important data somewhere like a database rather than a cache.

I'll leave this as an open question so that we can discuss the options.

d47853, Feb 05 '21 09:02

@t92549 Does this question really belong in gaffer-docker? Also, it sounds like a gaffer issue that we have been looking at.

n3101, Aug 11 '21 09:08

This kind of scenario, with multiple Gaffer instances sharing a cache, is described in https://github.com/gchq/Gaffer/issues/2457, so people are trying it. I think there is a PR associated with that issue to remove de-sync in federated stores with that setup. However, perhaps the solution of a database rather than a cache would be quite nice and would remove a lot of the tricky issues with this setup. This would involve work in Gaffer, but gaffer-docker would also need some changes to ensure this deployment option is available and set up correctly. What do @sw96411 and @GCHQDev404 think?

t92549, Aug 11 '21 09:08

In theory (i.e. see gh-2457) the existing Gaffer Cache can be backed by a database, either by configuring and plugging in the JCS cache implementation already used by Gaffer (e.g. via https://commons.apache.org/proper/commons-jcs/JDBCDiskCache.html) or by plugging in your own implementation of uk.gov.gchq.gaffer.cache.ICacheService.

I've not done the former, but I have an example of code which does the latter - I'll share it with you personally @t92549 via a different mechanism. There are some specific environmental reasons I wouldn't simply cut-and-shut my example into the open-source project, but it could absolutely act as a starting point.

If someone wants to experiment with using off-the-shelf JCS components and configuration to persist Gaffer caches (the federatedStores and NamedOperation caches in particular), then I'd be really interested in the outcome.
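To make the "plug in your own cache service" idea a bit more concrete, here is a minimal sketch of the general shape: named caches whose entries outlive the process that wrote them. Everything in it is an assumption for illustration only. `SimpleCacheService` and `FileBackedCacheService` are hypothetical names, the interface is deliberately NOT Gaffer's `ICacheService` (whose real method signatures are not reproduced here), values are plain strings, and a properties file per cache stands in for the database so that the example runs with only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import java.util.Set;

public final class PersistentCacheSketch {

    /** Hypothetical, simplified stand-in for a cache service; NOT Gaffer's ICacheService. */
    public interface SimpleCacheService {
        void put(String cacheName, String key, String value);
        String get(String cacheName, String key);
        Set<String> keys(String cacheName);
    }

    /** Keeps each named cache in its own .properties file so entries survive a restart. */
    public static final class FileBackedCacheService implements SimpleCacheService {
        private final Path directory;

        public FileBackedCacheService(Path directory) {
            try {
                Files.createDirectories(directory);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            this.directory = directory;
        }

        @Override
        public synchronized void put(String cacheName, String key, String value) {
            // Read-modify-write of the whole file: simple, but not safe across processes.
            Properties entries = load(cacheName);
            entries.setProperty(key, value);
            try (OutputStream out = Files.newOutputStream(fileFor(cacheName))) {
                entries.store(out, null);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public synchronized String get(String cacheName, String key) {
            // Always re-read from disk so writes made by another instance are visible.
            return load(cacheName).getProperty(key);
        }

        @Override
        public synchronized Set<String> keys(String cacheName) {
            return load(cacheName).stringPropertyNames();
        }

        private Path fileFor(String cacheName) {
            return directory.resolve(cacheName + ".properties");
        }

        private Properties load(String cacheName) {
            Properties entries = new Properties();
            Path file = fileFor(cacheName);
            if (Files.exists(file)) {
                try (InputStream in = Files.newInputStream(file)) {
                    entries.load(in);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            return entries;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cache-sketch");
        new FileBackedCacheService(dir).put("NamedOperation", "myOp", "{\"operations\":[]}");
        // A fresh instance stands in for a restarted or second REST API instance.
        SimpleCacheService restarted = new FileBackedCacheService(dir);
        System.out.println(restarted.get("NamedOperation", "myOp"));
    }
}
```

A real implementation would presumably swap the file for a JDBC table, store serialised objects rather than strings, and need proper locking between instances; the sketch only shows why reading through to shared, durable storage avoids both the lost-on-restart and the out-of-sync problems discussed in this issue.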

Slight aside: IMNSHO, Gaffer is taking a bit of a liberty in calling some of its "Caches"... well, "Caches" :-) Really, they are the primary stores of some of its non-graph metadata.

One last note - I agree with @n3101 that this issue might be better in the regular gaffer project rather than gaffer-docker. Unless, perhaps, the ticket is solely scoped to using existing configuration and JCS components to implement a persistent cache in this docker implementation of Gaffer. That's just my idle thoughts, though - I just raise issues on these backlogs, I don't manage them :-)

sw96411, Aug 11 '21 17:08

> In theory (i.e. see gh-2457) the existing Gaffer Cache can be backed by a database, either by configuring and plugging in the JCS cache implementation already used by Gaffer (e.g. via https://commons.apache.org/proper/commons-jcs/JDBCDiskCache.html) or by plugging in your own implementation of uk.gov.gchq.gaffer.cache.ICacheService.
>
> I've not done the former, but I have an example of code which does the latter - I'll share it with you personally @t92549 via a different mechanism. There are some specific environmental reasons I wouldn't simply cut-and-shut my example into the open-source project, but it could absolutely act as a starting point.
>
> If someone wants to experiment with using off-the-shelf JCS components and configuration to persist Gaffer caches (the federatedStores and NamedOperation caches in particular), then I'd be really interested in the outcome.

Okay, excellent. I can take a look at the code you sent and also look into using these JCS features, get them tested, and see whether they would be useful to add as an example somewhere.

> Slight aside: IMNSHO, Gaffer is taking a bit of a liberty in calling some of its "Caches"... well, "Caches" :-) Really, they are the primary stores of some of its non-graph metadata.

Yeah, I think that is the main reason we should start looking into this, especially as more people start looking into load-balanced setups.

> One last note - I agree with @n3101 that this issue might be better in the regular gaffer project rather than gaffer-docker. Unless, perhaps, the ticket is solely scoped to using existing configuration and JCS components to implement a persistent cache in this docker implementation of Gaffer. That's just my idle thoughts, though - I just raise issues on these backlogs, I don't manage them :-)

I suppose it depends on what the solution is. If we offer some sort of persistent cache like the one you sent me, then the code would live in Gaffer; but if we just want to set some config files as part of a deployment that sets up this cache without any code changes, then it lives here.

t92549, Aug 12 '21 10:08

https://github.com/gchq/Gaffer/issues/2457 is likely to stay, because it will make cache behaviour more consistent between FederatedStore and other stores.

If you want to give the Docker setup magic buttons for a persistent cache configured via something like JCS, I don't have strong opinions. Am I right in understanding that Gaffer already has everything you need to do this?

GCHQDev404, Aug 12 '21 11:08