
Adds in-memory cache for RPC query results

Open reinhapa opened this issue 1 year ago • 11 comments

Enables keeping RPC query result values smaller than 4 MB in memory instead of writing them to a temporary file.

By using the in-memory buffer, no virus scanner is triggered on either the server or the client side. This change almost doubled the performance on our production systems.

reinhapa avatar Jan 10 '24 21:01 reinhapa

This sounds interesting and like you are seeing a good performance improvement. If I understand correctly, when the result of an RPC call is less than 4 MB in size, then instead of writing to a file, you keep it in memory for a finite period of time. Do I have that correct?

Correct.

If that is the case, then in general that sounds fine, but I have some questions whose answers I can't easily spot in your code (perhaps you can point me to specific parts of your PR):

  1. Presumably the data coming from the result of the RPC call is in one of two forms - either (a) an InputStream that is read directly from the network socket, or (b) a buffer that is already in memory?

If it's scenario (a), how do you know the size without reading all of the data? Or do you just read up to 4 MB, and if it is 4 MB or more, flush the already-read 4 MB to a file and then keep appending to the file until the stream is completely read?

Absolutely correct. This is done within the existing VirtualTempPath implementation, introduced earlier on the client side within AbstractRemoteResource, RemoteResourceSet, RemoteXMLResource and InternalXMLSerializer.
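
For readers following the discussion, here is a minimal sketch of the overflow idea described above - buffer in memory up to a threshold, then spill to a temporary file. The class and names are hypothetical; the PR's actual OverflowToDiskStream will differ in detail:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Buffers writes in memory until the threshold (e.g. 4 MB) is crossed,
// then flushes everything buffered so far to a temp file and appends there.
class OverflowSketch extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream file; // non-null once we have overflowed to disk

    OverflowSketch(final int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(final int b) throws IOException {
        if (file == null && memory.size() + 1 > threshold) {
            overflowToDisk();
        }
        if (file != null) {
            file.write(b);
        } else {
            memory.write(b);
        }
    }

    private void overflowToDisk() throws IOException {
        final Path tmp = Files.createTempFile("rpc-result", ".bin");
        file = Files.newOutputStream(tmp);
        memory.writeTo(file); // move the already-buffered bytes to disk
        memory = null;        // from now on, everything goes to the file
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            file.close();
        }
    }
}
```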

If it's scenario (b), are you avoiding making a second copy of the same data in memory?

  2. How are you bounding the total size of the cache? i.e. How do I make sure I don't have lots of 4 MB chunks of memory in-flight that ultimately add up to a large number and cause memory starvation?

At the moment there is no such handling; logic permitting this may be added later...

The current MemoryContentsImpl class, which holds the actual data, stores it in 4 KB blocks within a two-dimensional array instead of using one big 4 MB byte array.
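
As a rough illustration of that chunked layout (a hypothetical sketch, not the actual MemoryContentsImpl code): writing never needs one large contiguous allocation, only 4 KB blocks plus a small index array.

```java
import java.util.Arrays;

// Stores data in fixed 4 KB blocks inside a growable array of block
// references, so no single contiguous multi-megabyte allocation is needed.
class ChunkedBuffer {
    private static final int BLOCK_SIZE = 4096;
    private byte[][] blocks = new byte[16][];
    private int size; // total bytes written so far

    void write(final byte b) {
        final int block = size / BLOCK_SIZE;
        if (block >= blocks.length) {
            blocks = Arrays.copyOf(blocks, blocks.length * 2); // grow the index
        }
        if (blocks[block] == null) {
            blocks[block] = new byte[BLOCK_SIZE]; // allocate blocks lazily
        }
        blocks[block][size % BLOCK_SIZE] = b;
        size++;
    }

    byte read(final int position) {
        return blocks[position / BLOCK_SIZE][position % BLOCK_SIZE];
    }
}
```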

reinhapa avatar Jan 19 '24 18:01 reinhapa

Absolutely correct. This is done within the existing VirtualTempPath implementation, introduced earlier on the client side

I think you are referring specifically to the OverflowToDiskStream class, is that correct?

At the moment there is no such handling; logic permitting this may be added later...

Just my opinion, but if we are going to cache things in memory, we should have an upper bound on the total size of that cache; otherwise it will lead to problems sooner or later.

The current MemoryContentsImpl class, which holds the actual data, stores it in 4 KB blocks within a two-dimensional array instead of using one big 4 MB byte array.

An interesting design! One concern: with that sizing, that is potentially a lot of small allocations, and as these buffers (as I think I understand it) are not from a pool, it will also mean a lot of GC.

eXist-db already has bounded caches based on Caffeine, and also some pooling based on custom mechanisms such as the simplistic ByteArrayPool or the more complete Apache Commons Pool.
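
For example, Caffeine can bound a cache by total weight rather than entry count, which maps naturally onto "total bytes in flight". A sketch with hypothetical key and value types, not eXist-db's actual cache setup:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// Bound the cache by the accumulated size of the cached buffers rather
// than by the number of entries, so total memory use stays predictable.
final Cache<String, byte[]> resultCache = Caffeine.newBuilder()
        .maximumWeight(64L * 1024 * 1024) // e.g. 64 MB across all entries
        .weigher((String key, byte[] value) -> value.length)
        .recordStats() // enables hit/miss statistics for monitoring
        .build();
```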

adamretter avatar Jan 19 '24 18:01 adamretter

I think you are referring specifically to the OverflowToDiskStream class, is that correct?

Yes, that's exactly the implementation...

Just my opinion, but if we are going to cache things in memory, we should have an upper bound on the total size of that cache; otherwise it will lead to problems sooner or later.

That is correct. I do not know how much the RPC option is actually used, and more enhancements in that direction are surely an option.

An interesting design! One concern: with that sizing, that is potentially a lot of small allocations, and as these buffers (as I think I understand it) are not from a pool, it will also mean a lot of GC.

Pooling the already-used memory instances would be beneficial in that regard, I guess...
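
Using Apache Commons Pool 2, pooling the blocks could look roughly like this (a sketch under assumed names; the PR's eventual ContentFilePool will differ):

```java
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

// Reuses 4 KB blocks instead of allocating fresh ones per request,
// trading a bounded pool footprint for much less GC pressure.
public class PoolSketch {
    static class BlockFactory extends BasePooledObjectFactory<byte[]> {
        @Override
        public byte[] create() {
            return new byte[4096];
        }

        @Override
        public PooledObject<byte[]> wrap(final byte[] block) {
            return new DefaultPooledObject<>(block);
        }
    }

    public static void main(final String[] args) throws Exception {
        final GenericObjectPoolConfig<byte[]> config = new GenericObjectPoolConfig<>();
        config.setMaxTotal(1024); // at most 1024 pooled blocks = 4 MB held

        final GenericObjectPool<byte[]> pool =
                new GenericObjectPool<>(new BlockFactory(), config);

        final byte[] block = pool.borrowObject();
        try {
            // ... fill and use the block ...
        } finally {
            pool.returnObject(block); // hand the block back for reuse
        }
    }
}
```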

eXist-db already has bounded caches based on Caffeine, and also some pooling based on custom mechanisms such as the simplistic ByteArrayPool or the more complete Apache Commons Pool.

I can look into this a bit more in the next days. The question is where to configure and monitor all those parts... Do you monitor the existing caches that you mentioned?

reinhapa avatar Jan 19 '24 19:01 reinhapa

I do not know how much the RPC option is actually used

Hmm... My awareness is that it is used by Oxygen XML Editor, the Java Admin Client, and xst. However, these are all pretty much single-user uses, so not bounding the cache there won't make much difference.

But I guess that, like you, others are using it for integration - if I recall correctly, the Python libraries for eXist-db use the XML-RPC interface (as of course does the XML:DB interface that you wrote/updated).

I can look into this a bit more in the next days. The question is where to configure and monitor all those parts... Do you monitor the existing caches that you mentioned?

For monitoring - some of it is exposed via JMX (particularly Caffeine and some of the Apache Commons pools). For sizing and configuration of this stuff, I think it's normally done from conf.xml at startup.
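
By way of illustration, with recordStats() enabled (as in the Caffeine sketch above), a statistics snapshot can be read at any time and surfaced via JMX or logging. Here resultCache is the hypothetical cache from that earlier sketch:

```java
import com.github.benmanes.caffeine.cache.stats.CacheStats;

// A point-in-time snapshot of the cache's hit/miss/eviction counters.
final CacheStats stats = resultCache.stats();
System.out.printf("hitRate=%.2f evictions=%d loads=%d%n",
        stats.hitRate(), stats.evictionCount(), stats.loadCount());
```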

adamretter avatar Jan 19 '24 20:01 adamretter

For monitoring - some of it is exposed via JMX (particularly Caffeine and some of the Apache Commons pools). For sizing and configuration of this stuff, I think it's normally done from conf.xml at startup.

@adamretter do you have any suggestions as to where the best place would be to configure the to-be-created ContentFilePool within conf.xml?

Also, what would be the best configuration properties? I was thinking about the pool size and the VirtualTempPath maximum in-memory size...

reinhapa avatar Jan 20 '24 18:01 reinhapa

do you have any suggestions as to where the best place would be to configure the to-be-created ContentFilePool within conf.xml?

Hmm... I am not certain at the moment; I don't think there are any API-specific settings in the conf.xml file. The closest thing in concept is perhaps the binary-manager section, so perhaps a new section that is similar to that? Or perhaps conf.xml is not even the right place for this?

adamretter avatar Jan 20 '24 18:01 adamretter

Hmm... I am not certain at the moment; I don't think there are any API-specific settings in the conf.xml file. The closest thing in concept is perhaps the binary-manager section, so perhaps a new section that is similar to that? Or perhaps conf.xml is not even the right place for this?

At least for the server part, conf.xml seems to be the right place to me. The question is whether we would need a new section, as it is used within the RPC servlet only...
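
Purely as a strawman for that discussion - every element and attribute name below is hypothetical - such a section might look like:

```xml
<!-- hypothetical conf.xml section, mirroring the shape of binary-manager -->
<rpc-server>
    <content-file-pool max-idle="32" max-total="256"/>
    <!-- 4194304 bytes = 4 MB in-memory overflow threshold -->
    <temp-path max-in-memory-size="4194304"/>
</rpc-server>
```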

@dizzzz @wolfgangmm @line-o any other ideas on this?

reinhapa avatar Jan 20 '24 19:01 reinhapa

I created a separate PR #5195 adding a ContentFile pooling mechanism. It still has to be decided where the configuration actually needs to go.

reinhapa avatar Jan 23 '24 19:01 reinhapa

@adamretter I've integrated the pooling for those ContentFile instances now...

reinhapa avatar Feb 08 '24 06:02 reinhapa

@adamretter I addressed all the changes you mentioned. Could you re-review and approve it so that it can be merged?

reinhapa avatar Feb 19 '24 17:02 reinhapa

Hi @adamretter

  • Some variables/params could be marked final please

Most places were already that way before my change; the remaining ones have now been changed.

  • Empty collections are maybe better created with Collections.emptyXXX to make the intention clear.

I've changed those accordingly.

  • Not sure why the change from UnsynchronizedByteArrayOutputStream to the less performant ByteArrayOutputStream. I spent time previously chasing ByteArrayOutput/InputStream out of the code base.

That was done because of the deprecation warning... I reverted that change; we might need to fix those warnings by other means...
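
Assuming the warning comes from recent Commons IO versions, which deprecate the UnsynchronizedByteArrayOutputStream constructors in favour of a builder, the faster unsynchronized implementation can be kept without the warning (a sketch; the buffer size is an arbitrary example):

```java
import org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream;

// The builder replaces the deprecated constructors in recent Commons IO,
// so the unsynchronized (faster) stream can still be used warning-free.
final UnsynchronizedByteArrayOutputStream out = UnsynchronizedByteArrayOutputStream.builder()
        .setBufferSize(4096) // arbitrary example initial size
        .get();
```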

reinhapa avatar Mar 13 '24 11:03 reinhapa

Quality Gate failed

Failed conditions
73.7% Coverage on New Code (required ≥ 80%)

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Mar 13 '24 12:03 sonarqubecloud[bot]