emissary icon indicating copy to clipboard operation
emissary copied to clipboard

Switch to SBC in KFF (incl. HashPlace)

Open ukengineer925 opened this issue 2 years ago • 9 comments

Removes complexity within HashPlace/KffDataObjectHandler to just use the SBC path. Adds additional tests to confirm behaviour and functionality.

ukengineer925 avatar Dec 12 '22 16:12 ukengineer925

We are going to need to see the performance impact of this at full scale. My gut feeling is that using a SBC here will be wasteful and impact throughput.

cfkoehler avatar Dec 12 '22 22:12 cfkoehler

PR#204 ("Large data support") only calculated the hashes using the SeekableByteChannel. PR#204 spent months being tested on the development system with no observable performance impact by either myself or anyone else on the team. It was not until PR#362 ("KFFHashPlace should prefer byte[] to SeekableByteChannelFactory") that this behavior was changed to calculating hashes using the byte[]. Therefore, I believe the performance impact of exclusively calculating the hashes using the SeekableByteChannel has already been shown to be negligible.

jdcove2 avatar Dec 13 '22 16:12 jdcove2

@jdcove2 I hear your points but it has not been tested at full scale. You quote no observable performance differences. Do you have any data to back up those claims?

But as a side note. Emissary is a configuration-driven framework. Why should users of emissary be forced to use SBC if they don't want to?

I will defer to senior devs to make the decisions on this going forward.

cfkoehler avatar Dec 13 '22 17:12 cfkoehler

Given drivenflywheel's comment in PR#362 that "PRs work great for discussion, and they leave a much better historical record than a verbal discussion", it seems that this is the appropriate place to discuss your above concerns.

Therefore, I have the following questions...

  1. If testing on the non-production system is insufficient to be "tested at full scale", then what is sufficient?
  2. If BDO's can now hold data that is significantly larger than what a Java byte array can hold using a SeekableByteChannel, then why would we want to keep the byte array interface on the BDO knowing that it might return only a small subset of the actual data?

jdcove2 avatar Dec 13 '22 22:12 jdcove2

Thanks for the extra details Dave. As for the full scale I personally would have liked to see a statistical analysis of the performance on a full scale system. That can be interpolated from a smaller system with a different resource profile.

But it appears that the team is ok accepting this how it is. And not allowing for it to be configurable so I will accept that moving forward.

cfkoehler avatar Dec 14 '22 00:12 cfkoehler

From my questions above...

  1. It sounds like you have specific performance information you would like to see. I can work with you to produce this performance information. Based on what you said above, please let me know what metrics you want collected, what statistical analysis you want performed and what system you would like the metrics gathered on.

  2. In addition to what I said in my question 2 above, every data representation (e.g. byte array, SeekableByteChannel, InputStream, etc.) on the BDO will need to be supported by every place that uses BDO data. That is, if the BDO supports more than one data representation then each place would need to determine which data representation is being used and then process that data representation appropriately. This is why I understand the goal to be staying with one data representation by migrating from byte array to SeekableByteChannel. Eventually the byte array data methods will be removed from the BDO.

  3. The current code in this PR is configurable only because the BDO currently supports both byte array and SeekableByteChannel data representations during the transition from byte array to SeekableByteChannel. All code that uses byte arrays from the BDO will need to be converted to using SeekableByteChannels from the BDO. This PR addresses the code that manages hashes on the BDO and it has code for both data representations. Therefore, this PR is just eliminating the byte array code since the byte array data representation will be eventually removed from the BDO.

jdcove2 avatar Dec 14 '22 14:12 jdcove2

@ukengineer925 @jdcove2 I think we need more discussions on this approach. Maybe it will be discussed at our meeting on Friday. My main concern is the removal of being able to use KFF without sbc. We should not force all emissary users to use sbc. I know of use cases where even if you get a large file the hash from a truncated input is useful. Or we might have flavors of emissary that by design never get large files so why should they have to use sbc in places that don't need them.

This might require some rollback of previous work but a more acceptable and easier to integrate approach would be to have a new KFF hash place or handler that uses sbc. Then a running server can be configured to use one or the other. Or maybe in the world of testing or a unique use case use both. The framework is very configurable for places and routing. We should leverage it more.

And maybe something changes later in time and it is realized that the old non sbc KFF hash place is never used and not needed then it can go through the deprecation and removal process.

cfkoehler avatar Mar 21 '23 12:03 cfkoehler

@cfkoehler - I've added a convenience method to allow users to pass in a byte array of data if they really don't want to create their own SBCF (though as you'll see, there's a neat helper method to do that). I don't think maintaining two copies of KffDataObjectHandler, or two paths within one class, is optimal hence this PR.

ukengineer925 avatar Mar 21 '23 15:03 ukengineer925

@jdcove2 I think there are enhancements here worth bringing into the codebase without also changing default behavior to use SBC all the time. I'd like to explore moving those pieces in to gain that benefit.

jpdahlke avatar Sep 10 '24 15:09 jpdahlke