Performance: debugger source lookup becomes unusable
This is a kind-of follow-up issue to https://github.com/bndtools/bnd/issues/5322
I am currently using a bnd 6.4.0-SNAPSHOT from the p2 update site.
When I try to debug, the source lookup has become unbearably slow. Checking what happens, I found the checksum calculation in the BND plugins to be the culprit again. The screenshot below shows what is calling the aQute...Digester.write() method - which is mainly ResourceImpl.equals().
I really wonder... is my setup so extraordinary with a couple of hundred Maven modules and bundles? Am I doing something wrong?
I did an analysis. I pulled the latest changes from git to commit 5853f25a4146e8cc3331e6f4d7d79391021c3543 - and I added a log point in the deferred SHA256 calculation:
private final static Logger logger = LoggerFactory.getLogger(ResourceBuilder.class);
private final static AtomicInteger hashCalcCount = new AtomicInteger(0);

public boolean addFile(File file, URI uri) throws Exception {
    // ...
    DeferredValue<String> sha256 = new DeferredComparableValue<>(String.class,
        asSupplierOrElse(() -> {
            long start = System.currentTimeMillis();
            int count = hashCalcCount.incrementAndGet();
            logger.info("[" + count + "] Calculating SHA256 on [" + file + "]...");
            String hash = SHA256.digest(file).asHex();
            logger.info("[" + count + "] Calculating SHA256 on [" + file + "] took "
                + (System.currentTimeMillis() - start) + "ms");
            return hash;
        }, null),
        deferredHashCode);
    // ...
}
Then I built the plugins, installed them into my Eclipse, and opened a test.bndrun file in the BND Runfile Editor.
The log point was triggered well over 40k times and the file had still not opened in the BND runfile editor. I was only able to capture the first 23337 calls to the logpoint in a log file because I had to step away briefly, and when I came back the scroll buffer had already overflowed... anyway...
So, from the 23337 times the log-point was triggered, here are the top ten files by how often a SHA256 digest was calculated for the same file. You see the name of the file and how often, during the opening of a single BND runfile, the same hash was calculated over and over again. For most files, calculating the hash only takes a couple of milliseconds - but I also have some large files for which it takes several seconds.
40 .../git/uima-uimaj/uimaj-adapter-vinci/target/uima-adapter-vinci.jar
40 .../git/uima-uimaj/uimaj-cpe/target/uima-cpe.jar
40 .../git/uima-uimaj/uimaj-tools/target/uima-tools.jar
41 .../.m2/repository/commons-io/commons-io/2.11.0/commons-io-2.11.0.jar
41 .../git/uima-uimafit/uimafit-core/target/uimafit-core-3.4.0-SNAPSHOT.jar
42 .../git/uima-uimaj/uimaj-core/target/uima-core.jar
43 .../.m2/repository/junit/junit/4.13.2/junit-4.13.2.jar
43 .../.m2/repository/org/slf4j/slf4j-api/1.7.36/slf4j-api-1.7.36.jar
44 .../.m2/repository/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar
44 .../.m2/repository/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
@bjhargrave you wrote that even if the calculation is deferred now, it needs to be called at some point - ok, fine - but does it have to be called 40+ times on the same file while simply opening a runfile in the editor?
I have built myself a locally modified version with a little cache for the SHA256 hashes. It expires hashes that are older than 30 minutes or whose file modification timestamp on disk differs from the one recorded in the cache. It also records misses, file updates and hits - here is an example:
Calculated SHA256 on [MY-ARTIFACT-1.0.0-SNAPSHOT-tests.jar] took 6ms -- 1011 - 83524 - 0
- Cache contains 1011 items
- There were a total of 83524 cache hits
- None of the cached SHA256 hashes has been replaced because of the file on disk having a newer timestamp
(Those are the numbers for opening my bndrun file in the bnd runfile editor directly after starting Eclipse.)
Also, overall it feels quite a bit faster now... I still have to see if I notice any negative side effects of this cache.
You are using m2e which is using Bndtools m2e to handle the bndrun support. The bndrun support depends upon OSGi repositories which are made using FileSetRepository. The FileSetRepository is then a collection of ResourceImpls whose equals method depends upon the SHA256 value.
It is unclear to me from the information here how many times FileSetRepositories are being created. Each FSR will have distinct ResourceImpls which each need to compute the SHA256 value.
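To illustrate the mechanism, here is a minimal sketch (hypothetical names, not the actual bnd classes) of an equals() built on a lazily computed SHA-256; every comparison between two distinct instances - e.g. resources for the same file held by different repositories - forces the digest on each of them:

import java.util.function.Supplier;

// Minimal sketch (hypothetical names): equals() defined over a lazily
// computed SHA-256 forces the digest on every cross-instance comparison.
class LazySha256Resource {
    private final Supplier<String> sha256Supplier; // deferred digest calculation
    private String sha256;                         // memoized per instance only

    LazySha256Resource(Supplier<String> sha256Supplier) {
        this.sha256Supplier = sha256Supplier;
    }

    private String sha256() {
        if (sha256 == null) {
            sha256 = sha256Supplier.get(); // expensive: reads the whole file
        }
        return sha256;
    }

    @Override
    public boolean equals(Object obj) {
        return (obj instanceof LazySha256Resource other)
            && sha256().equals(other.sha256()); // distinct instances for the
                                                // same file each digest it once
    }

    @Override
    public int hashCode() {
        return sha256().hashCode();
    }
}

So with a couple of hundred FileSetRepository instances, the same JAR can be digested once per repository that references it.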
I would need a small github repo which demonstrates the problem to investigate further. Otherwise, I am just guessing at what the actual issue is and how to solve it.
Well, you can assume that there are a couple of hundred Eclipse projects in the workspace. Each is an m2e module, each includes the bnd maven plugins to generate the OSGi metadata for the main artifact and for the test artifact, each runs the bnd resolver maven plugin to update the test.bndrun file, and each contains the bnd testing maven plugins to run tests (there is one test.bndrun file in each module). So from what you say, I guess you could assume a couple of hundred FileSetRepositories.
The m2e modules cluster into several groups that each inherit from a joint parent POM module. That parent POM usually contains a large dependency management section which manages all dependency versions for the entire reactor. The useDependencyManagement option is enabled in the bnd maven plugin executions in all of the m2e modules.
The set of dependencies of all of these projects is probably the ~1000 JAR files that end up in my cache (see comment above). So if two of the repositories refer to a ResourceImpl with the same path, it should usually be the case that the file also has the same hash - so information about hashes calculated for files referenced from multiple repositories should be centrally sharable, no?
This is the makeshift static cache I hacked myself together for the moment - not very well done, but apparently effective...
private final static AtomicInteger missCount = new AtomicInteger(0);
private final static AtomicInteger hitCount = new AtomicInteger(0);
private final static AtomicInteger fupCount = new AtomicInteger(0);
private final static Map<String, CachedSha256> CACHE = new HashMap<>();

private static class CachedSha256 {
    final long timestamp;
    final long calculated;
    final String sha256;

    CachedSha256(long timestamp, String sha256) {
        this.timestamp = timestamp;
        this.calculated = System.currentTimeMillis();
        this.sha256 = sha256;
    }
}

DeferredValue<String> sha256 = new DeferredComparableValue<>(String.class,
    asSupplierOrElse(() -> {
        String path = file.getPath();
        CachedSha256 cachedHash;
        synchronized (CACHE) {
            long now = System.currentTimeMillis();
            // Clean up every once in a while
            int count = missCount.incrementAndGet();
            if (count % 50000 == 0) {
                CACHE.entrySet().removeIf(e -> (now - e.getValue().calculated) > (30 * 60 * 1000));
            }
            cachedHash = CACHE.get(path);
            long timestamp = file.lastModified();
            if (cachedHash == null || timestamp > cachedHash.timestamp) {
                if (cachedHash != null && timestamp > cachedHash.timestamp) {
                    fupCount.incrementAndGet();
                }
                String hash = SHA256.digest(file).asHex();
                cachedHash = new CachedSha256(timestamp, hash);
                CACHE.put(path, cachedHash);
            } else {
                hitCount.incrementAndGet();
            }
        }
        return cachedHash.sha256;
    }, null),
    deferredHashCode);
Looking at the code, I note that there is a small bug - the missCount is actually not counting misses but lookups. The real miss count would be missCount - hitCount...
I am not sure I see the value of the DeferredValue classes if we will have a cache of file->SHA values. Both together are overkill. I would want to rip out the DeferredValue change if we move to a SHA cache. Then ResourceBuilder.addFile would itself consult a new SHA cache class instead of directly using SHA256.digest.
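A rough sketch of what such a cache class might look like (hypothetical name and eviction policy, not the actual change; it assumes bnd's SHA256 utility from aQute.libg.cryptography as used in the snippets above):

import java.io.File;
import java.util.concurrent.ConcurrentHashMap;

import aQute.libg.cryptography.SHA256;

// Hypothetical standalone SHA cache that addFile could consult directly;
// entries are invalidated when the file's modification timestamp changes.
final class Sha256Cache {
    private record Entry(long lastModified, String sha256) {}

    private static final ConcurrentHashMap<String, Entry> CACHE = new ConcurrentHashMap<>();

    static String digest(File file) {
        long lastModified = file.lastModified();
        return CACHE.compute(file.getPath(), (path, cached) -> {
            if (cached != null && cached.lastModified() == lastModified) {
                return cached; // hit: file unchanged since the last digest
            }
            try {
                return new Entry(lastModified, SHA256.digest(file).asHex());
            } catch (Exception e) {
                throw new IllegalStateException("Cannot digest " + path, e);
            }
        }).sha256();
    }
}

ResourceBuilder.addFile could then call Sha256Cache.digest(file) eagerly and drop the DeferredValue wrapper.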
I guess deferring the calculation until it is needed could still save effort in case the value is never needed at all. I did notice a performance improvement when you first put in deferral. But at the time, I still had fewer Eclipse projects converted to OSGi bundles. The more projects I migrated, the slower it appeared to become again.
It should be easy for you to test the removal of using the DeferredValue class in the SHA value calculation while keeping your SHA cache.
I think instead of removing it, I might add counters that count calls to addFile() and calls to the supplier of the deferred value. The difference would be the calls saved.
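Something like this (a sketch; the field names are made up):

// Sketch of the proposed counters (hypothetical field names):
private final static AtomicInteger addFileCount = new AtomicInteger(0);
private final static AtomicInteger supplierCount = new AtomicInteger(0);
// in addFile():                  addFileCount.incrementAndGet();
// inside the deferred supplier:  supplierCount.incrementAndGet();
// calls saved by the deferral:   addFileCount.get() - supplierCount.get()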
Btw. I tried writing log messages through an SLF4J logger in ResourceBuilder - but I was not able to see these messages in the Eclipse log, so I had to connect using a debugger and use a logpoint. Can you tell me how to properly log messages from within ResourceBuilder so that I can actually see them somewhere while this code is running in Eclipse?
If you start Eclipse from the command line, then you should be able to see the console output. We use logback. See https://github.com/bndtools/bnd/blob/5853f25a4146e8cc3331e6f4d7d79391021c3543/bndtools.core/bndtools.shared.bndrun#L11-L15 and https://github.com/bndtools/bnd/blob/5853f25a4146e8cc3331e6f4d7d79391021c3543/bndtools.core/bndtools.shared.bndrun#L26
https://github.com/bndtools/bnd/blob/5853f25a4146e8cc3331e6f4d7d79391021c3543/bndtools.core/logback.xml#L1-L15
So here some statistics - again, opening one of my test.bndrun files after a fresh start of Eclipse:
- cache size: 1003
- times addFile was called: 52294
- times an SHA256 for a file was requested: 51239
- updates to cached SHA256 due to file having a new modification timestamp: 0
- cache hits: 50235
- cache misses: 1004 (probably an off-by-one bug here - cache misses should probably be equal to the cache size)
That gives 1055 times that addFile was called but the SHA256 was not needed later. Assuming the calculation of the SHA256 takes maybe ~5-10ms, that would sum up to something like 5-10 seconds of saved time. Not much, but also not nothing.
... I tried running Eclipse from the terminal and I do see some messages there, but not from the logger call I added to the cache code. So I used the debugger with the logpoint again ;)
You will probably need to edit the logback.xml to set the desired log level for your logger name.
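For example, adding something like this to that logback.xml (the logger name is my assumption based on the ResourceBuilder package; adjust as needed):

<!-- assumed addition to bndtools.core/logback.xml -->
<logger name="aQute.bnd.osgi.resource" level="DEBUG"/>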
1055 is only 2% of the times addFile was called. So that does not seem to be large enough to justify retaining the complexity of the DeferredValue code in addition to adding the complexity of a SHA cache.
Still, 10 seconds of additional delay simply when trying to open a bndrun file in the editor is noticeable to the user... personally, I'd vote for keeping the deferral in.
You don't know it is 10 sec delay. You just pulled a number out of thin air :-)
Well, if I look at my logpoint output, I see lots of times in the 5ms-10ms range for calculating the hash sums. As I said before, I have some files that take several seconds for the hash to be calculated, so the true average is probably higher than 10ms. 1000 saved calls to SHA256 (the 2%) times 10ms = 10 seconds - not pulled out of thin air but rather an informed estimate.
I think that 2% of the total time spent computing SHA hash values is a more accurate time estimate, since you don't know whether the files whose SHA was not needed have expensive SHA calculations or not.
Here's a more accurate measurement. I took the first 1000 timings of SHA256 calculations from the run I logged earlier. These sum up to 17847 ms - that's ~18ms on average per file.
FYI here's some notes on logging in m2e/bndtools on eclipse https://github.com/bndtools/bnd/wiki/Eclipse-Logging-with-Logback
Here are the timings ;) a batch of 1000, times in ms. The longest in this batch was 1039ms, but I have also seen longer calculations in the full run.
So if we assume 10ms per SHA256 calculation and we had 1004 cache misses in the above run, that would be 10 seconds with the deferral in place. If we removed the deferral, we'd get an additional 10 seconds - adding up to 20 seconds in total, twice the time compared to keeping the deferral.
Ok, what we do not know is whether the calls saved by the deferral would have been cache hits - in that case, the deferral would indeed be pointless.
So, new numbers from my little real-world scenario here:
- 1010 - for that many unique paths, a SHA256 hash made it into the cache
- 1139 - for that many unique paths, addFile was called
That means the SHA256 calculation for ~100 files was saved by the deferral in this case. Assuming a conservative 10ms per file, removing the deferral would mean a slow-down of at least 1 second in this scenario.
@reckart Try out the resource cache in PR #5372 to see if it addresses this performance issue for you.
@bjhargrave thanks :) I'm building and trying it.
Seems like the cache might not be used in all access paths to the SHA256.
When I try opening a .bndrun file after starting Eclipse, it takes very long with the PR code. In VisualVM, I cannot see the cache on the sampling trace either.

When I attach a debugger with a log point in SHA256.digest(File), I can see that after a short time the method has already been called 9 times for the same file.