eclipse.platform ResourcesPlugin: Multithreaded lazy start

Reads all projects in parallel. As this was done during ResourcePlugin.start() it had to be deferred as multithreaded classloading during BundleActivator#start(BundleContext) is not supported.

All that happens while splash screen still shown.

Feb 29 '24 17:02 jukzi

Test Results

198 files - 438 198 suites - 438 39m 12s :stopwatch: -42s 2 910 tests - 1 035 2 887 :white_check_mark: - 1 035 23 :zzz: ±0 0 :x: ±0 9 336 runs - 3 105 9 172 :white_check_mark: - 3 105 164 :zzz: ±0 0 :x: ±0

Results for commit 35c41696. ± Comparison against base commit adb84274.

This pull request removes 1035 tests.

AllTests AllCheatSheetTests AllCompositeTests TestCheatSheetManagerEvents ‑ testNoHandler
AllTests AllCheatSheetTests AllCompositeTests TestCheatSheetManagerEvents ‑ testOneHandler
AllTests AllCheatSheetTests AllCompositeTests TestCheatSheetManagerEvents ‑ testTwoHandlers
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testBackwardDependency
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testBadURL
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testChoiceNoChild
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testCircularDependency
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testCompositeNoName
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testDependency
AllTests AllCheatSheetTests AllCompositeTests TestCompositeParser ‑ testDependencyWithInvalidId
…

:recycle: This comment has been updated with latest results.

Feb 29 '24 18:02 github-actions[bot]

org.opentest4j.AssertionFailedError: 

Expecting actual:
  ["instance", "configuration", "user", "default"]
to contain exactly (and in same order):
  ["project", "instance", "configuration", "user", "default"]
but could not find the following elements:
  ["project"]

	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
	at org.eclipse.core.tests.internal.preferences.PreferencesServiceTest.testLookupOrder(PreferencesServiceTest.java:236)

Feb 29 '24 18:02 jukzi

I agree with Cristoph. The code expected to have workspace initialized after activator is done, not asynchronously after that.

Yes because of that any code that calls into resource plugin while the tracker.open is runs is not an indication of something is blocking, it means that the synchronization is actually effective, synchronize after that anything can happen (including it works, deadlock, get an exception) and you maybe just notice that randomly.

If one want to improve that there are some ways (and that was already performed by many components but still work in progress):

Do not call ResourcePlugin methods but pass the objects by reference, as in this special case of init is is allowed for internal code to "early access" these objects (carefully).
delay the access e.g do not call any code in static initializers of the class but on first access, this can be combined with (1)
decouple the code, e.g. by react to the workspace registered (one example is CheckMissingNaturesListener) and performing nay action the free of loader locks.
Create proxy object that themself only call when they are called the "real" object.
... maybe more ...

Mar 04 '24 16:03 laeubi

Yes because of that any code that calls into resource plugin while the tracker.open is runs is not an indication of something is blocking, it means that the synchronization is actually effective, synchronize after that anything can happen (including it works, deadlock, get an exception) and you maybe just notice that randomly.

It is not clear what you try say here. ||: Please give a concrete example how it can go wrong :||

Mar 04 '24 16:03 jukzi

To continue here, you could ensure that the parallel reading finishes before the Activator is left @jukzi. This would still read in parallel but keep the contract.

Mar 07 '24 11:03 vogella

To continue here, you could ensure that the parallel reading finishes before the Activator is left

I would love todo, and i had no plan to not so, but it turns out it is not possible, because using the resourceplugin classes in another thread before the activator finished can (and will very often) cause a deadlock. And i still see no reason why the workspace should have been fully read when plugin is started. Every method that can access that data will just wait on it and it works. More over the behaviour of ResourcesPlulugin.start() to take that long not only violates the contract of org.osgi.framework.BundleActivator.start(BundleContext) This method must complete and return to its caller in a timely manner. but actually the Classloader does already detects that misbehavior with a timeout and does already skip to further wait on it. So in fact the current implementation is broken to not wait on loading finished, while this PR fixes that.

Mar 07 '24 11:03 jukzi

So in fact the current implementation is broken to not wait on loading finished, while this PR fixes that.

This PR does not fix that it simply breaks out the classloader synchronization, another thread is not allowed to load additional classes (because that's what lazy-activation is ensuring) that does not mean that the current thread is not allowed to offload some work to other threads.

And that's what I have already written here you can:

perform some work (e.g. I/O) in parallel threads from within the activator
refactor the caller of any static ResourcesPlugin#getXXXXX to something that either uses DI (e.g. SCR with reference to IWorkspaceService) or by passing it as a parameter instead.
do not load the data on startup but when it is actually accessed (e.g. markers can be loaded once code actually want to access the first marker)

both things will let you escape out of your classlaoderlock,improves the code and do not require to break the lazy-activation.

Mar 07 '24 12:03 laeubi

BTW the timeout (after 5 or 30sec) looks like

org.osgi.framework.BundleException: Unable to acquire the state change lock for the module: osgi.identity; type="osgi.bundle"; version:Version="3.20.200.qualifier"; osgi.identity="org.eclipse.core.resources"; singleton:="true" [id=110] STARTED [STARTED]
	at org.eclipse.osgi.container.Module.lockStateChange(Module.java:373)

Can be found for other plugins on stackoverflow aswell and the solution there was for example:

https://git.eclipse.org/r/c/sourceediting/webtools.sourceediting/+/154548/2/core/bundles/org.eclipse.wst.sse.core/src/org/eclipse/wst/sse/core/internal/SSECorePlugin.java

i.e starting a asynchronus thread in the BundleActivator.start() and now guess who create that commit? So @iloveeclipse please explain why you think it is vaild there but not here.

Mar 07 '24 13:03 jukzi

please explain why you think it is vaild there but not here.

One should not compare apples with oranges.

Here we are talking about code contract that is as old as Eclipse is, in a bundle that one of the basic building blocks of the SDK with lot of consumers that relied on the existing behavior, implicitly or explicitly.

Trying to gain some (not even quantified) startup improvement by ignoring possible compatibility break is not an option.

Mar 07 '24 13:03 iloveeclipse

@jukzi As mentioned before you can fork a job for any loading in any component (if it is guaranteed that before the component access the results is joining on the job) and for example CheckMissingNaturesListener does exactly that, but here you do fork the whole workspace init into an arbitrary thread and because of that sync on arbitrary other threads.

Ans especially the case where lazy-activation thru calls to these methods happens because on classlaoding is a special case of activation, that is even problematic because it could result in "activation rush", but this is not to be fixed on the Activator side, it must be fixed on the Caller site!

So again, if any code calls ResourcePlugin.getXXX it is expected that before it can do so (that is where the clasloading starts) the activator is first run, and if an instance location is available it is initialized before the classlaoding call returns, that is because otherwise if the instance area is NOT set this is an error and must throw an exception. If one does not throw an exception it could happen that there are deadlocks of threads or even UI lockdowns, because of that it is discouraged to call the static methods.

Mar 07 '24 14:03 laeubi

Here we are talking about code contract that is as old as Eclipse is

The instanceLocationTracker was introduced just 2 year ago. I Ionly added the the async initialisation thread to make that still work even though barely nobody uses it but only the static method are used. For example Eclipse IDE and all its tests work without it, because the Resourceplugin is started in the UI thread as one of the first plugins anyway.

possible compatibility break

So what exactly do you think is breaking here?

Mar 07 '24 14:03 jukzi

Resourceplugin is started in the UI thread

More exactly it is implicitly started in IDEApplication.start(IApplicationContext) as the IDEWorkbenchAdvisor uses IContainer from org.eclipse.core.resources

Mar 07 '24 14:03 jukzi

The instanceLocationTracker was introduced just 2 year ago.

Please look how the code was before it simply FAILED, but it never did anything async... thats the point and as said you have two options:

static access -> slow, discouraged, will run into classlaoder lock
"async" / service oriented, will not lock

More exactly it is implicitly started in IDEApplication.start(IApplicationContext) as the IDEWorkbenchAdvisor uses IContainer from org.eclipse.core.resources

that's not true, this is one possible entry point it could be started earlier, e.g. if there is a Instance Area saved/set already set before and a bundle wants a workspace it will be started before the IDE shows up.

Mar 07 '24 14:03 laeubi