eclipse.platform.swt TreeItem.getItemCount() and TreeItem.getItem(int) have linear complexity on Linux

trafficstars

The org.eclipse.swt.widgets.Tree is slow on Linux. In particular TreeItem.getItemCount() and TreeItem.getItem(int) have linear complexity (the time of execution is proportional to the number of children).

To Reproduce

Run JUnit test or following snippet:

import org.eclipse.swt.SWT;
import org.eclipse.swt.layout.FillLayout;
import org.eclipse.swt.widgets.Shell;
import org.eclipse.swt.widgets.Tree;
import org.eclipse.swt.widgets.TreeItem;

public class TreeTest {

	private Shell shell = new Shell();
	{
		shell.setLayout(new FillLayout());
		shell.open();
	}


	public void getItemShouldHaveConstantTime() {
		for (int i = 0; i < 1000; i++) {
			measureGetItemTime(100); // warmup
		}
		double elapsed_10 = measureGetItemTime(10);
		double elapsed_100000 = measureGetItemTime(100000);
		double ratio = elapsed_100000 / elapsed_10;

		System.out.printf("Time for 10 elements: %f ns\nTime for 100000 elements: %f ns\nRatio: %f\n",elapsed_10, elapsed_100000, ratio);
	}

	private long measureGetItemTime(int count) {
		Tree tree = createTree();
		try {
			TreeItem subject = new TreeItem(tree, SWT.NONE);

			createChildren(subject, count);
			readAndDispatch();

			// warmup
			for (int i = 0; i < 1000; i++) {
				System.nanoTime();
				subject.getItem(count - 1);
			}

			long start = System.nanoTime();
			for (int i = 0; i < 10000; i++) {
				subject.getItem(count - 1);
			}
			long stop = System.nanoTime();
			long elapsed = stop-start;
			return elapsed;
		} finally {
			tree.dispose();
		}
	}

	private void createChildren(TreeItem subject, int count) {
		for (int i = 0; i < count; i++) {
			new TreeItem(subject, SWT.NONE);
		}
	}

	private Tree createTree() {
		Tree result = new Tree(shell, SWT.NONE);
		return result;
	}

	private void readAndDispatch() {
		for (int i = 0; i < 10; i++ ) {
			while(shell.getDisplay().readAndDispatch()) {}
		}
	}


}

This test wrapped in a product: testporduct.zip (sources) slowGetItem.linux.gtk.x86_64.zip slowGetItem.linux.gtk.aarch64.zip (built product)

Expected behavior Access time should not depend on the number of items.

Environment:

Select the platform(s) on which the behavior is seen:

- [ ] All OS
- [ ] Windows
- [x] Linux
- [ ] macOS

Any GTK
Any JRE

Version since Was there always?

Workaround (or) Additional context Initially reported in https://github.com/eclipse-platform/eclipse.platform.ui/issues/649 Jface algorithms make navigation time proportional to a square of child count due to this problem.

Workaround: https://github.com/eclipse-platform/eclipse.platform.ui/pull/810

GTK returns children count in O(N):

static gint
gtk_tree_store_iter_n_children (GtkTreeModel *tree_model,
				GtkTreeIter  *iter)
{
  GNode *node;
  gint i = 0;

  g_return_val_if_fail (iter == NULL || iter->user_data != NULL, 0);

  if (iter == NULL)
    node = G_NODE (GTK_TREE_STORE (tree_model)->priv->root)->children;
  else
    node = G_NODE (iter->user_data)->children;

  while (node)
    {
      i++;
      node = node->next;
    }

  return i;
}

Nov 11 '23 10:11 basilevs

I have not found a way to make TreeItem.getItemCount() without using this GTK API, but I have an idea of caching child count to help with JFace scenario, where child count does not change, but each child is accessed in sequence. Unfortunately, cache invalidation seems to quite complex with this approach.

Nov 11 '23 10:11 basilevs

Just wondering, why do we need to query gtk at all? should not SWT/JFace be the one creating items and keep track of the count?

Nov 11 '23 14:11 laeubi

Indeed GTK gtk_tree_store_iter_n_children() iterates a list of items to count children: https://github.com/GNOME/gtk/blob/3.24.38/gtk/gtktreestore.c#L766-L771

Note that on Windows, SWT caches number of items: https://github.com/eclipse-platform/eclipse.platform.swt/blob/eafdfbdfd7a4ee62a6bdb694be97f9e13abade09/bundles/org.eclipse.swt/Eclipse%20SWT/win32/org/eclipse/swt/widgets/Tree.java#L3340-L3348

Nov 11 '23 21:11 SyntevoAlex

Naive caching helps with childCount(), but getChilld(int) is ~ (3N), and when the fix is applied it stays O(N). See my PR comment.

Nov 11 '23 21:11 basilevs

@basilevs what I mean is that SWT itself is creating native items, so how could it be it does not knows (or could know) the number of items or the n-th item... so if the native implementation is to slow we probably should store an array of items on the java side that allows fast retrieval of childCount() and getChilld(int)

Nov 12 '23 06:11 laeubi

We don't need a whole copy of the model:

Virtual only shows portions of the model that are visible
One only needs to store the pointers to the natives to get a fast count/getX by not iterate the whole model

Nov 12 '23 06:11 laeubi

@laeubi as there no way to detect visible object going out of scope, it is impossible to clean the cache, which means we would store all the objects that had ever been shown indefinetly, which is not much better than whole model (indeed in context of Jface, this cache would have ALL items ever returned by content provider). 2. Storing pointers is effectively same as storing references to Java objects, given GTK implementation stores a cache of all(?) Java objects and has a way to transform pointers to references. In other words, given reference size, there is no benefit to store pointers instead of references - either way a cache of all items ever shown will form a copy of a model. (To clarify, I do not talk about model from JFace, I'm talking about model that native code holds, maybe I should use the term viewmodel).

Nov 12 '23 07:11 basilevs

We can avoid cache cleanup problem by only holding children of one item of interest, and repopulating it when interest shifts (Windows implementation does it for its caches).

Nov 12 '23 07:11 basilevs

@laeubi I've implemented the cache and fixed the quadratic complexity of traversal, but the solution (in #883) stinks (is very slow on depth-first traversals). I'm still thinking about either making TreeItem volatile or implementing your suggestion of full model mirror. Both are disruptive.

Nov 12 '23 09:11 basilevs

I was wrong - efemeral Java TreeItems were never an option - strong references to them are held on all platforms. The difference is : MacOS uses those strong references to keep track of parent-child relationship, while Linux and Windows create a convoluted system of identifiers to keep track of parents and children.

Nov 12 '23 10:11 basilevs

@basilevs as slow Tree performance is a major concern I think it would be great to improve it even if it might be disruptive, if you can prepare things we can merge as soon as master is open and have a full cycle to test so don't worry as long as all tests pass :+1:

Nov 12 '23 10:11 laeubi

I went with my dumb caching approach and it works for the scenario I'm interested in. It fails to actually make children tracking sensible, in particular, depth-first traversal is slow.

I suggest not to close this issue until someone volunteers to rework SWT GTK to actually make getItem() constant time.

Nov 13 '23 12:11 basilevs

@SyntevoAlex : any chance you could look at #883 & remark there regarding the "complete" solution?

Nov 13 '23 12:11 iloveeclipse

I could have a look on wednesday, if I don't forget.

Nov 13 '23 20:11 SyntevoAlex

The PR #883 is a failure. I've tested it on original testcase from https://github.com/eclipse-platform/eclipse.platform.ui/issues/649 and the polynomial growth is 1.8 (approximation from three points) which for practical purposes is not different from the original 2.0. I give up for now. Sorry for all the noise!

Nov 14 '23 10:11 basilevs

An example of ID use case:

org.eclipse.swt.widgets.Tree.getFocusItem() {
	long [] path = new long [1];
	GTK.gtk_tree_view_get_cursor (handle, path, null);
	if (path [0] == 0) return null;
	TreeItem item = null;
	long iter = OS.g_malloc (GTK.GtkTreeIter_sizeof ());
	if (GTK.gtk_tree_model_get_iter (modelHandle, iter, path [0])) {
		int [] index = new int [1];
		GTK.gtk_tree_model_get (modelHandle, iter, ID_COLUMN, index, -1);
		if (index [0] != -1) item = items [index [0]]; //TODO should we be creating this item when index is -1?
	}
	OS.g_free (iter);
	GTK.gtk_tree_path_free (path [0]);
	return item;
}

Here GTK returns an iterator and we need a TreeItem it corresponds to. As iterator is not a key, Tree has to use additional user data stored natively. Same goes for event handler like org.eclipse.swt.widgets.Tree.cellDataProc(long, long, long, long, long) where items are addressed with iterators.

Nov 18 '23 15:11 basilevs

Lazy Tree reveal is 340 times faster on the prototype branch for 100000 elements in the tree. The reval operation execution time is linear over the tree size. Other operations are also significantly quicker.

Master branch:

213 ms to process 10000 items
218 ms to process 100000 items
The time to select scales as O(n^0.01) as the size of the model grows from 10000 to 100000
1249 ms to process 10000 items
93684 ms to process 100000 items
The time to reveal scales as O(n^1.87) as the size of the model grows from 10000 to 100000

The prototype branch:

(SWT:74185): Pango-WARNING **: 18:03:26.227: Invalid UTF-8 string passed to pango_layout_set_text()
224 ms to process 10000 items
238 ms to process 100000 items
The time to select scales as O(n^0.02) as the size of the model grows from 10000 to 100000

(SWT:74185): Pango-WARNING **: 18:03:28.701: Invalid UTF-8 string passed to pango_layout_set_text()

(SWT:74185): Pango-WARNING **: 18:03:28.701: Invalid UTF-8 string passed to pango_layout_set_text()
39 ms to process 10000 items
240 ms to process 100000 items
The time to reveal scales as O(n^0.78) as the size of the model grows from 10000 to 100000

Main factors

GTK.gtk_tree_model_set was used to associate TreeItem with GTK table row with a custom "ID" column and to proactively update GTK rows on setText(), setForeground(), setBackground(), setImage(), setFont()
GTK.gtk_tree_model_set is linear over tree size
JFace eagerly traverses elements of interest and their siblings on some operations (reveal is especially surprising in lazy trees, as one can't expect eager traversal of LazyContentProvider).

Solution

Do not recompute GTK iterators without a need, use iterators instead of indexes where possible
Replace system of "ID" identifiers with path locators. Note, path locators take again linear time over tree size to locate an element, but they take no time at all to build, reducing eager tree building to linear execution time (from quadratic).
Move all update procedures from setText(), setForeground(), setBackground(), setImage(), setFont() to cellDataProc(), simlarly to how VIRTUAL tree operates, but instead of data updates, do GTK updates lazily this prevents the need to do linear operations eagelry. Doing them only for visible items again reduces item update from a linear to constant execution time, resulting for linear (instead of quadratic) execution time for full tree refresh.

The prototype branch is https://github.com/eclipse-platform/eclipse.platform.swt/compare/master...basilevs:eclipse.platform.swt:issue_882_gtk_direct_access

It has bugs and is large and inconvenient to review. I plan to split it into smaller changesets and try to fix bugs and style. Before I proceed with this refinement however, I'd like to confirm with @SyntevoAlex if such a fundamental change of how SWT GTK Tree operates has a chance to be accepted.

Thanks for your support and sorry for the noise.

Nov 23 '23 18:11 basilevs

@basilevs that sounds really promising, I think if you can extract some smaller changes that are easier to review I don't see a reason why such performance improvements can't be accepted.

Branch is currently closed, so its a good time to prepare such PRs already and then get them merged as early as possible.

Nov 24 '23 06:11 laeubi

@laeubi @SyntevoAlex could you create a temporary branch for my changes, so that I could target my PRs without disrupting the master? In particular, I'd like to start with a set of failing tests.

Nov 24 '23 14:11 basilevs

@basilevs what exactly do you want to archive with a temporary branch? If you like you can enable Github actions on your fork and then create any branches/prs there and they will build with these.

Another strategy is that you e.g. create branchX and open a PR for that, then you create branchY from branchX for some more experiments and so on.

Nov 24 '23 15:11 laeubi

I want performance test results to be a part of the multi-stage review process.

I was not aware I could have my own actions. I was under impression they use environment secrets. Thanks for the hint.

I do not understand second strategy. Do you suggest I create branches in my fork and invite reviewers there to review parts of the work? This seems strange to me as once my repository is eventually removed, the important review history would be lost.

Nov 24 '23 15:11 basilevs

Maybe you can explain why a normal branch/PR does not work? Are performance tests not executed there?

Nov 24 '23 15:11 laeubi

@laeubi merging first PR with failing tests in the master branch will fail build for everyone.

Nov 24 '23 15:11 basilevs

@basilevs but why do you need to merge, you can just

add more commits to the branch
branch from that branch if you want to do further experiements

Nov 24 '23 15:11 laeubi

I need to merge the first stage of change so that second stage can be reviewed independently with clear source delta and easily comparable test results.

As the process is so hard to convey, lets stop its discussion for now. I will have to disable failing tests on every stage of the work and it will be harder to impress reviewer without visible improvements to test results, but as long as everything is merged, it does not matter.

Nov 24 '23 15:11 basilevs

I need to merge the first stage of change so that second stage can be reviewed independently with clear source delta and easily comparable test results.

That's why you can simply use different commit for that in github you can then select that commit to only see that changes.

So if we create a temporary branch here or in your repo does not make any real difference. So maybe just try that approach and let us know when there is a problem and I'm sure we can help.

Nov 24 '23 16:11 laeubi

Glancing at gtk_tree_store_get_value() I do not observe linear complexity.

It seems to derefence iterator and obtain the Tree's "row" in O(1). https://github.com/GNOME/gtk/blob/3.24.38/gtk/gtktreestore.c#L658

Can you please clarify?

Nov 25 '23 17:11 SyntevoAlex

Glancing at gtk_tree_store_get_value() I do not observe linear complexity. It seems to derefence iterator and obtain the Tree's "row" in O(1). https://github.com/GNOME/gtk/blob/3.24.38/gtk/gtktreestore.c#L658 Can you please clarify?

This is an experimental result - I've profiled tree traversal and found that ID setting is a slowest part of traversal. Once tree traversal was sped up by removing getID(long, boolean) method, I've noticed that tests are still slow due to TreeItem.setText(), TreeItem.setForeground(), etc. and traversal with such calls is one polynomial degree slower than without. I did not find the root cause in C code, but assumed it was hidden in either renderers, JNI, or C allocator.

I've run the tests again and see that problem is not in getForeground() but in setForeground(), as I mostly focused on setters during my testing, I've mixed them up in the previous message.

Please see the test https://github.com/basilevs/eclipse.platform.swt/blob/3d7b9bedc5ad082e35b0db24835c533535e601ed/tests/org.eclipse.swt.tests/JUnit%20Tests/org/eclipse/swt/tests/junit/performance/Test_org_eclipse_swt_widgets_Tree.java

It's setForeground() and secondTraverse() methods demonstrate how execution time of a single setForeground() call depends on the size of dataset:

secondTraverse[Shape: STAR, virtual: false]
Execution time should grow as 1.200000 degree polynom. 
Time for 10000 elements: 4221486.000000 ns
Time for 100000 elements: 18886615.000000 ns
Ratio: 0.447393
Degree: 0.650689

setForeground[Shape: STAR, virtual: false]
Execution time should grow as 1.300000 degree polynom. 
Time for 10000 elements: 427697692.000000 ns
Time for 100000 elements: 33867289090.000000 ns
Ratio: 7.918511
Degree: 1.898644

Traversal of a prepopulated tree is linear, while same traversal with a setForeground() call is quadratic. These test are done on a master branch (the branch I linked to is a prototype, so copy the test if you want to run it on master).

Nov 25 '23 21:11 basilevs

Please use Java profiler on your test on original SWT code to identify what exactly is slow. So far I have an impression that you're acting on false theory and therefore changing a lot more things than is actually needed.

Nov 26 '23 14:11 SyntevoAlex

Please use Java profiler on your test on original SWT code to identify what exactly is slow. So far I have an impression that you're acting on false theory and therefore changing a lot more things than is actually needed.

The original profiling results were published in https://github.com/eclipse-platform/eclipse.platform.ui/issues/649 and those are useless, as major time is spent in the obviously quadratic path. I had to eliminate the hotspot to get to the nest step of profiling and optimization.

Nov 26 '23 14:11 basilevs

eclipse.platform.swt eclipse.platform.swt copied to clipboard

TreeItem.getItemCount() and TreeItem.getItem(int) have linear complexity on Linux

Main factors

Solution

eclipse.platform.swt
eclipse.platform.swt copied to clipboard