eclipse.platform.ui icon indicating copy to clipboard operation
eclipse.platform.ui copied to clipboard

TextSearchVisitor: don't check for binary files if not required

Open Wittmaxi opened this issue 1 year ago • 4 comments

If a workspace search is allowed to search in binary files, we should not perform a (possibly expensive) check for whether a file is binary - we are going to search through it regardless of the result of that query.

As discussed in https://github.com/eclipse-platform/eclipse.platform.ui/issues/2180

Note: the performance increase is not yet optimal - we still load the file to ram, the difference is that now, we don't scan through the file for zero-bytes. A "good" solution would recognize file extensions and not load those files to begin with.

Wittmaxi avatar Aug 23 '24 07:08 Wittmaxi

Test Results

 1 815 files  ±0   1 815 suites  ±0   1h 44m 10s :stopwatch: + 10m 10s  7 696 tests ±0   7 468 :white_check_mark: ±0  228 :zzz: ±0  0 :x: ±0  24 249 runs  ±0  23 500 :white_check_mark: ±0  749 :zzz: ±0  0 :x: ±0 

Results for commit d7a467ff. ± Comparison against base commit a801237c.

github-actions[bot] avatar Aug 23 '24 08:08 github-actions[bot]

A "good" solution would recognize file extensions and not load those files to begin with.

performance optimizations should show the benefit by measurement, and please lets go for a "good" solution - opening a file is typically the costly thing while reading the content is almost not noticeable if its in the OS cache.

jukzi avatar Aug 23 '24 11:08 jukzi

@jukzi my current focus is finishing up the tasks in the Umbrella issue for the Find/Replace overlay. I understand that this is more of a "low hanging fruit"-PR, I might be able to come up with a more substantial improvement in a few months - and yet, I can't commit to anything at this point.

If this is OK for you, I will merge this PR once master opens up again

Wittmaxi avatar Aug 26 '24 13:08 Wittmaxi

@jukzi please read the PR description for what case this is meant:

If a workspace search is allowed to search in binary files,

So this is for the case where searching in binary files is enabled, meaning even if we want to search in binary files (currently second check!) it is first check if it is a binary file just to the discard the result because we want to search in Every file regardless if binary or not...

If binary search is disabled it doesn't matter much as you pointed out.

laeubi avatar Aug 26 '24 14:08 laeubi

I cannot take ownership of this anymore. I believe I have seen another PR which fixed this already, so this PR is obsolete

Wittmaxi avatar Oct 16 '24 12:10 Wittmaxi