RepoSense icon indicating copy to clipboard operation
RepoSense copied to clipboard

[#2091] Minor Enhancements to Existing Regex Code

Open georgetayqy opened this issue 1 year ago • 6 comments

[#2091] Suggestions on improvement for memory performance regarding Regex matching

Proposed commit message

From the previous issue, it was discovered that some areas of the code
used regex matching and splitting during an iteration. It is not
recommended to do so due to the overhead incurred with the creation of
`Matcher` objects during the match or split process.

Let's move to change the code such that we avoid such patterns within
the codebase.

Other information

This was something that was missed out during the previous PR that aimed to resolve issue #2091.

georgetayqy avatar Feb 16 '24 12:02 georgetayqy

@sopa301 It does seem worthwhile to consolidate common Regex patterns into one single class to avoid creating multiple Pattern/Matcher objects. I will look into doing that!

georgetayqy avatar Feb 16 '24 16:02 georgetayqy

It does seem worthwhile to consolidate common Regex patterns into one single class to avoid creating multiple Pattern/Matcher objects. I will look into doing that!

Do you intend to do this in this PR? Also, in case you've done the profile, I'm curious about the extent of performance hit that split incurs.

gok99 avatar Feb 19 '24 04:02 gok99

@gok99 I will work on the aforementioned comment soon and update you with the results from the profiler!

georgetayqy avatar Feb 19 '24 04:02 georgetayqy

@gok99 After some preliminary testing and profiling, it seems that the refactoring does not have much of an impact on the overall performance of the code. The increases or decreases may likely be the result of randomness when executing the program. Here are the findings:

Time

  • Before refactor: 6079ms
  • After refactor: 5820ms

Space

  • Before refactor: 857.902 MB
  • After refactor: 862.132 MB

These findings were made before we consolidated Regex patterns into one single class to avoid duplication, hence results may vary later on.

georgetayqy avatar Feb 21 '24 08:02 georgetayqy

@sopa301 and @gok99, I have implemented the suggestion to consolidate all commonly used Regex patterns into the StringsUtil class. Please take a look and let me know where I can improve the code!

So far, I'm not too sure if reusing Pattern objects in this manner as a bid to save some memory usage would warrant the increase in coupling between all other classes and StringsUtil; do let me know what you guys think!

georgetayqy avatar Feb 23 '24 07:02 georgetayqy

Thanks @asdfghjkxd for consolidating the patterns! Do you have any data on the effect of this refactoring on the performance? I agree that if the performance increase is negligible/insignificant, we'll probably be better off leaving those patterns alone.

I'm under the impression that these matchers would be repeatedly created especially with the frequency that these functions are called. Perhaps the compiler may have automatically optimised it for us.

sopa301 avatar Feb 23 '24 11:02 sopa301

I definitely think this is worth doing if just for the improved readability and handy regex utils (are there other non-trivial but handy regexes elsewhere that could go here?). I don't think coupling is as much of concern since these are just temporary static helper functions.

Will leave @reposense/active-developers to comment / approve first.

gok99 avatar Mar 18 '24 04:03 gok99

@gok99 Sure thing, will look through the codebase to find other places with repeated regex patterns!

georgetayqy avatar Mar 18 '24 06:03 georgetayqy

The following links are for previewing this pull request:

  • Dashboard Preview: https://dashboard-2115-pr-reposense-reposense.surge.sh
  • Docs Preview: https://docs-2115-pr-reposense-reposense.surge.sh

github-actions[bot] avatar Mar 26 '24 03:03 github-actions[bot]