RepoSense
RepoSense copied to clipboard
[#2091] Minor Enhancements to Existing Regex Code
[#2091] Suggestions on improvement for memory performance regarding Regex matching
Proposed commit message
From the previous issue, it was discovered that some areas of the code
used regex matching and splitting during an iteration. It is not
recommended to do so due to the overhead incurred with the creation of
`Matcher` objects during the match or split process.
Let's move to change the code such that we avoid such patterns within
the codebase.
Other information
This was something that was missed out during the previous PR that aimed to resolve issue #2091.
@sopa301 It does seem worthwhile to consolidate common Regex patterns into one single class to avoid creating multiple Pattern/Matcher objects. I will look into doing that!
It does seem worthwhile to consolidate common Regex patterns into one single class to avoid creating multiple Pattern/Matcher objects. I will look into doing that!
Do you intend to do this in this PR? Also, in case you've done the profile, I'm curious about the extent of performance hit that split
incurs.
@gok99 I will work on the aforementioned comment soon and update you with the results from the profiler!
@gok99 After some preliminary testing and profiling, it seems that the refactoring does not have much of an impact on the overall performance of the code. The increases or decreases may likely be the result of randomness when executing the program. Here are the findings:
Time
- Before refactor: 6079ms
- After refactor: 5820ms
Space
- Before refactor: 857.902 MB
- After refactor: 862.132 MB
These findings were made before we consolidated Regex patterns into one single class to avoid duplication, hence results may vary later on.
@sopa301 and @gok99, I have implemented the suggestion to consolidate all commonly used Regex patterns into the StringsUtil
class. Please take a look and let me know where I can improve the code!
So far, I'm not too sure if reusing Pattern
objects in this manner as a bid to save some memory usage would warrant the increase in coupling between all other classes and StringsUtil
; do let me know what you guys think!
Thanks @asdfghjkxd for consolidating the patterns! Do you have any data on the effect of this refactoring on the performance? I agree that if the performance increase is negligible/insignificant, we'll probably be better off leaving those patterns alone.
I'm under the impression that these matchers would be repeatedly created especially with the frequency that these functions are called. Perhaps the compiler may have automatically optimised it for us.
I definitely think this is worth doing if just for the improved readability and handy regex utils (are there other non-trivial but handy regexes elsewhere that could go here?). I don't think coupling is as much of concern since these are just temporary static helper functions.
Will leave @reposense/active-developers to comment / approve first.
@gok99 Sure thing, will look through the codebase to find other places with repeated regex patterns!
The following links are for previewing this pull request:
- Dashboard Preview: https://dashboard-2115-pr-reposense-reposense.surge.sh
- Docs Preview: https://docs-2115-pr-reposense-reposense.surge.sh