code-base-investigator
Optimize symlink handling
Feature/behavior summary
In a recent run of Code Base Investigator 1.x, I saw almost 50% of the execution time spent in os.path.realpath.
While it is important for us to detect and handle symlinks securely, I suspect we could find a faster way to meet this requirement that doesn't involve expanding every path that we encounter while searching for include files.
Request attributes
- [X] Would this be a refactor of existing code?
- [ ] Does this proposal require new package dependencies?
- [ ] Would this change break backwards compatibility?
Related issues
No response
Solution description
- Defer symlink detection until we decide if we will actually open a file.
- Discuss how to handle symlinks that are in a code base by design (e.g., to avoid duplicating a folder).
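The first bullet could look roughly like the sketch below. It is not taken from the Code Base Investigator source: `find_include` and `is_within` are hypothetical names, and the code assumes that candidate include paths can be tested with a plain existence check and that only the single path about to be opened needs to be canonicalized.

```python
import os
from typing import Iterable, Optional


def is_within(root: str, path: str) -> bool:
    """Return True if `path` lies inside `root` (both must be absolute, resolved paths)."""
    return os.path.commonpath([root, path]) == root


def find_include(name: str, include_dirs: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: search `include_dirs` for `name`.

    Candidates are checked without canonicalizing them; os.path.realpath
    is called once, and only for the candidate that will be opened.
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        # isfile() follows symlinks to test the target, but it does not
        # build a canonical path for every candidate we reject.
        if os.path.isfile(candidate):
            # Deferred step: resolve symlinks for the one path we will use.
            return os.path.realpath(candidate)
    return None
```

A caller could then compare the resolved path against the code base root with `is_within(root, resolved)` before opening it, which is one possible place to apply whatever policy is agreed for symlinks that exist by design.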
Additional notes
Profiling a few alternatives:
- os.path.realpath: 286 seconds
- os.path.abspath: 148 seconds (1.93x speed-up)
- Nothing: 139 seconds (2.06x speed-up)
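
For anyone who wants to compare the three options on their own list of paths, a minimal timing sketch follows. It is not how the numbers above were produced (those come from full runs of the tool); `time_normalizers` is a hypothetical helper that only times the path-normalization call in isolation.

```python
import os
import timeit
from typing import Callable, Dict, List


def time_normalizers(paths: List[str], repeat: int = 3) -> Dict[str, float]:
    """Return the best-of-`repeat` time in seconds for each normalization strategy."""
    normalizers: Dict[str, Callable[[str], str]] = {
        "realpath": os.path.realpath,  # resolves symlinks via the filesystem
        "abspath": os.path.abspath,    # purely lexical normalization
        "nothing": lambda p: p,        # baseline: leave the path untouched
    }
    results: Dict[str, float] = {}
    for label, func in normalizers.items():
        # Time one full pass over all paths, repeated `repeat` times.
        timings = timeit.repeat(
            lambda: [func(p) for p in paths], number=1, repeat=repeat
        )
        results[label] = min(timings)
    return results
```

Absolute timings from a sketch like this depend heavily on the filesystem and on path depth, so they should be read as relative indications only.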