Optimize symlink handling

Open Pennycook opened this issue 1 year ago • 0 comments

Feature/behavior summary

In a recent run of Code Base Investigator 1.x, I saw almost 50% of the execution time spent in os.path.realpath:

[Image: profiling screenshot]

While it is important for us to detect and handle symlinks securely, I suspect we could find a faster way to meet this requirement that doesn't involve expanding every path that we encounter while searching for include files.

Request attributes

  • [X] Would this be a refactor of existing code?
  • [ ] Does this proposal require new package dependencies?
  • [ ] Would this change break backwards compatibility?

Related issues

No response

Solution description

  • Defer symlink detection until we decide if we will actually open a file.
  • Discuss how to handle symlinks that are in a code base by design (e.g., to avoid duplicating a folder).
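To make the first bullet concrete, here is a minimal sketch of deferring symlink resolution. It is not code from Code Base Investigator: `find_include`, `open_checked`, and the `root` containment policy are hypothetical names and assumptions chosen for illustration. The search loop only joins paths and checks for existence, and `os.path.realpath` is called once, immediately before a file is opened.

```python
import os


def find_include(name, search_dirs):
    """Return the first existing candidate for `name`, or None.

    Hypothetical helper: no symlink expansion happens here, so the hot
    search loop never calls os.path.realpath.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # os.path.isfile follows symlinks, so linked files are still found.
        if os.path.isfile(candidate):
            return candidate
    return None


def open_checked(path, root):
    """Resolve symlinks only now, right before opening the file.

    Assumed policy (not taken from the issue): refuse any file whose
    resolved location is outside `root`.
    """
    real_path = os.path.realpath(path)
    real_root = os.path.realpath(root)
    if os.path.commonpath([real_path, real_root]) != real_root:
        raise PermissionError(f"{path} resolves outside of {root}")
    return open(real_path, "r")
```

With this split, the cost of `os.path.realpath` scales with the number of files actually opened rather than the number of candidate paths examined. Symlinks that exist by design, as in the second bullet, would need a more permissive policy than the containment check assumed here.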

Additional notes

Profiling a few alternatives:

  • os.path.realpath: 286 seconds
  • os.path.abspath: 148 seconds (1.93x speed-up)
  • Nothing: 139 seconds (2.06x speed-up)
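The numbers above come from full runs of the tool. For a quicker comparison in isolation, a micro-benchmark along these lines could be used. This is only a sketch: `time_normalizers` is a hypothetical helper, and the list of paths is whatever sample the caller supplies, so the timings it produces are not comparable to the figures listed above.

```python
import os
import timeit


def time_normalizers(paths, repeat=3):
    """Time each path-normalization strategy over `paths`.

    Returns a dict mapping a label to the best wall-clock time in seconds.
    """
    normalizers = {
        "realpath": os.path.realpath,  # resolves symlinks (touches the filesystem)
        "abspath": os.path.abspath,    # purely lexical normalization
        "nothing": lambda p: p,        # baseline: use the path as given
    }
    results = {}
    for label, normalize in normalizers.items():
        timings = timeit.repeat(
            lambda: [normalize(p) for p in paths],
            number=1,
            repeat=repeat,
        )
        results[label] = min(timings)
    return results
```

Calling `time_normalizers` with a list of include paths collected from a real code base should give a rough sense of how much of the gap between `os.path.realpath` and `os.path.abspath` is due to filesystem access.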

Pennycook, Sep 20 '24 10:09