Hello!

I hope this is formatted acceptably, I've tried to provide context, observed & expected behaviours, possible solutions and a workaround I can use but would like to avoid.

Please let me know if I must provide any further information. If you like any of my proposed approaches I think I would be able to put together a pull request if that helps :)

Background Context

I'm currently writing a script that uses lizard.py as a library, in which I call lizard.analyze() on several directories with differing parameters. In my script I allow the user to specify exclude_patterns for each directory to be analysed by lizard.

The script is invoked from a parent directory, meaning it does not run lizard.analyze() with the paths equal to ["."], but rather ["./some/path/here"]. This means filenames lizard generates while walking the target directories are prefixed with their relative path to my script, e.g.: "./some/path/here/then/a/specific/file.txt".

This is fine, and everything works, with one slight problem: the exclude_pattern parameters to lizard are matched against the file paths lizard generates while walking, including the prefixes.

This means that, unlike the examples in the README which show concise exclude_pattern patterns relative to the analysis directory, my users must specify patterns that are aware of their target directory relative to their current working directory. E.g.: What would have been: ./tests/* Now becomes: ./repos/specific-repo/tests/*

I'm wondering if this is intended behaviour, should exclude_patterns be matched against the relative path from the analysis directory? Or should it remain how it is: matching against the relative path from the current working directory?

Observed Behaviour:

exclude_patterns match against the file's path relative to the current working directory

Expected Behaviour:

exclude_patterns match against the file's path relative to the directory being analysed

Proposed Changes

Change `exclude_pattern`s matching behaviour?

In this approach, the code in get_all_source_files() is changed to apply the exclude_patterns to file paths relative to the directory being analysed.

all_listed_files() (lizard.py:893) changed to yield a tuple of (path and path to file relative to path)
_validate_file() (lizard.py:884) changed to apply exclude patterns against the relative part (second element of tuple from previous point)
get_all_source_files() has a final map call added that flattens the tuple into a full path again, e.g.: os.path.join(tuple[0], tuple[1])

Pros

No changes to the public API

Cons

Changes the current behaviour of the library, which may disrupt current users
Fixes my specific use-case, but doesn't account for the many ways somebody might want to filter files

Allow user to provide custom function for use in `_validate_file()`?

In this approach, the public API of the library is adjusted to allow the caller to pass a callable/function/lambda that is invoked in _validate_file() (lizard.py:884). This would allow a caller to provide completely arbitrary filtering logic without the library caring how it's done.

In my case, this would allow me to manipulate the path to be relative to the directory I want it to be relative to before applying exclusion patterns myself.

Pros

Does not change the current library behaviour (current users unaffected)
Allows any user to customise how they want to filter files using arbitrary code

Cons

Change to public API, but shouldn't actually affect anybody (would add a new parameter for passing a callable that defaults to None, or a dummy filter that always returns True, but anyone not currently passing this will get the default behaviour that is consistent with current behaviour)

Workarounds & Alternatives

I can work around this by making my script prefix the user's exclude_patterns with the relative path from the current directory to the analysis directory, but it's a leaky abstraction that I'm concerned may surprise users - so I'm curious to know if lizard is supposed to behave this way?

Aug 25 '18 12:08 omaskery

PS, in case it's useful for additional context, I've now made the script I'm working on available here.

Within that file: the analyse_repo() function calls into lizard on this line in particular. Notice that before I can pass in exclude_patterns, I must first patch them using the patch_relative_exclude_patterns() function here.

Aug 25 '18 22:08 omaskery

Hi Oliver,

Thanks for your proposal. Just landed at Tokyo after 7h of flight. Let me get to the details a bit later.

On 26 Aug 2018, at 7:33 AM, Oliver Maskery [email protected] wrote:

PS, in case it's useful for additional context, I've now made the script I'm working on available here https://github.com/omaskery/lizard-monitor.

Within that file: the analyse_repo() function https://github.com/omaskery/lizard-monitor/blob/master/lizard-mon.py#L74 calls into lizard on this line in particular https://github.com/omaskery/lizard-monitor/blob/master/lizard-mon.py#L93. Notice that before I can pass in exclude_patterns, I must first patch them using the patch_relative_exclude_patterns() function here https://github.com/omaskery/lizard-monitor/blob/master/lizard-mon.py#L79.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/terryyin/lizard/issues/238#issuecomment-416000794, or mute the thread https://github.com/notifications/unsubscribe-auth/AAwJYtb_9M2LU0RGjj82CSj8HqXX5xV9ks5uUdDAgaJpZM4WMXS_.

Aug 26 '18 08:08 terryyin

Not a problem, not in a rush, hope you had a good flight :)!

Aug 26 '18 11:08 omaskery

exclusion patterns are sensitive to current working directory

Background Context

Observed Behaviour:

Expected Behaviour:

Proposed Changes

Change exclude_patterns matching behaviour?

Pros

Cons

Allow user to provide custom function for use in _validate_file()?

Pros

Cons

Workarounds & Alternatives

Change `exclude_pattern`s matching behaviour?

Allow user to provide custom function for use in `_validate_file()`?