import-linter icon indicating copy to clipboard operation
import-linter copied to clipboard

Exclude modules from the graph, according to wildcard expressions

Open iagocanalejas opened this issue 5 years ago • 16 comments

Allow ignored regex expressions in layers contract.

As a example y don't want my *.tests.* modules to be processed.

iagocanalejas avatar May 15 '19 15:05 iagocanalejas

Hi Iago, thanks for the feature request.

A good way to go with this, I think, is to have a configuration option that allows modules to be ignored (currently you can only ignore relationships between modules, using theignore_imports option). Wildcard support would be a good idea as part of this.

I think this is potentially a good idea, I can't give you a date by which I'd implement it though. I'd certainly consider a pull request for it.

In the meantime you could implement this feature using a custom contract type. The approach would be to subclass importlinter.contracts.LayersContract contract and then override the check method to remove the tests modules from the graph before running the contract check. Unfortunately Grimp doesn't yet currently provide a .remove_import method on the graph, but the following hack should work:

graph._networkx_graph.remove_node("mypackage.tests.foo")

seddonym avatar May 16 '19 07:05 seddonym

Thanks for quick answer, i will try your solution and i also will try to have some time to make a PR adding wildcard support.

iagocanalejas avatar May 16 '19 07:05 iagocanalejas

Great - let me know if you need any pointers.

seddonym avatar May 16 '19 08:05 seddonym

Can't make it work with a custom contract. Even if i just subclass LayersContract using a custom contract just returns 'ListField' object is not iterable

[importlinter]
root_package = apps
contract_types =
    custom_layer: docs.layers.CustomLayersContract
class CustomLayersContract(LayersContract):
    pass

Results in:

'ListField' object is not iterable

iagocanalejas avatar May 16 '19 09:05 iagocanalejas

You've discovered a bug! I've been able to reproduce this and have added it as an issue here: https://github.com/seddonym/import-linter/issues/45

In the meantime, the workaround is just to redeclare the fields explicitly in your custom contract, i.e.

class CustomLayersContract(LayersContract):
    containers = LayersContract.containers
    layers = LayersContract.layers
    ignore_imports = LayersContract.ignore_imports

seddonym avatar May 17 '19 11:05 seddonym

@seddonym @iagocanalejas this issue and the PR seem a little bit outdated 😄

I would like to extend this feature request to the independence type. Basically the ignore option should allow wildcards. If you like, I can give it a try building on top of the current PR. This would also help a project of mine where else we need to define a long list of ignores due the extensive usage of private submodules

kasium avatar Aug 04 '21 07:08 kasium

Hi @kasium - thanks for reaching out. I'd be happy to review a PR from you if you have capacity to move this forward - might be easier to start a dedicated PR though.

seddonym avatar Aug 05 '21 12:08 seddonym

@seddonym before I start a simple question: What kind of wildcards do we want to support?

  • full blown regex (how can we decide if the module name is in fact a regex)
  • simple wildcards (*, ?)

I would go for the second one since both are invalid identifier names in python

kasium avatar Aug 05 '21 13:08 kasium

Simple wildcards I think. See https://docs.python.org/3/library/fnmatch.html. Have a look too at my comment on the other PR.

seddonym avatar Aug 05 '21 15:08 seddonym

I see that make sense. So for which fields to you see this feature? Personally, I only have demand to have it for ignores and there only on the left side.

Implementation wise, I currently think of a special class WildcardModule or so. Only supporting it for ignores would make it more easy to add, because it would only affect the pop_imports method

kasium avatar Aug 05 '21 16:08 kasium

So for which fields to you see this feature?

Let's stick to ignore_imports for the time being. Since we now make a copy of the graph for each contract, it's okay to remove modules before analysing it.

Personally, I only have demand to have it for ignores and there only on the left side.

It's probably simpler for it to be on both sides, so we have something like this?

ignore_imports:
    *.tests.* -> myapp.utils.*

Implementation wise, I currently think of a special class WildcardModule or so.

I reckon probably the best direction is to adjust the DirectImportField and pop_imports to support wildcards in the imports, but possibly there's a better option.

Let me know if you want to chat further before getting stuck in.

seddonym avatar Aug 05 '21 17:08 seddonym

Thanks a lot for your support. I tried to get some more information and ideas, and here are my (hopefully) final suggestion:

I guess that pop_import is the wrong place to handle the wildcards, because DirectImportField generates a DirectImport which again uses a Module. All of them have methods which are not designed to handle wildcards. Therefore I think that improving the parsing of DirectImportField is the easiest way.

  1. Check if the module name contains wildcards, if not, proceed as normal
  2. Check if the first part of the module contains a wildcard. If yes, raise an error (this is important for the next steps)
  3. Allow the parse method to receive the graph object
  4. Get all submodules of the first module (therefore it can't handle a wildcard). Then check for each child if it's matched
  5. Return a list of DirectImport instead of only one object. This would require some more logic in Contract._populate_fields but not too much

So in short, instead of letting the user write down any combination, we generate it for him/her.

This idea feels to be more elegant and doesn't require changes in grimp. Also it leaves code at one place and all other objects are still the same.

What do you think about this?

kasium avatar Aug 06 '21 06:08 kasium

I think the simplest approach would be to allow wildcards in DirectImports (which already is the case) add an optional explode_wildcards argument to the pop_imports helper.

The contracts implementing ignore_imports would not need to change, other than explicitly passing in that argument:

class IndependenceContract(Contract):
    ...
    ignore_imports = fields.SetField(subfield=fields.DirectImportField(), required=False)

    def check(self, graph: ImportGraph) -> ContractCheck:
        ...
        helpers.pop_imports(
            graph, self.ignore_imports if self.ignore_imports else []  # type: ignore,
            explode_wildcards=True,
        )

helpers.pop_imports then becomes:

def pop_imports(
    graph: ImportGraph, imports: Iterable[DirectImport], explode_wildcards: bool = False
) -> List[Dict[str, Union[str, int]]]:
    """
    Removes the supplied direct imports from the graph.

    Args:
        ...
        explode_wildcards: If True, any DirectImports containing `*`s will be converted
                           into every import present in the graph that fits that pattern.
                           At least one import must match each wildcarded import, otherwise
                           MissingImport will be raised.
    Returns:
        The list of import details that were removed, including any additional metadata.

    Raises:
        MissingImport if the import is not present in the graph.
    """

Check if the first part of the module contains a wildcard. If yes, raise an error

I don't see why we can't support 'root' wildcards. The graph can support multiple root packages (they will have been passed in as root_package_names).

How does that sound?

seddonym avatar Aug 06 '21 14:08 seddonym

Thanks for the suggestion.

My worry about moving the "explosion" to pop_imports is, that the methods in Module like parent or is_child_of need to be able to handle wildcards as well. Or do I miss something?

Also in grimp I couldn't find a method to get all root modules, therefore the limitation of the root package. But I guess I need to use the root packages provided to the config

kasium avatar Aug 06 '21 14:08 kasium

My worry about moving the "explosion" to pop_imports is, that the methods in Module like parent or is_child_of need to be able to handle wildcards as well.

Hmm, you make a good point. I don't think Module objects should be doing that kind of thing as they'd need knowledge of the graph.

The challenge is that up until this point, DirectImport and Module have been used to represent two things: actual direct imports and modules, but also user input in the contract.

Perhaps it would make more sense to introduce a new class similar to DirectImport but which represents just what the user passed in. Not sure what the best is but "ImportExpression" is the first thing that comes to mind. We could then replace DirectImportField with ImportExpressionField which returns an ImportExpression consisting merely of an importer string and an imported string (not Modules).

We could leave pop_imports as it is but introduce a new function which converts the import expressions (which might have wildcards in) to direct imports:

def explode_import_expressions(graph: ImportGraph, imports: Iterable[ImportExpression]) -> Iterable[DirectImport]:
   ...

Any better?

seddonym avatar Aug 06 '21 14:08 seddonym

Let me give it a try after the weekend. I like the idea

kasium avatar Aug 07 '21 17:08 kasium