AcyclicContract
A contract that checks whether the dependency graph of modules/packages forms a DAG (directed acyclic graph) structure.
This contract verifies that the directed graph of module imports does not contain any cycles. A cycle in the graph implies that there is a circular dependency between modules, which violates the DAG structure requirement.
The new contract can check the cycles on module or package level.
@seddonym @Peter554
Hello again. I worked through this PR to the state that is already useful for two of my projects and I would like to share it with broader community. But first, I would like to get reviews from you.
My summary:
- Very thanks for @Peter554 on grimp implementation for finding a shortest cycle between modules. It seems to work as intended with one exception that we could improve. The cycle that is returned is not deterministic. For the same graph and the same modules we can get different cycles on different runs.
- Finally, I didnt utlize the
as_packageflag on methodfind_shortest_cyclefor finding dependencies between packages. I make artificial imports between packages and create a dictionary that keeps the mapping of artificial to origin dependency. The mapping is later used for logging the contract message and makes it clear what imports create a package dependency. - Thanks @seddonym for your suggestions of how the contract format should looks like. The logs are super clear now.
- I have introduced simple unit tests. Should we extend the automatic tests or the current suit is good enough?
- In addition to simple unit tests, I tested the contract for my two "pretty big" python repos (on is a legacy repo with a lot of hidden module dependencies and package dependencies) and django. It works as intended and it is pretty fast. Btw. django has a lot of circular dependencies on both, module level and package level.
- In the end of August I am going for PyCon Greece conference to present a topic "Consistent importing" (WIP: https://github.com/K4liber/consistent_importing). I hope you dont mind that I will mention
import-linterand the work that has been done with theAcyclicContract.
Looking forward to hear from you. Cheers!
Thanks for sharing! I will take a look (though it might be a week or two before I get to it).
That's cool about PyCon Greece (and of course you can talk about Import Linter). Do please share a recording of the talk once it's available, I'd love to watch.
Thanks, no rush :) I will share the presentation if it gets recorded.
Sorry I haven't got around to this yet, it's still on my radar!
Some initial feedback (though I haven't looked closely yet).
It doesn't seem to work properly at the moment. I tried installing this (and building the
find-shortest-cyclebranch from Grimp) but I gotAttributeError: 'ImportGraph' object has no attribute 'find_matching_direct_imports'. Also, the tests are failing.My first entry point into this needs to be a working version that I can run on a real code base to get a feel for how it behaves. I appreciate that @Peter554 has closed that PR now, and also it's perhaps awkward to get the build working, so perhaps it makes sense to put this one on hold for now until we have a Grimp implementation.
That is something I will get around to eventually, but feel free to give it a try if you like - though it might be a bit difficult if you've not used Rust before.
Hope that makes sense, and thanks again for the work so far!
Thanks for the feedback. You're right. We should finish the Grimp part to some acceptable extend. I will start from taking a closer look on @Peter554 implementation and hopefully learn some Rust to push it a bit forward.
@seddonym @Peter554
-
I have merged master and fix the typing issue. The workflow should pass now.
-
I forked @Peter554 implementation of find-shortest-cycle and modified the code to use sorted collections (
Vec) so we have a deterministic outputs. It can be optimized since performances degraded by 91.52%. I executed the AcyclicContract check on django and it works quite fast (django package in django repo has 150+k Python lines, so seems like it is a good benchmark):
.venv(base) jan@jan-Inspiron-3543:~/Desktop/projects/django/django$ find . -type f -name '*.*' | awk -F. '{print $NF}' | sort | uniq | while read ext; do
count=$(find . -type f -name "*.$ext" -exec cat {} + | wc -l)
echo "$ext: $count"
done | sort -k2 -nr
po: 357391
py: 158043
js: 26698
...
djangohas a lot of circular dependencies (165 to be precise). The check executes within around 0.5s:
(import-linter) (base) jan@jan-Inspiron-3543:~/Desktop/projects/django$ time lint-imports
...
>>>> Cycles family for parent module 'django'
Sibilings:
(
django.core
django.http
django.test
django.views
)
Number of cycles: 1
Cycle 1 (package dependency):
(
-> django.views.static
-> django.http
-> django.http (full path: 'django.http.multipartparser')
-> django.core (full path: 'django.core.exceptions')
-> django.core (full path: 'django.core.management.commands.test')
-> django.test (full path: 'django.test.runner')
-> django.test
-> django.test.testcases
-> django.views.static
)
<<<< Cycles family for parent module 'django'
Number of cycle families found for a contract 'Acyclic': 165
Number of cycle families with package dependencies: 88
real 0m0.566s
user 0m0.510s
sys 0m0.050s
It's exciting seeing this take shape!
I haven't reviewed the implementation yet: I think we should get the UI right first. I've suggested a way of presenting the output, how would you feel about trying to get that working?
I also think we should explicitly specify the packages we want to be acyclic.
To set expectations, I anticipate this needing a few rounds of reviews, as it is inherently quite complex - if you'd prefer not to get sucked in, then I would be happy to build on this work as it stands - it's already been useful, so thank you!
Thanks for the feedback. I am just looking here from time to time and only have time for the implementation on some weekends. Please, feel free to contribute with a code, just keep me in the loop.
We could also make a list of "must to have" before merging to master and make other adjustments in the following PRs. I think the current state is already useful, I already use it. We can polish up the PR a bit by following your current suggestions (maybe excluding the UI/logging changes since those are more demanding).
Or we can just work on this branch till you are more confident with the implementation. As you wish :)
I am just looking here from time to time and only have time for the implementation on some weekends. Please, feel free to contribute with a code, just keep me in the loop.
Sounds good. My focus at the moment is on speeding up the library more generally, so there is no hurry. If I do end up switching focus to this I'll let you know.
Thanks for latest updates! Tests are failing though...I'll wait until they pass before trying this on Django again.
Now you've rebased off master you can make use of improved tooling for working locally, e.g. just format, just test, just lint. Hope that makes things easier! https://github.com/seddonym/import-linter/blob/main/CONTRIBUTING.rst#development
Thanks for latest updates! Tests are failing though...I'll wait until they pass before trying this on Django again.
Now you've rebased off master you can make use of improved tooling for working locally, e.g.
just format,just test,just lint. Hope that makes things easier! https://github.com/seddonym/import-linter/blob/main/CONTRIBUTING.rst#development
Thanks, just simplifies the contribution process. It was not a good idea to discard the tests during the development since a core functionality broke after removing artificial package dependencies. I reintroduced it and the tests pass now, but we kind of made a step back and need to think about grouping the cycles to show the shortest (probably using {module: cycles_including_the_module} mapping and finding the shortest for each module).
need to think about grouping the cycles to show the shortest (probably using {module: cycles_including_the_module} mapping and finding the shortest for each module).
The main thing I think we need to focus on is defining how, precisely, this is going to behave. So don't worry too much about implementing everything (e.g. it would be fine to mock out the response from Grimp). The key is to understand Grimp's API with respect to cycle detection, and then how we present it in Import Linter to end users.
I've been doing some more thinking about this feature - just noting down some findings.
Weighted edges
I think it is useful to think of the 'squashed' version of the graph (i.e. the graph showing the dependencies between subpackages of a single module) as a weighted graph, where the edge weights are the number of imports. For example, the django.db example mentioned earlier would look like this:
This is much more helpful information - we can conclude, for example, that django.db.models has a stronger dependency on django.db.backends than the other way around.
Minimal Weighted Feedback Arc Set
We can then find the Minimal Weighted Feedback Arc Set: this represents the minimum number of imports we'd need to remove from the graph in order to make it acyclic.
This is provided in the igraph library: https://igraph.org/python/api/0.9.6/igraph._igraph.GraphBase.html#feedback_arc_set
The good news is that although the problem is NP-hard, which means for larger graphs it can take an excessive amount of time, the fact that we are running it on a squashed graph seems to mean in practice it is a quick calculation.
I have tried running this on a few cyclic graphs and it seems to pick out sensible results quickly.
So, rather than reporting to the user which cycles exist (which could be a large number even in the case of smallish cyclic graphs like django.db) perhaps it would be more useful to report the imports that would need to be removed to make the graph acyclic. My hope is that that may align with newly-added problematic imports. That certainly seems to be the case sometimes.
How to render failures
We could report this something like this:
No cycles are allowed in django.db.
It could be made acyclic by removing a total of 8 imports:
- django.db.models.fields.related -> django.db.backends.utils (l. 11)
- django.db.models.functions.mixins -> django.db.backends.oracle.functions (l. 42)
- django.db.models.indexes -> django.db.backends.utils (l. 3)
- django.db.models.lookups -> django.db.backends.base.operations (l. 6)
- django.db.models.options -> django.db.backends.utils (l. 173)
- django.db.utils -> django.db.backends (l. 117)
Or, relative style (inconsistent with other contracts, but more concise):
No cycles are allowed in django.db.
It could be made acyclic by removing a total of 8 imports:
- .models.fields.related -> .backends.utils (l. 11)
- .models.functions.mixins -> .backends.oracle.functions (l. 42)
- .models.indexes -> .backends.utils (l. 3)
- .models.lookups -> .backends.base.operations (l. 6)
- .models.options -> .backends.utils (l. 173)
- .utils -> .backends (l. 117)
Or we could present it as a summary per-subpackage.
No cycles are allowed in django.db.
It could be made acyclic by removing a total of 8 imports:
- django.db.backends -> django.db.models (5 imports)
- django.db.backends -> django.db.utils (1 imports)
- django.db.utils -> django.db.models (2 imports)
Wrapping up
So, my current thinking is that we want to get to a place where there is an API in Grimp for the minimum weighted arc feedback set. I'll have a think about what that API should look like.
Then Import Linter will drill down the levels to whatever depth is configured in the contract, and call that API and report on the cycles in one of these three ways.
There is further help we could offer users who are trying to understand what to do about an acyclic graph, such as visualizations. I think this should be out of scope for Import Linter - instead we could implement it within Impulse, a currently tiny package which I also maintain. Perhaps Import Linter could even recommend the use of the tool in its error message.
What are your thoughts? Does this feel like a good direction?
I've been doing some more thinking about this feature - just noting down some findings.
Weighted edges
I think it is useful to think of the 'squashed' version of the graph (i.e. the graph showing the dependencies between subpackages of a single module) as a weighted graph, where the edge weights are the number of imports. For example, the
django.dbexample mentioned earlier would look like this:![]()
This is much more helpful information - we can conclude, for example, that
django.db.modelshas a stronger dependency ondjango.db.backendsthan the other way around.Minimal Weighted Feedback Arc Set
We can then find the Minimal Weighted Feedback Arc Set: this represents the minimum number of imports we'd need to
remove from the graph in order to make it acyclic.
This is provided in the igraph library: https://igraph.org/python/api/0.9.6/igraph._igraph.GraphBase.html#feedback_arc_set
The good news is that although the problem is NP-hard, which means for larger graphs it can take an excessive amount of time, the fact that we are running it on a squashed graph seems to mean in practice it is a quick calculation.
I have tried running this on a few cyclic graphs and it seems to pick out sensible results quickly.
So, rather than reporting to the user which cycles exist (which could be a large number even in the case of smallish cyclic graphs like
django.db) perhaps it would be more useful to report the imports that would need to be removed to make the graph acyclic. My hope is that that may align with newly-added problematic imports. That certainly seems to be the case sometimes.How to render failures
We could report this something like this:
No cycles are allowed in django.db. It could be made acyclic by removing a total of 8 imports: - django.db.models.fields.related -> django.db.backends.utils (l. 11) - django.db.models.functions.mixins -> django.db.backends.oracle.functions (l. 42) - django.db.models.indexes -> django.db.backends.utils (l. 3) - django.db.models.lookups -> django.db.backends.base.operations (l. 6) - django.db.models.options -> django.db.backends.utils (l. 173) - django.db.utils -> django.db.backends (l. 117)Or, relative style (inconsistent with other contracts, but more concise):
No cycles are allowed in django.db. It could be made acyclic by removing a total of 8 imports: - .models.fields.related -> .backends.utils (l. 11) - .models.functions.mixins -> .backends.oracle.functions (l. 42) - .models.indexes -> .backends.utils (l. 3) - .models.lookups -> .backends.base.operations (l. 6) - .models.options -> .backends.utils (l. 173) - .utils -> .backends (l. 117)Or we could present it as a summary per-subpackage.
No cycles are allowed in django.db. It could be made acyclic by removing a total of 8 imports: - django.db.backends -> django.db.models (5 imports) - django.db.backends -> django.db.utils (1 imports) - django.db.utils -> django.db.models (2 imports)Wrapping up
So, my current thinking is that we want to get to a place where there is an API in Grimp for the minimum weighted arc feedback set. I'll have a think about what that API should look like.
Then Import Linter will drill down the levels to whatever depth is configured in the contract, and call that API and report on the cycles in one of these three ways.
There is further help we could offer users who are trying to understand what to do about an acyclic graph, such as visualizations. I think this should be out of scope for Import Linter - instead we could implement it within Impulse, a currently tiny package which I also maintain. Perhaps Import Linter could even recommend the use of the tool in its error message.
What are your thoughts? Does this feel like a good direction?
I really like it, rendering the issue together with a solution. That sounds really helpful for users. Let me review it more carefully this weekend after I am back from vacations.
I've made a start on documenting an API in Grimp here.
I've made a start on documenting an API in Grimp here.
-
The whole idea about presenting cycle breakers is at least one level cooler than presenting raw cycles. I will continue a review of the idea in Grimp PR.
-
Seems like this PR can go back to Draft and more probably be removed soon than merged. Lets keep it as a potential inspiration for some part of the new implementation, right?
-
There is the
IntegerFieldandvalueproperty of a Field introduced in this PR. Maybe I should create a separate PR with that, what do you think? -
I just sent you an e-mail about my presentation "Consistent importing" where I mention Import Linter.
The whole idea about presenting cycle breakers is at least one level cooler than presenting raw cycles. I will continue a review of the idea in Grimp PR.
Great! One thing to note is it doesn't support the cycles between parents and children we identified as a different type of cycle. Possibly that could be a different method in Grimp.
Seems like this PR can go back to Draft and more probably be removed soon than merged.
Whatever you prefer. Once we've finalized the Grimp API, we could adapt this PR or start a new one.
There is the IntegerField and value property of a Field introduced in this PR. Maybe I should create a separate PR with that, what do you think?
Yes sounds good!
I just sent you an e-mail about my presentation "Consistent importing" where I mention Import Linter.
Received! Thanks.
The whole idea about presenting cycle breakers is at least one level cooler than presenting raw cycles. I will continue a review of the idea in Grimp PR.
Great! One thing to note is it doesn't support the cycles between parents and children we identified as a different type of cycle. Possibly that could be a different method in Grimp.
Seems like this PR can go back to Draft and more probably be removed soon than merged.
Whatever you prefer. Once we've finalized the Grimp API, we could adapt this PR or start a new one.
There is the IntegerField and value property of a Field introduced in this PR. Maybe I should create a separate PR with that, what do you think?
Yes sounds good!
I just sent you an e-mail about my presentation "Consistent importing" where I mention Import Linter.
Received! Thanks.
- I created PR with Parsed field logic: https://github.com/seddonym/import-linter/pull/298
- I wonder how
nominate_cycle_breakersshould be utilized in the acyclic contract implementation. Should we present cycle breakers for all descendant of the package (subpackages) or just for the package itself? Maybe another flagconsider_subpackagescould be introduced.
I wonder how nominate_cycle_breakers should be utilized in the acyclic contract implementation.
This is what I meant by the depth argument. It would default to drilling down to all sub packages, but we could stop it at a certain depth.
I am going to concentrate on implementing find_cycle_breakers within Grimp, so if you want to look into providing the scaffolding to interact with that in an Import Linter PR, go ahead! You could just stub out some fake responses from Grimp for the time being - now you know the API it should be fairly straightforward.
Also, I have been thinking about the other kind of cycle - the one that can happen between parents and children, and which find_cycle_breakers doesn't report on. IMO this is sufficiently different to warrant a separate contract type. Thought: maybe we should have an acyclic contract type which is for pure, direct cycles like that, and acyclic_packages for the one that uses find_cycle_breakers.
I'd like to concentrate on the find_cycle_breakers one for now but just letting you know my thoughts about the general direction.
FYI, I think I have a working concept here now: https://github.com/seddonym/grimp/pull/253
Next steps are to add some more thorough tests, and possibly see if there is a way to avoid the igraph dependency - but I might be happy to live with that dependency temporarily, since it is a killer feature IMO!
FYI, I think I have a working concept here now: seddonym/grimp#253
Next steps are to add some more thorough tests, and possibly see if there is a way to avoid the igraph dependency - but I might be happy to live with that dependency temporarily, since it is a killer feature IMO!
Thanks for sharing. I will take a closer look on the new grimp API soon. I think it makes more sense to abandon this PR and start from a scratch based on the new grimp API. I will:
- take an
IntegerFieldfrom this PR and create another PR - try to create another repo with a custom contract to keep the current checkpoint of the work done in this PR, just in case
EDIT: I just realized I cannot share the custom contract in a clear way since it depends on a grimp functionality that is not released (and probably will not be released soon).
I'm pleased to say that I have now released the latest Grimp, which gives us nominate_cycle_breakers.
I should have time to integrate this into the acyclic contract type, unless you would like to make a start on it? Let me know what you would prefer.
I'm pleased to say that I have now released the latest Grimp, which gives us nominate_cycle_breakers.
I should have time to integrate this into the
acycliccontract type, unless you would like to make a start on it? Let me know what you would prefer.
That's really cool the functionality is already in Grimp.
Thanks for updating me with a status and possibility to contribute. However, I think you should know better how to implement the remaining part from the import-linter side. I am glad that we had the conversations inside this PR, I have learn new things. Probably my direct coding contribution regarding the acyclic contract ends here, it does not matter too much that it was not merged after all :) Please mark me as an optional reviewer if you would like a second opinion of what you are going to implement.
I will try to update my presentation on PyCon Wroclaw with things you mentioned. Hopefully, some new people will start to use import-linter (and the acyclic contract as soon as it will be finished) and improve the quality of their code bases.
Thanks for updating me with a status and possibility to contribute. However, I think you should know better how to implement the remaining part from the import-linter side.
Sounds good. I'm aiming to get this done this week, I'll keep you posted.
I'm closing this PR now, but thanks for all your help pushing the feature forward and helping clarify the thinking around it, it's just as important as submitting code.
Acyclic Siblings contracts are now available in the latest Import Linter. I'm very excited about this feature and have already started using it. Thanks @K4liber for making this happen!
https://import-linter.readthedocs.io/en/stable/contract_types.html#acyclic-siblings