SATOSA
SATOSA copied to clipboard
Refactor config attributes loading
- Extends the validation of the mandatory_dict_keys in config by checking that they also have a value. There is no point in having checks for mandatory keys if they are empty (this would mean that they are not really mandatory)
- Extracts
_parse_configmethod to utilize for loading configs - Moves the
self._configcheck from_verify_dictmethod to beginning of initialization. If the main config file is wrong, there's no need proceeding any more - Extracts plugin loading to
_load_pluginsmethod - Fixes #184 by setting empty list as default value for plugin configs in case a specific plugin config is unset thus allowing MICRO_SERVICES plugin to be empty (instead of deleting the key)
- Adds test cases for invalid internal attributes and empty MICRO_SERVICES plugin
- Adds test case for invalid conf although since the exception thrown is generic for other config cases as well, the only way to verify this is by checking the coverage (subclassing the config errors is probably a better alternative)
All Submissions:
- [X] Have you checked to ensure there aren't other open Pull Requests for the same update/change?
- [X] Have you added an explanation of what problem you are trying to solve with this PR?
- [X] Have you added information on what your changes do and why you chose this as your solution?
- [X] Have you written new tests for your changes?
- [X] Does your submission pass tests?
- [X] This project follows PEP8 style guide. Have you run your code against the 'flake8' linter?
- Extends the validation of the mandatory_dict_keys...
- Extracts
_parse_config...- Moves the self._config check...
- Extracts plugin loading...
- Fixes #184 ...
- Adds test cases for invalid internal attributes...
- Adds test case for invalid conf...
There are 7 bullets describing changes, but only two commits. Can we balance this in the future?
If the main config file is wrong, there's no need proceeding any more
If there different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.
subclassing the config errors is probably a better alternative
Yes, we should do this. Are there any other alternatives?
- Extracts _parse_config method to utilize for loading configs
- Extracts plugin loading to _load_plugins method
This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?
Ideally, this should be part of the commit message that introduces the respective change.
There are 7 bullets describing changes, but only two commits. Can we balance this in the future?
Sure, I can rebase and split them to more commits
If the main config file is wrong, there's no need proceeding any more If there different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.
The original behavior didn't change since my purpose was to refactor the code and not change the behavior. Every stop and every error is in the same order as it was before. I think that if we wish to change the behavior to that which you describe, it should be in a different PR not in the same PR as the refactoring.
subclassing the config errors is probably a better alternative
Yes, we should do this. Are there any other alternatives?
Sure there are alternatives to subclassing (i.e carrying the erroneous config as a class variable could be one alternative) but I'm not sure we need this.
This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?
Initially this PR was about #184 . Reading through the code I realized that the __init__ method was too big (for my taste) and also had some deeply nesting (which resulted in cyclomatic complexity 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose hiding the details in other methods. I think it is kind of obvious about why this extraction is useful. Otherwise the _verify_dict and the load_yaml and load_dict methods would be inside the constructor as well.
If the main config file is wrong, there's no need proceeding any more If there different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.
The original behavior didn't change since my purpose was to refactor the code and not change the behavior. Every stop and every error is in the same order as it was before. I think that if we wish to change the behavior to that which you describe, it should be in a different PR not in the same PR as the refactoring.
We do not change things just because they can be changed. Refactoring is not about moving things around. The refactor should have a goal which should be stated and it should answer how it makes things "better". In this process it is definitely desirable to group things differently (prepare for the changes) and then change how they behave (make things better), to make the transition clear and understandable (it would be also be reflected by the commits.) But still, the goal is to change things to make them better. How far we want to go with that is what we discuss here. I'm all in for pushing that a lot. If you feel we should restrict this and have separate PRs for related aspects, let's do that.
subclassing the config errors is probably a better alternative
Yes, we should do this. Are there any other alternatives?
Sure there are alternatives to subclassing (i.e carrying the erroneous config as a class variable could be one alternative) but I'm not sure we need this.
We don't need what? Do you mean we do not need alternatives? How does "carrying the erroneous config as a class variable" help?
Just to remind us, what we try to solve here is what you initially wrote: how can we know that an exception we expect came from a specific handler?
This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?
Initially this PR was about #184 . Reading through the code I realized that the
__init__method was too big (for my taste) and also had some deeply nesting (which resulted in cyclomatic complexity 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose hiding the details in other methods. I think it is kind of obvious about why this extraction is useful. Otherwise the_verify_dictand theload_yamlandload_dictmethods would be inside the constructor as well.
It is not obvious to everyone, as it was not to the original author. Additionally, it may not be obvious at all, when three years later all the surrounding code has changed. Having this as part of the history is very helpful when someone will need to go through it and understand. This particular case may not be very important, but we should still do this to push our thinking in that direction.
If the main config file is wrong, there's no need proceeding any more If there different config files, their checks are independent. I would like to know all the errors I could possibly know with a single run, fix all I can fix, and retry.
The original behavior didn't change since my purpose was to refactor the code and not change the behavior. Every stop and every error is in the same order as it was before. I think that if we wish to change the behavior to that which you describe, it should be in a different PR not in the same PR as the refactoring.
We do not change things just because they can be changed. Refactoring is not about moving things around. The refactor should have a goal which should be stated and it should answer how it makes things "better". In this process it is definitely desirable to group things differently (prepare for the changes) and then change how they behave (make things better), to make the transition clear and understandable (it would be also be reflected by the commits.) But still, the goal is to change things to make them better.
As I said, the point of this refactoring (which came as a side effect of #184) was to improve readability by reducing the nested for loops, reducing complexity (same way), and extracting operations with a mind of reusability (without pushing reusability too much to end up in coupling). There doesn't have to be always a behavioral change or feature to refactor although the real value of the refactoring is to restructure the code in a way that is easier to maintain, extend and change. So with that mindset, I believe that we should change things because they can be changed to provide the aforementioned things.
How far we want to go with that is what we discuss here. I'm all in for pushing that a lot. If you feel we should restrict this and have separate PRs for related aspects, let's do that.
I really believe this PR discussion has gone way out of track and the PR lost it's value.
subclassing the config errors is probably a better alternative
Yes, we should do this. Are there any other alternatives?
Sure there are alternatives to subclassing (i.e carrying the erroneous config as a class variable could be one alternative) but I'm not sure we need this.
We don't need what? Do you mean we do not need alternatives? How does "carrying the erroneous config as a class variable" help?
Just to remind us, what we try to solve here is what you initially wrote: how can we know that an exception we expect came from a specific handler?
Carrying the erroneous config would help to discern which config was erroneous not trace the line/caller the exception came from. I wrote what I wrote because this exception class was used everywhere and the only way to check the correct one was thrown was to check the error message which I consider wrong. If for some reason (what would that be?) one wants to trace the function caller that resulted in the exception, they could as well use the traceback module. It's like trying to trace the exit point of a multiple-exit-point function
This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?
Initially this PR was about #184 . Reading through the code I realized that the
__init__method was too big (for my taste) and also had some deeply nesting (which resulted in cyclomatic complexity 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose hiding the details in other methods. I think it is kind of obvious about why this extraction is useful. Otherwise the_verify_dictand theload_yamlandload_dictmethods would be inside the constructor as well.It is not obvious to everyone, as it was not to the original author. Additionally, it may not be obvious at all, when three years later all the surrounding code has changed. Having this as part of the history is very helpful when someone will need to go through it and understand. This particular case may not be very important, but we should still do this to push our thinking in that direction.
I'm still not sure what explanation to write to justify an extraction to a method. Splitting functionality would need explanation (if for example the plugins loading was split instead of being in a for loop), but I consider this a really minor refactoring to explain and justify the extraction and the rewrite of a 4 level nested for loop
I prefer to close this PR. This started as a simple and harmless imho refactoring to improve readability and complexity on the track of solving a minor issue and ended up discussing about behavioral changes, functional paradigms, readability and exception subclassing alternatives. What started as a simple change request, ended up in the longest PR discussion for this repo. I've really lost sight of what needs to be done here. So it's better to close this and handle anything that is needed after a corresponding issue is opened that would explain what needs to be done.
How far we want to go with that is what we discuss here. I'm all in for pushing that a lot. If you feel we should restrict this and have separate PRs for related aspects, let's do that.
I really believe this PR discussion has gone way out of track and the PR lost it's value.
The value is still here, both in the code and the discussion. And I actually believe that you've probably already learnt new things from these discussions. I did, others reading this did. There is no reason to dismiss this.
subclassing the config errors is probably a better alternative
Yes, we should do this. Are there any other alternatives?
Sure there are alternatives to subclassing (i.e carrying the erroneous config as a class variable could be one alternative) but I'm not sure we need this.
We don't need what? Do you mean we do not need alternatives? How does "carrying the erroneous config as a class variable" help? Just to remind us, what we try to solve here is what you initially wrote: how can we know that an exception we expect came from a specific handler?
Carrying the erroneous config would help to discern which config was erroneous not trace the line/caller the exception came from.
Yes, we do not care about the line; we care about the handler in the sense you describe - not the handler as a caller, but the operation it was doing that will help us understand what went wrong.
I wrote what I wrote because this exception class was used everywhere and the only way to check the correct one was thrown was to check the error message which I consider wrong.
Right! We should not check the error message; and even if we did, it could be the same, so this is a no-go.
So, this topic is a super interesting topic, and I believe it is in the heart of complexity in software. It is also why monads are so useful (at least, for my understanding of monads). What we see here is the need to propagate the state of the operation to produce a meaningful error message. The same thing is needed in either direction of the flow. This propagation of state is useful both to stop and analyse the program, but also to progress the program and to make decisions. There are simple ways to make this happen (save the information in some context), and this is what we will do for a start. But modelling the generic mechanism to get that information without having to redo the tedious plumbing is very interesting, and leads to a different paradigm; it is not about types, purity, or what-not, it is about information accumulation and decisions.
This is good; but the interesting part is why. Why do these need to be extracted? How is it better having them separate?
Initially this PR was about #184 . Reading through the code I realized that the
__init__method was too big (for my taste) and also had some deeply nesting (which resulted in cyclomatic complexity 12). So I decided to refactor this to reduce the complexity and at the same time make the constructor less verbose hiding the details in other methods. I think it is kind of obvious about why this extraction is useful. Otherwise the_verify_dictand theload_yamlandload_dictmethods would be inside the constructor as well.It is not obvious to everyone, as it was not to the original author. Additionally, it may not be obvious at all, when three years later all the surrounding code has changed. Having this as part of the history is very helpful when someone will need to go through it and understand. This particular case may not be very important, but we should still do this to push our thinking in that direction.
I'm still not sure what explanation to write to justify an extraction to a method. Splitting functionality would need explanation (if for example the plugins loading was split instead of being in a
forloop), but I consider this a really minor refactoring to explain and justify the extraction and the rewrite of a 4 level nestedforloop
You've already written that you though it was verbose, too big and complex. These are good reasons, but you're basing them on "taste" or some number that is supposed to indicate complexity, but you know that there is something bothering you and it is not the number of lines, neither the CC, because you have seen other code that may be longer or more complex, but it still felt fine. The hard question is what makes anyone judge something as verbose, big or complex. In a previous PR you wrote something related and very important, and you wrote a similar thing elsewhere in this PR. Let me quote:
Now there are 3 functions: one for the creation of the metadata objects, one for the writing the metadata objects to file(s), and one for composing those two to a final result
So, what drives the split is exactly what you write: you want to separate concerns, and then you want to compose them to achieve a bigger effect. There is more than composition, but separating concerns and composing them is a first step that makes this feeling that is bothering us and that we attributed to taste, an actual guide to making such decisions. The way you separate concerns is by figuring out the level of detail that describe an operation. Finer details are encapsulated and split into separate functions, so that the function is concerned (not only with a single aspect, but) with details of the same abstraction level. The next step is orchestration: every function is an orchestrator of data and effects. Values are derived by transforming data, where transformations are compositions of functions. Business results are derived by chaining effects that are built on top of those compositions. Orchestration provides a higher level way to think about and deal with the concerns of business requirements. It allows us to understand the abstraction details of business operations (which are implemented as steps of function compositions).
I prefer to close this PR. This started as a simple and harmless imho refactoring to improve readability and complexity on the track of solving a minor issue and ended up discussing about behavioral changes, functional paradigms, readability and exception subclassing alternatives.
This is a discussion. We explore what can be done better. It is the only way to design things, and I know you are capable of doing that. (I am also pretty sure you want to do that.) All the things you describe above, are rooted to deeper questions, sometimes annoyingly abstract, and these are the questions I try to bring out. I dedicate time to this, because I know you can contribute to these questions and I know that many of these are things that have been bothering you, too.
What started as a simple change request, ended up in the longest PR discussion for this repo.
The length of the PR discussion is irrelevant.
I've really lost sight of what needs to be done here. So it's better to close this and handle anything that is needed after a corresponding issue is opened that would explain what needs to be done.
What needs to be done is what we will decide by discussing the different tradeoffs of the alternative choices we have. The reason to have multiple comments and not a big one, is so that we can conclude for each point separately.
Are you interested in doing this with me?
https://github.com/IdentityPython/SATOSA/pull/251#issuecomment-510688196 I totally agree
I prefer to close this PR. This started as a simple and harmless imho refactoring to improve readability and complexity on the track of solving a minor issue and ended up discussing about behavioral changes, functional paradigms, readability and exception subclassing alternatives.
This is a discussion. We explore what can be done better. It is the only way to design things, and I know you are capable of doing that. (I am also pretty sure you want to do that.) All the things you describe above, are rooted to deeper questions, sometimes annoyingly abstract, and these are the questions I try to bring out. I dedicate time to this, because I know you can contribute to these questions and I know that many of these are things that have been bothering you, too.
What started as a simple change request, ended up in the longest PR discussion for this repo.
The length of the PR discussion is irrelevant.
I've really lost sight of what needs to be done here. So it's better to close this and handle anything that is needed after a corresponding issue is opened that would explain what needs to be done.
What needs to be done is what we will decide by discussing the different tradeoffs of the alternative choices we have. The reason to have multiple comments and not a big one, is so that we can conclude for each point separately.
Are you interested in doing this with me?
I'm more than interested in trying to discover possible answers to these questions. These questions are the real beauty of programming and the journey to find these answers is what is worth the try, not trying to get a working result.
My point was that this PR started as a simple refactoring to initiate a possible future bigger refactoring/rewrite of things in order to improve maintainability, extensibility and plugability (someone could say it is the same with extensibility). So I think that the discussion should discuss if this refactoring is paving the aforementioned road or not, not try to define every detail of that road here in comments. Designing the details of that road should (imho) be analyzed and discussed in a separate discussion so as to have the bigger picture in mind instead of discussing every detail (because details are a part of the bigger picture). In this case, since this refactoring raised concerns about the way the configuration is read, validated and handled, a separate issue should be opened to discuss how we want these things to change for a new PR to follow. Discussing these in this PR is losing the bigger picture (since we're talking in specific snippets of code) and in a way is denoting (at least in my mind) that this PR is following that path.
I'm more than interested in trying to discover possible answers to these questions.
cool; so I'm reopening this 🙂
My point was that this PR started as a simple refactoring to initiate a possible future bigger refactoring/rewrite of things [we] should discuss if this refactoring is paving the aforementioned road or not
yes and no. For the local context of the mechanisms that handle the configurations, yes; it is fixing parts of the existing way of doing things, which is good. But, in the long run, the config format itself should be revisited. In other words, this PR is accepting that the current data source is correct; while I'm saying that to make this right we should start by modelling the data we get first.
I haven't mentioned that in the rest of the discussion, and it's not what we should focus on now. That's because there are things that will be the same regardless of the config format - it's not like everything needs to be written from scratch. These are the things I'm trying to keep this discussion focused on. Not the details of the handling and parsing, but the structure of the logic that we need in place. These discussions are not tied to the config itself, but apply elsewhere too.
To be more precise, the questions about the difference of the error messages, who checks for the error, keeping a straight code-path, how to make clear what happened, and getting errors at once about the whole input, are all things that should be rewritten with a data-first approach. (We will see what this means in practice.)
Designing the details of that road should (imho) be analyzed and discussed in a separate discussion so as to have the bigger picture in mind instead of discussing every detail
again, yes and no. Discussions need to start somewhere. Once we figure what we need to change or how, we can write a clean description of what we discussed and to what we arrived. This cannot be done the other way around. Discussion needs to take place first. Then we can summarize and state the tradeoffs and the decision. Then make changes. If we knew what should be done there would be no need to discuss it. So, I think it is good to have those discussions. Some will lead nowhere, some will lead to immediate changes, and others will unwind until we have a better understanding - at which point we can summarize in a separate email, issue or PR.
Yesterday I coded this incredible (pythonic?!?) bunch of things: https://github.com/peppelinux/pyLDAP/blob/master/client.py#L46
Today when I read it my thought has come to you, here... I just want to say that it happened, Should I leave it there? I dunno, but I think that my son shouldn't read it, he should never code in this way, but it exists and it works and I'll never use debugger there. Sorry, I just want to say that things happen and somethime we should let exists :')
Yesterday I coded this incredible (pythonic?!?) bunch of things: https://github.com/peppelinux/pyLDAP/blob/master/client.py#L46
Today when I read it my thought has come to you, here... I just want to say that it happened, Should I leave it there? I dunno, but I think that my son shouldn't read it, he should never code in this way, but it exists and it works and I'll never use debugger there. Sorry, I just want to say that things happen and somethime we should let exists :')
if it makes you feel any better, that line cannot be written as a one-liner with map/filter. but even if you choose to do it in 2 lines, the result is still bad. so it would be better to refactor it unless you want to explain this to your son :stuck_out_tongue:
Great! I Just want to explain that everything could be refactored but the ideas are the most important things. Thank you for your suggestion, I'll back to that!