gto
gto copied to clipboard
WIP: Refactor decorators in git utils
Contributes to https://github.com/iterative/gto/issues/25
Hi @aguschin , here a first draft of what I am changing, for now just worked on the cloning decorator. I am writing few points to guide you:
- in
git_utils:- changed the signature of the cloning decorator. It needs: a. the repo b. a controller to decide when to clone c. arguments to feed the controller with
- renamed
@clone_on_remote_repo->@clone, I think that its arguments make very clear what happens - introduced
_check_required_args_are_provided, that can be re-used in the other decorators
- in
test_git_utils:- having all the tests in a single file is really painful, I tried at least to cluster them in classes. I still think that separate files would have been clearer, but I don't want to refactor all tests
- in
api:- update the decorations
- in
test_api- just check that the decorator is applied to all the commands that needs it to support remote repos
If you think that this change looks good, I can apply the same to all other decorators.
On a second thought, I believe that passing the controller and its arguments to the cloning decorator is too much.
I should have stuck to the original idea. Also, perhaps, clone_if_remote instead of clone is necessary to make the code readable...? (I ended up including the controller in the arguments to keep the name short, because I still found it not explicit enough).
However, while working it out, I realized that all decorators in the git_utils module have the same structure:
- they use some arguments given to
f, and they should check if they were indeed provided - there is a if condition (what I called controller), if false run
fas usual, otherwise do something else
Probably I could abstract it away to avoid code repetition. What do you think?
Thanks for great summary @francesco086. It helped a lot. WDYT about @skshetry suggestion above?
At the beginning I liked very much the idea of the decorator because it is not intrusive (I didn't need to change the functions in api). If I remember right, this was the reason why we decided to go for decorators, we didn't want to add a with statement in each gto command.
However, I found quite painful to work with decorators, especially for what concerns the tests. But perhaps it comes from my inexperience with them.
In the end I don't know what is best :D I would let you decide
PS: instead of with open_repo(repo, commit=commit, push=push) as r: ... one could write a decorator as @open_repo(repo_arg="repo", commit_arg="commit", push_arg="push").
I'm not very convinced about the name open_repo though, but also in this case I would leave the decision to you.
we didn't want to add a with statement in each gto command
Yes, this was one of the reasons as I remember it. Besides CLI commands, Studio will have to refactor a lot of code, if using with will be the only option to work with local repos. Maybe using with for remote repos and keeping an option to don't use with for local ones? Not sure will it look good from the GTO internals perspective though.
Ok, let's see what @skshetry have in mind. I'll play around with this in meantime to see how it can be like.
One question/suggestion: what if open_repo(repo) (or the corresponding as decorator), would ALWAYS return a gitpython repo?
Meaning, that one can remove GitRegistry.from_repo(r).
Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.
Contra I see: tight binding with gitpython (which is there already, but this would reinforce it).
Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.
That's what I meant by open_repo, that's how we use in dvc with Repo.open() that always gives you a Repo instance:
https://github.com/iterative/gto/pull/296#discussion_r1010224208
Hi @aguschin ! I am living with a bad conscience about the code I left behind me.
Any decision on which direction to take? Just let me know what you consider best.
Hi @francesco086! Sorry, simply didn't have time to investigate this. Will take a look soon.
Codecov Report
Base: 86.71% // Head: 86.71% // Decreases project coverage by -0.00% :warning:
Coverage data is based on head (
7978a8e) compared to base (c2985fd). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## main #296 +/- ##
==========================================
- Coverage 86.71% 86.71% -0.01%
==========================================
Files 16 16
Lines 2025 2032 +7
==========================================
+ Hits 1756 1762 +6
- Misses 269 270 +1
| Impacted Files | Coverage Ξ | |
|---|---|---|
| gto/cli.py | 72.69% <ΓΈ> (-0.33%) |
:arrow_down: |
| gto/api.py | 93.87% <100.00%> (ΓΈ) |
|
| gto/git_utils.py | 99.30% <100.00%> (+0.03%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
Hi @aguschin , first of all I wanted to thank you for the discussion we had, it is really expanding my understandings :)
I pushed a minor change with a simplification of the clone decorator, and renamed it remote_to_local.
Following our discussion I thought "for now we took a functional programming style", let's try to see how it would work if pushed to its consequences.
So I made a little experiment that you can see here: https://github.com/francesco086/gto/blob/pushed_functional_programming/gto/git_utils.py#L62 (check the lines above to understand what happens)
I decompose remote_to_local into building blocks ("extract until you drop"), in a functional programming fashion and using decorators. I find it quite enlightening, and I am quite satisfied to see how it worked :)
Don't know if this will help us, just thought it was interesting enough to share :)
...aaaaaand I think I got it! This is how you can assert that a decorator has been called: https://github.com/francesco086/gto/blob/test_decorator_call/tests/fp_test.py#L17
I don't know what do you think. In my opinion, it was interesting to see, but I am getting the strong impression that python is not really functional programming friendly :/ For each function you need its "core" implementation separated, eventually using the decorator notation (@...) itself would not work, because you need to be able to reference the non-decorated function....
Shall we switch to object oriented (and in this case, context managers)?
Sorry, I'm not that fast π was trying to understand your examples π
- Thanks for thorough explanation about why tests are complex. I checked out your examples and it indeed looks like we lack some simple ways to check that functions are decorated.
- Looks like moving clone logic to GitRegistry.from_repo should be easier to test, since we don't need to test all API methods I believe.
- Your example with taking functional approach to its limits is very interesting indeed. And the last one with testing that the function is decorated - as well. IMHO, at this level of complexity I'd prefer this to be done with context managers. (TBH, from my side this is just the lack of knowledge on how to do it properly in functional style).
- About joining tests in classes. Sure, you can do it like that is that's easier. I hope at some point I'll refactor all the tests to be written in the same fashion - but definitely not today π
TDD according to the London school. If we take the Chicago school
Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)
Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)
Sure, I would recommend this one about the difference between London and Chicago school: https://youtu.be/_S5iUf0ANyQ If I don't remember wrong the video also mentions where to learn TDD (Kent Beck book).
I first learned it from Uncle Bob in his video lectures on Clean Code, but don't have a O'Reilly account anymore and can't point to it anymore sry :(
In my stubbornness, I came across at again another "discovery" about decorators. I was wondering: can you reference a non-decorated function? How?
...and when you are stubborn enough to check the python-lang source code, you find out that the reference to the original function (the no_decorator) is done by functools.wraps here, it's __wrapped__. So:
from functools import wraps
def g(f):
@wraps(f)
def wrapped_f(*args, **kwargs):
print("decorated!")
return f(*args, **kwargs)
return wrapped_f
@g
def f():
print("I am f")
f.__wrapped__()
>> I am f
Here we go! I think this is good to know
The other piece I needed for what I wanted to do (actually, the most important), a get_applied_decorators function.
Showcase:
import ast
import inspect
from dataclasses import dataclass
from functools import wraps
from typing import Callable
def g(a):
def wrap(f):
@wraps(f)
def wrapped_f(*args, **kwargs):
return f(*args, **kwargs)
return wrapped_f
return wrap
@g("x")
@g(a="x")
def f(x):
pass
@dataclass
class AppliedDecorator:
f: str
args: list
kwargs: dict
def get_applied_decorators(f: Callable):
o = ast.parse(inspect.getsource(f))
ast_f = [c for c in ast.iter_child_nodes(o) if c.name == f.__name__][0]
return [
AppliedDecorator(f=d.func.id,
args=[a.value for a in d.args],
kwargs={k.arg: k.value.value for k in d.keywords})
for d in ast_f.decorator_list
]
dec = get_applied_decorators(f)
print(dec)
>> [AppliedDecorator(f='g', args=['x'], kwargs={}), AppliedDecorator(f='g', args=[], kwargs={'a': 'x'})]
Unfortunately it requires parsing the source code :( But thanks to the module ast it is actually reasonable. Just quite some boilerplate code. I must say that I find strange that such functionalities are not built-in in python, perhaps in the inspect module...
Hi @aguschin , I started working out the alternative approach, here a draft PR where you can have a look at the changes: https://github.com/francesco086/gto/pull/3
Notice how many rather intrusive changes were required. The most annoying part is init_index_manager, which I only started to address.
Notice also that the same exact changes were required for GitRegistry and EnrichmentManager. To avoid code duplication I introduced FromRemoteRepoMixin, which is basically the equivalent of @clone_on_remote_repo, just less general.
How does it look in your opinion? Do you think it's a promising direction?
I close this one in favor of https://github.com/iterative/gto/pull/309