gto icon indicating copy to clipboard operation
gto copied to clipboard

WIP: Refactor decorators in git utils

Open francesco086 opened this issue 3 years ago β€’ 1 comments

Contributes to https://github.com/iterative/gto/issues/25

francesco086 avatar Oct 26 '22 16:10 francesco086

Hi @aguschin , here a first draft of what I am changing, for now just worked on the cloning decorator. I am writing few points to guide you:

  • in git_utils:
    • changed the signature of the cloning decorator. It needs: a. the repo b. a controller to decide when to clone c. arguments to feed the controller with
    • renamed @clone_on_remote_repo -> @clone, I think that its arguments make very clear what happens
    • introduced _check_required_args_are_provided, that can be re-used in the other decorators
  • in test_git_utils:
    • having all the tests in a single file is really painful, I tried at least to cluster them in classes. I still think that separate files would have been clearer, but I don't want to refactor all tests
  • in api:
    • update the decorations
  • in test_api
    • just check that the decorator is applied to all the commands that needs it to support remote repos

If you think that this change looks good, I can apply the same to all other decorators.

francesco086 avatar Oct 26 '22 16:10 francesco086

On a second thought, I believe that passing the controller and its arguments to the cloning decorator is too much. I should have stuck to the original idea. Also, perhaps, clone_if_remote instead of clone is necessary to make the code readable...? (I ended up including the controller in the arguments to keep the name short, because I still found it not explicit enough).

However, while working it out, I realized that all decorators in the git_utils module have the same structure:

  1. they use some arguments given to f, and they should check if they were indeed provided
  2. there is a if condition (what I called controller), if false run f as usual, otherwise do something else

Probably I could abstract it away to avoid code repetition. What do you think?

francesco086 avatar Nov 01 '22 08:11 francesco086

Thanks for great summary @francesco086. It helped a lot. WDYT about @skshetry suggestion above?

aguschin avatar Nov 01 '22 09:11 aguschin

At the beginning I liked very much the idea of the decorator because it is not intrusive (I didn't need to change the functions in api). If I remember right, this was the reason why we decided to go for decorators, we didn't want to add a with statement in each gto command.

However, I found quite painful to work with decorators, especially for what concerns the tests. But perhaps it comes from my inexperience with them.

In the end I don't know what is best :D I would let you decide

francesco086 avatar Nov 01 '22 11:11 francesco086

PS: instead of with open_repo(repo, commit=commit, push=push) as r: ... one could write a decorator as @open_repo(repo_arg="repo", commit_arg="commit", push_arg="push").

I'm not very convinced about the name open_repo though, but also in this case I would leave the decision to you.

francesco086 avatar Nov 01 '22 11:11 francesco086

we didn't want to add a with statement in each gto command

Yes, this was one of the reasons as I remember it. Besides CLI commands, Studio will have to refactor a lot of code, if using with will be the only option to work with local repos. Maybe using with for remote repos and keeping an option to don't use with for local ones? Not sure will it look good from the GTO internals perspective though.

Ok, let's see what @skshetry have in mind. I'll play around with this in meantime to see how it can be like.

aguschin avatar Nov 01 '22 11:11 aguschin

One question/suggestion: what if open_repo(repo) (or the corresponding as decorator), would ALWAYS return a gitpython repo? Meaning, that one can remove GitRegistry.from_repo(r).

Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.

Contra I see: tight binding with gitpython (which is there already, but this would reinforce it).

francesco086 avatar Nov 01 '22 12:11 francesco086

Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.

That's what I meant by open_repo, that's how we use in dvc with Repo.open() that always gives you a Repo instance: https://github.com/iterative/gto/pull/296#discussion_r1010224208

skshetry avatar Nov 01 '22 12:11 skshetry

Hi @aguschin ! I am living with a bad conscience about the code I left behind me.

Any decision on which direction to take? Just let me know what you consider best.

francesco086 avatar Nov 04 '22 12:11 francesco086

Hi @francesco086! Sorry, simply didn't have time to investigate this. Will take a look soon.

aguschin avatar Nov 07 '22 11:11 aguschin

Codecov Report

Base: 86.71% // Head: 86.71% // Decreases project coverage by -0.00% :warning:

Coverage data is based on head (7978a8e) compared to base (c2985fd). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #296      +/-   ##
==========================================
- Coverage   86.71%   86.71%   -0.01%     
==========================================
  Files          16       16              
  Lines        2025     2032       +7     
==========================================
+ Hits         1756     1762       +6     
- Misses        269      270       +1     
Impacted Files Coverage Ξ”
gto/cli.py 72.69% <ΓΈ> (-0.33%) :arrow_down:
gto/api.py 93.87% <100.00%> (ΓΈ)
gto/git_utils.py 99.30% <100.00%> (+0.03%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov-commenter avatar Nov 10 '22 08:11 codecov-commenter

Hi @aguschin , first of all I wanted to thank you for the discussion we had, it is really expanding my understandings :)

I pushed a minor change with a simplification of the clone decorator, and renamed it remote_to_local.

Following our discussion I thought "for now we took a functional programming style", let's try to see how it would work if pushed to its consequences. So I made a little experiment that you can see here: https://github.com/francesco086/gto/blob/pushed_functional_programming/gto/git_utils.py#L62 (check the lines above to understand what happens) I decompose remote_to_local into building blocks ("extract until you drop"), in a functional programming fashion and using decorators. I find it quite enlightening, and I am quite satisfied to see how it worked :)

Don't know if this will help us, just thought it was interesting enough to share :)

francesco086 avatar Nov 10 '22 08:11 francesco086

...aaaaaand I think I got it! This is how you can assert that a decorator has been called: https://github.com/francesco086/gto/blob/test_decorator_call/tests/fp_test.py#L17

I don't know what do you think. In my opinion, it was interesting to see, but I am getting the strong impression that python is not really functional programming friendly :/ For each function you need its "core" implementation separated, eventually using the decorator notation (@...) itself would not work, because you need to be able to reference the non-decorated function....

Shall we switch to object oriented (and in this case, context managers)?

francesco086 avatar Nov 10 '22 09:11 francesco086

Sorry, I'm not that fast πŸ˜‚ was trying to understand your examples πŸ˜…

  1. Thanks for thorough explanation about why tests are complex. I checked out your examples and it indeed looks like we lack some simple ways to check that functions are decorated.
  2. Looks like moving clone logic to GitRegistry.from_repo should be easier to test, since we don't need to test all API methods I believe.
  3. Your example with taking functional approach to its limits is very interesting indeed. And the last one with testing that the function is decorated - as well. IMHO, at this level of complexity I'd prefer this to be done with context managers. (TBH, from my side this is just the lack of knowledge on how to do it properly in functional style).
  4. About joining tests in classes. Sure, you can do it like that is that's easier. I hope at some point I'll refactor all the tests to be written in the same fashion - but definitely not today πŸ˜‚

TDD according to the London school. If we take the Chicago school

Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)

aguschin avatar Nov 10 '22 09:11 aguschin

Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)

Sure, I would recommend this one about the difference between London and Chicago school: https://youtu.be/_S5iUf0ANyQ If I don't remember wrong the video also mentions where to learn TDD (Kent Beck book).

I first learned it from Uncle Bob in his video lectures on Clean Code, but don't have a O'Reilly account anymore and can't point to it anymore sry :(

francesco086 avatar Nov 10 '22 10:11 francesco086

In my stubbornness, I came across at again another "discovery" about decorators. I was wondering: can you reference a non-decorated function? How?

...and when you are stubborn enough to check the python-lang source code, you find out that the reference to the original function (the no_decorator) is done by functools.wraps here, it's __wrapped__. So:

from functools import wraps

def g(f):
    @wraps(f)
    def wrapped_f(*args, **kwargs):
        print("decorated!")
        return f(*args, **kwargs)

    return wrapped_f

@g
def f():
    print("I am f")

f.__wrapped__()
>> I am f

Here we go! I think this is good to know

francesco086 avatar Nov 12 '22 18:11 francesco086

The other piece I needed for what I wanted to do (actually, the most important), a get_applied_decorators function. Showcase:

import ast
import inspect
from dataclasses import dataclass
from functools import wraps
from typing import Callable


def g(a):
    def wrap(f):
        @wraps(f)
        def wrapped_f(*args, **kwargs):
            return f(*args, **kwargs)
        return wrapped_f
    return wrap


@g("x")
@g(a="x")
def f(x):
    pass


@dataclass
class AppliedDecorator:
    f: str
    args: list
    kwargs: dict

def get_applied_decorators(f: Callable):
    o = ast.parse(inspect.getsource(f))
    ast_f = [c for c in ast.iter_child_nodes(o) if c.name == f.__name__][0]
    return [
        AppliedDecorator(f=d.func.id,
                         args=[a.value for a in d.args],
                         kwargs={k.arg: k.value.value for k in d.keywords})
        for d in ast_f.decorator_list
    ]


dec = get_applied_decorators(f)
print(dec)

>> [AppliedDecorator(f='g', args=['x'], kwargs={}), AppliedDecorator(f='g', args=[], kwargs={'a': 'x'})]

Unfortunately it requires parsing the source code :( But thanks to the module ast it is actually reasonable. Just quite some boilerplate code. I must say that I find strange that such functionalities are not built-in in python, perhaps in the inspect module...

francesco086 avatar Nov 12 '22 21:11 francesco086

Hi @aguschin , I started working out the alternative approach, here a draft PR where you can have a look at the changes: https://github.com/francesco086/gto/pull/3

Notice how many rather intrusive changes were required. The most annoying part is init_index_manager, which I only started to address.

Notice also that the same exact changes were required for GitRegistry and EnrichmentManager. To avoid code duplication I introduced FromRemoteRepoMixin, which is basically the equivalent of @clone_on_remote_repo, just less general.

How does it look in your opinion? Do you think it's a promising direction?

francesco086 avatar Nov 25 '22 09:11 francesco086

I close this one in favor of https://github.com/iterative/gto/pull/309

francesco086 avatar Dec 06 '22 08:12 francesco086