pyjanitor icon indicating copy to clipboard operation
pyjanitor copied to clipboard

State of the autocompletion of pyjanitor methods using jedi.

Open VPerrollaz opened this issue 4 years ago • 19 comments

Brief Description

I found the completions proposal of pyjanitor methods in the jupyter notebooks a bit difficult to use and wanted to check if it was my install or a general phenomenon. Roughly put I only get the completion proposal of pyjanitor methods if it is a fist step of method chaining. And in a chain I don't get any completion after a pyjanitor method.

Minimally Reproducible Code

See the gist below https://gist.github.com/VPerrollaz/3fcbe3769264a6ed6f38e6a10c464ca5

VPerrollaz avatar May 21 '20 21:05 VPerrollaz

Wow, that is a big bug catch, @VPerrollaz! I have to admit to not knowing much about tab completions here. You seem to have a better grasp surrounding the tooling for testing here. Would you be kind enough to give a short explainer on how tab completion happens with Python?

At the moment, the only thing I can think of that might interfere with tab completion is the use of pandas-flavor to monkey-patch methods onto a dataframe at runtime. But as for how to debug why tab completion behaviour is as you've described in the gist, I'm a bit at a loss.

ericmjl avatar May 22 '20 12:05 ericmjl

  • There is definitely something going on with pandas flavor, there is an issue on the repository about this, but basically since they didn't look at method chaining they were happy about the first completion proposed by jedi in ipython repl/jupyter notebook.
  • Furthermore the pandas-flavor repository doesn't feel very alive at the moment. Maybe because as they explain in the README.md there is now an official pandas API for monkey-patching?
  • So far what I got for the explanation is that in ipython repl/ jupyter notebook jedi is used via the object Interpreter which used the namespace argument for parsing the code and thus it catches the pyjanitor add by monkey-patching.
  • However in method chaining the first methods do not modify the namespace so we don't get anymore proposition for completion after (this part of the argument feels a bit weak I will have to test it I think before being sure)
  • Finally I think when you use Script the parser only look at the time pandas was first imported so the monkey-patching afterward is invisible.

VPerrollaz avatar May 22 '20 13:05 VPerrollaz

Looking again at the jedi documentation and more precisely at the section limitations https://jedi.readthedocs.io/en/latest/docs/features.html#recipes it is clear that by design jedi doesn't look for setattr, which is precisely at the core of how pandas_flavor works so I think that sadly the autocompletion problem is not considered a bug.

VPerrollaz avatar May 22 '20 13:05 VPerrollaz

That's a wonderful catch!

Would you be open to a documentation PR that helps newcomers recognize this issue? We can go over the exact phrasing in the PR discussion.

ericmjl avatar May 22 '20 14:05 ericmjl

In principle no problem, but what kind of information do you want to provide?

VPerrollaz avatar May 22 '20 15:05 VPerrollaz

Many thanks @VPerrollaz! I think the best info to provide someone is something along the lines of "here's what to expect with respect to tab completion" in your IDE.

Example starter that you can use to get going:

Tab completion behaviour may vary from IDE to IDE.
For a historical perspective on this, please check out the following issues on GitHub:

- https://github.com/ericmjl/pyjanitor/issues/660
- <put in the others you've found>

I suspect that should be enough to help newcomers recognize that the inconsistent tab completion behaviour is a bit complicated to work through, and relies on other ecosystem tools working together better.


By the way, about official namespacing - I did think about it. Maybe in the 1.0 release, as this would be a breaking change. An example would be:

df.jn.clean_names().jn.drop_duplicates()

The additional .jn would be a minimal addition that adds explicitness to the method chains, as right now it's muddled up which functions are pandas-native and which are pyjanitor-patched. This would be a breaking change for end-users though, so it needs user community agreement.

ericmjl avatar May 22 '20 16:05 ericmjl

  • Ok just one additional question. Where do you want me to put the info in the doc hierarchy?
  • As for the namespace there might be a way to import in a different way if one want to use jn as in you example, in this way you would not break backward compatibility.
  • I plan to take a look at the jedi/parso codebase to see if there is a way around the problem but given its size it might take some time.

VPerrollaz avatar May 22 '20 19:05 VPerrollaz

Ok just one additional question. Where do you want me to put the info in the doc hierarchy?

My apologies! I should have been a bit clearer on where. I looked through the docs pages, and I think the current logical place is right on the main page. However, if you have a better idea, I'm all ears! Please let me know what you think.

ericmjl avatar May 23 '20 21:05 ericmjl

Before committing anything about autocompletion, which at this point would be mainly about what doesn't work, I have been looking for alternative use of janitor functions which would allow for jedi completions. I have come up with the following which is pretty ugly because of the lambda expressions but it is a start and the autocompletion is very neat. https://gist.github.com/VPerrollaz/6b3c0684cba9d6305c3c10aba103e383

VPerrollaz avatar May 27 '20 20:05 VPerrollaz

Hey @VPerrollaz! I looked at your gist, and it doesn't look ugly at all :smile:. That said, if I am reading the pandas pipe API correctly, all args and kwargs are collected automatically, so you shouldn't need to do lambdas, I think. I'm sure if you do a tab after jn.clean_ you'll get the right thing.

ericmjl avatar May 27 '20 20:05 ericmjl

I can indeed pass the arguments to pipe rather than use lambda but in this way I don't get the autcompletion which was the point. Which was why I was trying to figure out if there is an "automatic" way to produce a higher order function from the janitor functions while keeping the signature alive for autocompletion. Ideally rather than

df.pipe(lambda df: jn.clean_names(df, strip_underscores=True))

we would do

df.pipe(jn.ho.clean_names(strip_underscores=True))

where ho would be a submodule where all the higher order versions live. Of course it is hypothetical at this point, and I'm not sure it is even a good idea.

VPerrollaz avatar May 28 '20 07:05 VPerrollaz

Proof of concept here https://gist.github.com/VPerrollaz/68d3bcaade2ce77b545d94b51ee1f6ba but once again it feels very hacky. The good (?) news being that jedi seems to check the __signature__ attribute in interpreter mode if it exists.

VPerrollaz avatar May 28 '20 15:05 VPerrollaz

Here is a more complete attempt with all relevant functions of janitors.functions.py dynamically "higheroderized". https://gist.github.com/VPerrollaz/30d3bce1f7fc678ffd00a19d8231e3fa

VPerrollaz avatar May 28 '20 17:05 VPerrollaz

jedi doesn't look for setattr, which is precisely at the core of how pandas_flavor

Just a small clarification—the setattr limitation is an issue with pandas directly, not pandas-flavor. Pandas-flavor is simply an alias to Pandas' native extension API. For earlier versions of pandas (<0.23), pandas-flavor backports the extension API, and yes, uses setattr.

It might be worth opening an issue on pandas directly to handle auto-complete on methods registered at runtime using this extension API.

We can try to solve it in pandas-flavor instead, making pandas-flavor a powered-up version of Pandas' extension API. I'd be happy to review a PR there. If you do come up with a solution, I think you'd want this auto-complete in pandas-flavor (or even pandas), not pyjanitor specifically. It would benefit any libraries trying to extend pandas' API.

Zsailer avatar Jun 04 '20 18:06 Zsailer

You're right of course. The reason I mentioned it in pyjanitor repo is actually because it is where I encountered it. An additional reason is that while I understand why there is no autocompletion using jedi in an IDE ( the Script object in Jedi does not look at setattr indeed) I am not clear on why we don't get method chaining autocompletion in ipython/jupyter (which uses the Interpreter object in Jedi which is more dynamic) because the method signature are perfect. (though looking at the code in pandas they rather use the return type "DataFrame" rather than pandas.core.frame.DataFrame so maybe it is something to look into) So I proposed the above workaround using pipe and higherorder functions but I agree that it is not the most desirable solution.

VPerrollaz avatar Jun 06 '20 09:06 VPerrollaz

@Zsailer Looking at the following https://gist.github.com/VPerrollaz/543fc5bd6408b4e30c3165e4d98591a0 it seems that a quick workaround for the problem might be the following change in the module register.pyof pandas_flavor. Note that it is rather ugly since we are basically tricking jedi by falsifying the signature in forwarding the __annotations__ of the method into the __annotations__ of the AccessorMethod object

def register_dataframe_method(method):
    """Register a function as a method attached to the Pandas DataFrame.
    Example
    -------
    .. code-block:: python
        @register_dataframe_method
        def print_column(df, col):
            '''Print the dataframe column given'''
            print(df[col])
    """
    def inner(*args, **kwargs):

        class AccessorMethod(object):


            def __init__(self, pandas_obj):
                self._obj = pandas_obj

            @wraps(method)
            def __call__(self, *args, **kwargs):
                return method(self._obj, *args, **kwargs)

        register_dataframe_accessor(method.__name__)(AccessorMethod)

        return method
    result = inner()
    result.__annotations__ = method.__annotations__
    return result

VPerrollaz avatar Jun 29 '20 15:06 VPerrollaz

Actually rather than just forwarding __annotations__ we might have to create and then forward the whole __signature__but the principle remains the same.

VPerrollaz avatar Jun 29 '20 15:06 VPerrollaz

Sadly jedi seems to sometimes look at this stuff and sometimes not, and so far I am not clear when...

VPerrollaz avatar Jun 30 '20 22:06 VPerrollaz

Oh my, looks like jedi is super complicated, haha. Thank you for taking the time to look at this, @VPerrollaz. I am confident that the hard work will eventually pay off.

ericmjl avatar Jul 01 '20 03:07 ericmjl