pyjanitor
pyjanitor copied to clipboard
State of the autocompletion of pyjanitor methods using jedi.
Brief Description
I found the completions proposal of pyjanitor methods in the jupyter notebooks a bit difficult to use and wanted to check if it was my install or a general phenomenon. Roughly put I only get the completion proposal of pyjanitor methods if it is a fist step of method chaining. And in a chain I don't get any completion after a pyjanitor method.
Minimally Reproducible Code
See the gist below https://gist.github.com/VPerrollaz/3fcbe3769264a6ed6f38e6a10c464ca5
Wow, that is a big bug catch, @VPerrollaz! I have to admit to not knowing much about tab completions here. You seem to have a better grasp surrounding the tooling for testing here. Would you be kind enough to give a short explainer on how tab completion happens with Python?
At the moment, the only thing I can think of that might interfere with tab completion is the use of pandas-flavor to monkey-patch methods onto a dataframe at runtime. But as for how to debug why tab completion behaviour is as you've described in the gist, I'm a bit at a loss.
- There is definitely something going on with pandas flavor, there is an issue on the repository about this, but basically since they didn't look at method chaining they were happy about the first completion proposed by jedi in ipython repl/jupyter notebook.
- Furthermore the pandas-flavor repository doesn't feel very alive at the moment. Maybe because as they explain in the README.md there is now an official pandas API for monkey-patching?
- So far what I got for the explanation is that in ipython repl/ jupyter notebook jedi is used via the object Interpreter which used the namespace argument for parsing the code and thus it catches the pyjanitor add by monkey-patching.
- However in method chaining the first methods do not modify the namespace so we don't get anymore proposition for completion after (this part of the argument feels a bit weak I will have to test it I think before being sure)
- Finally I think when you use Script the parser only look at the time pandas was first imported so the monkey-patching afterward is invisible.
Looking again at the jedi documentation and more precisely at the section limitations https://jedi.readthedocs.io/en/latest/docs/features.html#recipes it is clear that by design jedi doesn't look for setattr, which is precisely at the core of how pandas_flavor works so I think that sadly the autocompletion problem is not considered a bug.
That's a wonderful catch!
Would you be open to a documentation PR that helps newcomers recognize this issue? We can go over the exact phrasing in the PR discussion.
In principle no problem, but what kind of information do you want to provide?
Many thanks @VPerrollaz! I think the best info to provide someone is something along the lines of "here's what to expect with respect to tab completion" in your IDE.
Example starter that you can use to get going:
Tab completion behaviour may vary from IDE to IDE.
For a historical perspective on this, please check out the following issues on GitHub:
- https://github.com/ericmjl/pyjanitor/issues/660
- <put in the others you've found>
I suspect that should be enough to help newcomers recognize that the inconsistent tab completion behaviour is a bit complicated to work through, and relies on other ecosystem tools working together better.
By the way, about official namespacing - I did think about it. Maybe in the 1.0 release, as this would be a breaking change. An example would be:
df.jn.clean_names().jn.drop_duplicates()
The additional .jn
would be a minimal addition that adds explicitness to the method chains, as right now it's muddled up which functions are pandas-native and which are pyjanitor-patched. This would be a breaking change for end-users though, so it needs user community agreement.
- Ok just one additional question. Where do you want me to put the info in the doc hierarchy?
- As for the namespace there might be a way to import in a different way if one want to use jn as in you example, in this way you would not break backward compatibility.
- I plan to take a look at the jedi/parso codebase to see if there is a way around the problem but given its size it might take some time.
Ok just one additional question. Where do you want me to put the info in the doc hierarchy?
My apologies! I should have been a bit clearer on where. I looked through the docs pages, and I think the current logical place is right on the main page. However, if you have a better idea, I'm all ears! Please let me know what you think.
Before committing anything about autocompletion, which at this point would be mainly about what doesn't work, I have been looking for alternative use of janitor functions which would allow for jedi completions. I have come up with the following which is pretty ugly because of the lambda expressions but it is a start and the autocompletion is very neat. https://gist.github.com/VPerrollaz/6b3c0684cba9d6305c3c10aba103e383
Hey @VPerrollaz! I looked at your gist, and it doesn't look ugly at all :smile:. That said, if I am reading the pandas pipe
API correctly, all args and kwargs are collected automatically, so you shouldn't need to do lambdas, I think. I'm sure if you do a tab after jn.clean_
you'll get the right thing.
I can indeed pass the arguments to pipe rather than use lambda but in this way I don't get the autcompletion which was the point. Which was why I was trying to figure out if there is an "automatic" way to produce a higher order function from the janitor functions while keeping the signature alive for autocompletion. Ideally rather than
df.pipe(lambda df: jn.clean_names(df, strip_underscores=True))
we would do
df.pipe(jn.ho.clean_names(strip_underscores=True))
where ho
would be a submodule where all the higher order versions live.
Of course it is hypothetical at this point, and I'm not sure it is even a good idea.
Proof of concept here
https://gist.github.com/VPerrollaz/68d3bcaade2ce77b545d94b51ee1f6ba
but once again it feels very hacky. The good (?) news being that jedi seems to check the __signature__
attribute in interpreter mode if it exists.
Here is a more complete attempt with all relevant functions of janitors.functions.py
dynamically "higheroderized".
https://gist.github.com/VPerrollaz/30d3bce1f7fc678ffd00a19d8231e3fa
jedi doesn't look for setattr, which is precisely at the core of how pandas_flavor
Just a small clarification—the setattr
limitation is an issue with pandas directly, not pandas-flavor. Pandas-flavor is simply an alias to Pandas' native extension API. For earlier versions of pandas (<0.23), pandas-flavor backports the extension API, and yes, uses setattr
.
It might be worth opening an issue on pandas directly to handle auto-complete on methods registered at runtime using this extension API.
We can try to solve it in pandas-flavor instead, making pandas-flavor a powered-up version of Pandas' extension API. I'd be happy to review a PR there. If you do come up with a solution, I think you'd want this auto-complete in pandas-flavor (or even pandas), not pyjanitor specifically. It would benefit any libraries trying to extend pandas' API.
You're right of course. The reason I mentioned it in pyjanitor repo is actually because it is where I encountered it. An additional reason is that while I understand why there is no autocompletion using jedi in an IDE ( the Script object in Jedi does not look at setattr indeed) I am not clear on why we don't get method chaining autocompletion in ipython/jupyter (which uses the Interpreter object in Jedi which is more dynamic) because the method signature are perfect. (though looking at the code in pandas they rather use the return type "DataFrame" rather than pandas.core.frame.DataFrame so maybe it is something to look into) So I proposed the above workaround using pipe and higherorder functions but I agree that it is not the most desirable solution.
@Zsailer
Looking at the following
https://gist.github.com/VPerrollaz/543fc5bd6408b4e30c3165e4d98591a0
it seems that a quick workaround for the problem might be the following change in the module register.py
of pandas_flavor
. Note that it is rather ugly since we are basically tricking jedi
by falsifying the signature in forwarding the __annotations__
of the method into the __annotations__
of the AccessorMethod
object
def register_dataframe_method(method):
"""Register a function as a method attached to the Pandas DataFrame.
Example
-------
.. code-block:: python
@register_dataframe_method
def print_column(df, col):
'''Print the dataframe column given'''
print(df[col])
"""
def inner(*args, **kwargs):
class AccessorMethod(object):
def __init__(self, pandas_obj):
self._obj = pandas_obj
@wraps(method)
def __call__(self, *args, **kwargs):
return method(self._obj, *args, **kwargs)
register_dataframe_accessor(method.__name__)(AccessorMethod)
return method
result = inner()
result.__annotations__ = method.__annotations__
return result
Actually rather than just forwarding __annotations__
we might have to create and then forward the whole __signature__
but the principle remains the same.
Sadly jedi
seems to sometimes look at this stuff and sometimes not, and so far I am not clear when...
Oh my, looks like jedi
is super complicated, haha. Thank you for taking the time to look at this, @VPerrollaz. I am confident that the hard work will eventually pay off.