python-fire
python-fire copied to clipboard
Return pandas.DataFrame
Hi team,
It seems that currently fire
cannot return dataframe from function or method calls. I would to like make a contribution to this and create a PR if this would be good feature to fire.
Let me know how you thinks
Cheers Bin
That's odd, this worked in ~2018, I demoed it in a lightning talk at a pydata meetup - https://docs.google.com/presentation/d/1NHAYGN4Fx4YBVcaGAiJ4U3gEkyhhgE643r7Ka2MF2wI/edit?usp=sharing
EDIT:
My script still works, you just need to manually call to_string at the end
$ pd read_csv drive/aws_costs.csv - mean - round - to_string
Hmm I see. I didn't know about the to_string
method for Dataframe.
My Fire component is actually a class which utilize pandas under the hood and it returns a dataframe for most of the methods.
I think it's not intuitive for end-user (i.e. my client and colleague) to know that they need to chain command with - to_string
to render the output properly. Given that dataframe is such a popular object right now, do you think it's worth adding the - to_string
as default behavior to Fire
?
If you look at the stack trace, it's because DataFrames fail on inspect, which is used by fire to generate the list of subcommands.
You reproduce it ouside of fire using:
>>> import pandas
>>> import inspect
>>> df = pandas.read_csv("drive/aws_costs.csv")
>>> inspect.getmembers(df)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/inspect.py", line 342, in getmembers
value = getattr(object, key)
File "/home/nfultz/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 409, in _constructor_expanddim
raise NotImplementedError("Not supported for DataFrames!")
NotImplementedError: Not supported for DataFrames!
So if you make pandas work with inspect, fire will be compatible accordingly, or at least not crash.
OTOH, I'm not sure generating a list of the data frame API is very helpful since it's quite large.
Python Fire should work even when inspect.getmembers doesn't. I'll mark this as a bug.
Fair enough. See also https://github.com/pandas-dev/pandas/issues/31474 , they fixed it but haven't released it yet.
It's also been filed against inspect https://bugs.python.org/issue35108#msg361208
@nfultz I am not familiar with inspect.getmembers
in general, but I do get impressed by how quickly you gather all these information! These information is pretty good for me probably other people as a reference.
Thank you so much for the help!
Would you be able to provide a minimal example reproducing the issue?
Would you be able to provide a minimal example reproducing the issue?
Please pardon me for barging in, but I use Pandas a lot and have really enjoyed using fire for CLIs.
Script
#!/usr/bin/env python3
import fire
import pandas as pd
def test(n: int) -> pd.DataFrame:
return pd.DataFrame(data=range(n))
if __name__ == "__main__":
fire.Fire(test)
Environment
Setting up a Python 3.8 virtual environment, activating it, installing dependencies:
~/tmp ∃ python3 -m venv env
~/tmp ∃ source ./env/bin/activate
(env) ~/tmp ∃ pip install -U pip fire jinja2 pandas -qqq
(env) ~/tmp ∃ python --version
Python 3.8.6
(env) ~/tmp ∃ pip freeze
fire==0.3.1
Jinja2==2.11.2
MarkupSafe==1.1.1
numpy==1.19.2
pandas==1.1.3
python-dateutil==2.8.1
pytz==2020.1
six==1.15.0
termcolor==1.1.0
Running
It looks like our version of Pandas fixes the inspect
-related NotImplementedError
mentioned above; when running the script inside our environment, we get a ValueError
(env) ~/tmp ∃ ./w.py
ERROR: The function received no value for the required argument: n
Usage: w.py N
For detailed information on this command, run:
w.py --help
(env) ~/tmp ∃ ./w.py 10
/usr/local/Cellar/[email protected]/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NAMES has been deprecated.
value = getattr(object, key)
/usr/local/Cellar/[email protected]/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NUMBERS has been deprecated.
value = getattr(object, key)
Traceback (most recent call last):
File "./w.py", line 9, in <module>
fire.Fire(test)
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/core.py", line 164, in Fire
_PrintResult(component_trace, verbose=component_trace.verbose)
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/core.py", line 266, in _PrintResult
help_text = helptext.HelpText(
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/helptext.py", line 63, in HelpText
actions_grouped_by_kind = _GetActionsGroupedByKind(component, verbose=verbose)
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/helptext.py", line 332, in _GetActionsGroupedByKind
members = completion.VisibleMembers(component, verbose=verbose)
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/completion.py", line 365, in VisibleMembers
return [
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/completion.py", line 367, in <listcomp>
if MemberVisible(component, member_name, member, class_attrs=class_attrs,
File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/completion.py", line 311, in MemberVisible
if member in (absolute_import, division, print_function):
File "/Users/graham/tmp/env/lib/python3.8/site-packages/pandas/core/generic.py", line 1329, in __nonzero__
raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Thanks!
Happy to contribute!
If we uninstall the released version of fire and instead install from the current master
commit from GitHub:
(env) ~/tmp ∃ pip uninstall fire
Found existing installation: fire 0.3.1
Uninstalling fire-0.3.1:
Would remove:
/Users/graham/tmp/env/lib/python3.8/site-packages/fire-0.3.1-py3.8.egg-info
/Users/graham/tmp/env/lib/python3.8/site-packages/fire/*
Proceed (y/n)? y
Successfully uninstalled fire-0.3.1
(env) ~/tmp ∃ pip install git+https://github.com/google/python-fire.git@878b8d86f488ef2606cffdf58297dd2781708316 -qqq
it looks like we get dropped into the help information for pd.DataFrame
:
(env) ~/tmp ∃ ./w.py 10
/usr/local/Cellar/[email protected]/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NAMES has been deprecated.
value = getattr(object, key)
/usr/local/Cellar/[email protected]/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NUMBERS has been deprecated.
value = getattr(object, key)
NAME
w.py 10 - Two-dimensional, size-mutable, potentially heterogeneous tabular data.
SYNOPSIS
w.py 10 GROUP | COMMAND | VALUE
DESCRIPTION
Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.
GROUPS
GROUP is one of the following:
T
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
at
Access a single value for a row/column label pair.
attrs
axes
columns
Immutable Index implementing a monotonic integer range.
dtypes
One-dimensional ndarray with axis labels (including time series).
iat
Access a single value for a row/column pair by integer position.
iloc
Purely integer-location based indexing for selection by position.
index
Immutable Index implementing a monotonic integer range.
loc
Access a group of rows and columns by label(s) or a boolean array.
plot
Make plots of Series or DataFrame.
shape
style
Helps style a DataFrame or Series according to the data with HTML and CSS.
COMMANDS
COMMAND is one of the following:
abs
Return a Series/DataFrame with absolute numeric value of each element.
add
Get Addition of dataframe and other, element-wise (binary operator `add`).
# Several lines omitted
Just tested against the latest commit (3be260e65a0c25d1dbbe1b15eeb0bf13ac7ec38f), still getting the same behavior.
[ping] Just checked against the latest commit (bc43dca), still getting the same behavior.
Thanks for the pings. I can confirm the behavior you're seeing.
The issue now seems to stem from our implementation of "HasCustomStr". https://github.com/google/python-fire/blob/bc43dca6a52ef6914f0e5d3ec76b316d8fdd648b/fire/value_types.py#L63
HasCustomStr
should return whether an object defines a custom "__str__" function. DataFrames clearly do, since str(df)
prints the dataframe nicely. However, HasCustomStr(df) returns False rather than True.
Next step is to figure out why.
Ah,
So if I'm understanding things correctly, here's what's happening.
DataFrames don't redefine __str__, they just redefine __repr__. The default implementation of __str__ is simply to call __repr__.
The fix will be to update our HasCustomStr method to also check for custom a __repr__ implementation. If either is present, we'll return True.
This will switch Fire's behavior from dropping you into the help screen to instead just printing out the DataFrame.
Well, my approach to detecting custom __str__
methods doesn't seem to work quite as simply for detecting custom __repr__
. Depending on the type of the object, the default __repr__
implementation may come from "type" or "method" (where do I find this "method" object"?). This is going to require more poking around.
Also, this heuristic might not be good enough. E.g. OrderedDict redefines __repr__
but shouldn't be treated as a value. Will have to be more thorough to see what other places the heuristic breaks down in.
Extending @genos's example above, here's a quick fix for anyone who lands here prior to a fix
#!/usr/bin/env python3
import functools
import fire
import pandas as pd
def test(n: int) -> pd.DataFrame:
return pd.DataFrame(data=range(n))
if __name__ == "__main__":
@functools.wraps(test)
def _test(*args, **kwargs):
return str(test(*args, **kwargs))
fire.Fire(_test)
(functools.wraps
means that you can still run test -h
)
I am still seeing this rather odd behaviour, also when returning dict
s of pd.DataFrame
s, i.e. the help is shown, no error though after quitting the help. Returning tuples/lists of DataFrames works. A fix for this would be much appreciated!
We're also running into this problem. It would be great if this could get fixed (maybe we can do it if no one else has time).
@jamesowers-ch thanks for the hint! I ended up having to modify your example slightly in order to get it to work with automated tests:
def df_to_string(func):
"""
See https://github.com/google/python-fire/issues/274
for why we need this (for now)
"""
@functools.wraps(func)
def wrapper(*args, **kwargs):
return str(func(*args, **kwargs))
return wrapper
then I can just @df_to_string
all of my methods without having to modify anything about the fire.Fire()
call in __main__
or unittest.main()
Running into this myself. The problem with the solution above is that it now prevents the function from being used programmatically (e.g. you actually want to operate on the dataframe).
I think the best solution here is to have a list of types that are automatically rendered as strings when they are the output of a function. It seems that dict and list already have that behavior; so I think something like this list might already exist in the code. We just need to make it user-modifiable.
For now, here is a generalization of @jamesowers-ch's solution to classes. Although note that this doesn't allow the CLI to call it's own methods, which is probably one of the most frequent cases where you'd want to have programmatic access...
#!/usr/bin/env python3
import fire
import pandas as pd
from functools import wraps
import pprint
class CLI(object):
"""Command Line Interface"""
def foo(self):
return pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
def str_output_wrapper(method):
@wraps(method)
def wrapped(*args, **kwargs):
return pprint.pformat(method(*args, **kwargs))
return wrapped
if __name__ == '__main__':
for key, value in CLI.__dict__.items( ):
if not key.startswith('_') and hasattr(value, '__call__' ):
setattr(CLI, key, str_output_wrapper(value))
fire.Fire(CLI)