python-fire
python-fire copied to clipboard
parsing function help with multiline
Can I have flexible line breaks in the help comments?
this is the help doc
def func(input_table: str):
"""Plot region mutation info using table generated by `bioat bam mpileup_to_table`.
:param input_table: Path of table generated by `bioat bam mpileup_to_table`.
This table should contain base mutation information from a short genome region (no more than 1k nt).
printed help I got
POSITIONAL ARGUMENTS
INPUT_TABLE
Type: str
Path of table generated by `bioat bam mpileup_to_table`. This table should contain base mutation information from a short genome region (no more than 1k nt).
printed help I expacted
POSITIONAL ARGUMENTS
INPUT_TABLE
Type: str
Path of table generated by `bioat bam mpileup_to_table`.
This table should contain base mutation information from a short genome region (no more than 1k nt).
The docstring parsing code is in https://github.com/google/python-fire/blob/master/fire/docstrings.py
Multiline rst args should be handled by https://github.com/google/python-fire/blob/d44d33d4ac9389854b046ca0270c112693b309e6/fire/docstrings.py#L468-L469
You may have identified a bug.
I want to work on this. First time contributor here.
Was looking into the docstrings.py
file and I think the problem is with _join_lines()
. I kinda made a quick fix but I will dive deeper. Some guidance would be awesome!
https://github.com/thebadcoder96/python-fire-os/blob/45e3a0e7be335b33fd83b24b06ba8fdc1baee884/fire/docstrings.py#L271-L278
What do you guys think?
The first step would be to add one or more failing test cases, e.g. inspired from the comment above https://github.com/google/python-fire/issues/448#issue-1659522480 Then if you see the issue and can fix it, that would be a welcome change!
The test cases would go in https://github.com/google/python-fire/blob/master/fire/docstrings_test.py
Notice at line 253 there's "# TODO(dbieber): Add parameters for variations in whitespace handling." I don't have this code loaded into memory -- do you happen to see what the issue would be with always using \n to join lines? I wonder why I wanted multiple variations for whitespace handling from the beginning.
One possibility that comes to mind is that the first line might be shorter than subsequent lines because it includes the arg name as a prefix, and so we'd get weird formatting if we were to simply keep all the newlines as is (but strip that prefix).
But other times, keeping newlines is key, e.g. if there's ascii art in the docstring or other spacing-specific content.
I recreated the issue on my end and then resolved it with the code I mentioned in the comment above . I don't think this is the real fix rather a quick fix.
do you happen to see what the issue would be with always using \n to join lines?
I was thinking along the same lines as you were and I was using \n to join lines always. But when I ran the docstring tests, 3 of them failed. The tests were test_google_format_multiline_arg_description
, test_numpy_format_multiline_arg_description
, and test_numpy_format_typed_args_and_returns
.
Here is one of the failed tests:
======================================================================
FAIL: test_numpy_format_multiline_arg_description (__main__.DocstringsTest.test_numpy_format_multiline_arg_description)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mishals/Documents/GitHub Open Source Repos/python-fire-os/fire/docstrings_test.py", line 263, in test_numpy_format_multiline_arg_description
self.assertEqual(expected_docstring_info, docstring_info)
AssertionError: Docst[314 chars]o cover two lines.')], returns=None, yields=None, raises=None) != Docst[314 chars]o cover two\nlines.')], returns=None, yields=None, raises=None)
I think that using \n to join always is a good use. In fact, I would argue that the test is wrong. The test docstring is
Docstring summary.
This is a longer description of the docstring. It spans across multiple
lines.
Parameters
----------
param1 : int
The first parameter.
param2 : str
The second parameter. This has a lot of text, enough to cover two
lines.
But the test expects this output:
NAME
test.py - Docstring summary.
SYNOPSIS
test.py PARAM1 PARAM2
DESCRIPTION
This is a longer description of the docstring. It spans across multiple
lines.
POSITIONAL ARGUMENTS
PARAM1
Type: int
The first parameter.
PARAM2
Type: str
The second parameter. This has a lot of text, enough to cover two lines.
NOTES
You can also use flags syntax for POSITIONAL ARGUMENTS
Notice how the second parameter does not split into the next line?
If we use \n to join, in my opinion, it would bring in the new line like how it is in the docstring. I tried looking into it further and found that when we do not use \n, the lines are stripped and stored as one long string. If we do use line breaks, then the code would split it into 2 or more different strings/lines which while joining back should use \n.
I was running late working on it last night and wanted to update you guys with something so I made the quick fix. I think the best way to go about it would be to always use \n to join lines like you said and edit the 3 test cases to pass only when the new line is present. Also, I could add the test case for this issue as you mentioned.
I am not 100% sure since I do not know all the intricacies of the docstring and also this is my first time contributing ever. What do you think of everything mentioned above? also can you assign this issue to me?
Btw, the other 2 tests that failed also fail the same way.
I agree that it would be OK to adjust the behavior of those 3 tests.
I do have one concern about simply always using \n, and that's that it might lead to poor formatting in some simple cases. In particular suppose (1) the arg name is long, (2) the arg description starts on the same line as the arg name, and (3) the arg description spans a few lines. Then the first line of the extracted arg description might be really short.
Maybe this is okay. I wonder if there's an approach that gets the best of both approaches though. Wdyt?
I will work on updating the tests. As far as your concerns, this is how I think it would work; Please correct me if I am wrong.
(1) the arg name is long, (2) the arg description starts on the same line as the arg name,
Right now, the code extracts the docstring and splits by \n. so if the arg name is long or the description starts on the same line, it would be extracted as a really long string. And when we are printing it out, it would print out that long string which would preserve the original docstring. So if we are joining the lines always by \n this would not change since the line is just one huge string.
and (3) the arg description spans a few lines.
in this case, the docstring is extracted and split by \n so every line is a string inside a list. the code right now would join them using ' ' which does not preserve the original docstring. joining by \n would create those line breaks.
I may be missing something. But I think that \n always is the way to go here and we would not have any issues.
I wonder if there's an approach that gets the best of both approaches though. Wdyt?
if you plan to expand your docstring capabilities to handle more nuances, I think one way to go about it is what I did for the quick fix. I did it initially to pass the docstring test but I think we can use it. :P
I created a new parameter for _join_lines()
called type
(bad name) that would possibility label the what part of docstring (type of lines) needs to be joined. We can then accordingly add more functionality as the need arises. What do you think?
so if the arg name is long or the description starts on the same line, it would be extracted as a really long string.
See the part of the logic where line_info.remaining
is set. If I'm remembering right, this will exclude the arg name, leaving just the part of the line that follows. If the arg name is long, the remaining part of the line will be quite short.
Edit: ignore this comment. line_info.remaining is still logically a whole line. I thought it was only the part after the arg, but I was wrong. The part after the arg is named "second" in the code e.g. at line 397 as linked in the next comment, and (from a cursory glance) seems to only apply to the Google docstring format.
It looks like for the RST format, whole lines are used are used so my concern would not arise.
For the Google format, the concern I have arises at https://github.com/google/python-fire/blob/d44d33d4ac9389854b046ca0270c112693b309e6/fire/docstrings.py#L397
For numpy, if I'm thinking about this correctly, whole lines are used so my concern would not arise.
Can you give me an example of where your concern is happening? I'm slightly confused about the part where you said it would arise. I tried to use the google docstring test string to test it out and tried some of my variations, making the string long and spanning multiple lines, but it seems to work.
Please review the changes proposed in the PR below. if you can give me a test case where your concern will be caused, it will help me understand it better and fix it.
👍 Here's an example:
def make_function_maker_handler_event(function_maker_handler_event_factory):
"""This is a Google-style docstring with long args.
Args:
function_maker_handler_event_factory: The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
"""
I haven't double-checked this, but I expect if we use the \n approach this will show up in the help as:
The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
with the first line awkwardly short.
True! but my question then would be how would we want to handle it?
I was thinking we could set a certain number of letters that would determine if the arg name is long and then if len(arg) > our number, we can join the first and second line of the description but wouldn't it be too long if we joined it with the next line?
For example:
The function-maker-handler event factory responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
Merry Christmas!
Merry Christmas indeed!
how would we want to handle it?
I haven't been able to think of a perfect solution. More thoughts below:
I was thinking we could set a certain number of letters that would determine if the arg name is long and then if len(arg) > our number, we can join the first and second line of the description but wouldn't it be too long if we joined it with the next line?
This seems reasonable to me. Let's think this through a little more. Does the content of the second line matter for deciding whether to join the first line to the second? I think it might: if the second line is a blank line, maybe we don't join them. (My reasoning here is that if the second line is blank, then the blank line is likely deliberate formatting from the docstring author.)
Also what if the second line is reallly long? Then maybe we don't join then as well. (My reasoning here is that if the second line is really long, (a) the formatting might be intentional, e.g. (b) the docstring author clearly isn't putting text on the next line (line 3) simply because a line (line 2) gets too long, so that's probably not what happened w/ line 1.)
Maybe the heuristic is:
- If the arg length (measured by the position of the ":") is more than 50% of the "apparent max line length"
- AND if the line-1 length is long enough to suggest it isn't intentionally short
- AND the line-2 length is long enough to suggest it isn't intentionally short
- AND the line-2 length is short enough to suggest it isn't intentionally long Then we merge line 1 into line 2. Otherwise we just do the standard "\n".join approach.
Continuing to brainstorm:
- What is the "apparent max line length"? Maybe the length of the longest line, rounded up to the nearest 10? (Standard max line lengths are 80 and 120 characters.)
- How do we decide if a line is "long enough to suggest it isn't intentionally short"? For this I think we have to look at the next line. If the first word on line N+1 would have fit comfortably onto line N, that suggests line N is intentionally short. A blank line is always considered intentionally short. (A line that's followed by a blank line is also always considered intentionally short.)
- How do we decide if a line is "short enough to suggest it isn't intentionally long"? I think anything less than the 105% of the "apparent max line length" should count as "short enough to suggest it isn't intentionally long".
What do you think of this approach? Too complex? Seem reasonable?
Yup, I also was thinking along the same lines, we need a max line length.
If the arg length (measured by the position of the ":") is more than 50% of the "apparent max line length" AND if the line-1 length is long enough to suggest it isn't intentionally short AND the line-2 length is long enough to suggest it isn't intentionally short AND the line-2 length is short enough to suggest it isn't intentionally long Then we merge line 1 into line 2. Otherwise we just do the standard "\n".join approach.
This logic seems reasonable to me. For the example you had provided earlier, it would merge the lines right? that would make the merged line too long; is that okay if our output exceeds the max line length?
What is the "apparent max line length"? Maybe the length of the longest line, rounded up to the nearest 10? (Standard max line lengths are 80 and 120 characters.)
Since this is just with Google format, is there a max line limit Google says we should use? I am not sure if this is official but here it says 80.
I like this approach but I am not sure if this is a perfect solution. I will start working on this tomorrow!
Yes, Google style uses 80 character as the max line length. Fire's goal is to work well with any Python software though (within reason), so I'd be inclined to do something a little more general than forcing 80 for the Google-style docstrings.
Sorry, I got a little busy with some other stuff. I was reviewing the heuristic again and I noticed
def make_function_maker_handler_event(function_maker_handler_event_factory):
"""This is a Google-style docstring with long args.
Args:
function_maker_handler_event_factory: The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
"""
For this example, the logic we agreed on will give out the output like this:
The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
The first word of line2 can easily fit in line1 so it would think that line1 is intentionally short. If the arg length is more than 50% of "max line length" then the first word of line2 would ALWAYS fit comfortably on line1.
I think we can just do:
If the arg length (measured by the position of the ":") is more than 50% of the "apparent max line length" ~AND if the line-1 length is long enough to suggest it isn't intentionally short~ AND the line-2 length is long enough to suggest it isn't intentionally short AND the line-2 length is short enough to suggest it isn't intentionally long Then we merge line 1 into line 2. Otherwise we just do the standard "\n".join approach.
Does that make sense? Also we could do something like if line1 length + arg length is hitting our "max line length" then we can check if line2 isn't intentionally short or long and then merge line 1 and line 2.
I'm trying to think more about it. Do we need to check if line2 isn't intentionally short? My reasoning is that even if line2 is short, merging line1 and line2 would probably not matter in most use cases...
I think we should only check if line2 isn't blank and isn't intentionally long. Wdyt?
The first word of line2 can easily fit in line1 so it would think that line1 is intentionally short.
This check would need to take place before removing the arg. So line1 == " function_maker_handler_event_factory: The function-maker-handler event factory"
-> and so the first word of line2 cannot easily fit in line1, and so line1 would not get marked as intentionally short.
Sorry, I got a little busy with some other stuff.
No worries! We move at a glacial pace in developing Fire these days :) 🧊.
Can you direct me where I would need to implement the checks? I was thinking of doing it after extracting the arg but now I am thinking we should do it right after the lines are created in the parse()
function. Am I treading in the right direction?
No worries! We move at a glacial pace in developing Fire these days :) 🧊.
I'm surprised you are replying during the holidays 🤩 Don't forget to enjoy them!
First, I think the heuristic might warrant a slight addition:
If the arg length (measured by the position of the ":") is more than 50% of the "apparent max line length" AND if the line-1 length is long enough to suggest it isn't intentionally short AND if the line-1 length is short enough to suggest it isn't intentionally long AND the line-2 length is long enough to suggest it isn't intentionally short AND the line-2 length is short enough to suggest it isn't intentionally long Then we merge line 1 into line 2. Otherwise we just do the standard "\n".join approach.
My thinking for this addition is: if the first line is really long already, it doesn't make sense to merge it onto the next line.
Here's my overall view of the heuristic: Each of the "intentionally short/long" checks is really just trying to determine if the docstring author was deliberate about their formatting; if they were, let them keep it (i.e. apply '\n'.join). If it's just a few lines of text with default formatting though, let's try to reformat it to be be better (i.e. apply " ".join while preserving blank lines).
Can you direct me where I would need to implement the checks?
Haven't had a chance to properly figure this out yet, but thought I'd just list out my thoughts anyway in case that's helpful.
Right now iirc the algorithm works by consuming one line at a time. If we're in the args section of a google-style docstring, this is done via _consume_google_args_line. We need to adjust the algorithm such that:
- by the end of the section, we can report an apparent max line length
- by the time we've consumed line-2(*or maybe the whole section), we can answer whether line1_intentionally_short
- by the time we've consumed line-2*, we can answer whether line1_intentionally_long
- by the time we've consumed line-3*, we can answer whether line2_intentionally_long
- by the time we've consumed line-3*, we can answer whether line2_intentionally_long
To determine these, we'll need to maintain some state. One possible way to do this would be to maintain the following state:
- line1_length
- line2_first_word_length
- line2_length
- line3_first_word_length
- max_line_length
As we consume lines, we populate these four fields. Then once the full section is consumed we can calculate:
apparent_max_line_length = roundup(max_line_length, 10) # made up function; not a real thing yet
line1_intentionally_short = (line1_length + line2_first_word_length) < apparent_max_line_length
line2_intentionally_short = (line2_length + line3_first_word_length) < apparent_max_line_length
line1_intentionally_long = line1_length > 1.05 * apparent_max_line_length
line2_intentionally_long = line2_length > 1.05 * apparent_max_line_length
So, where do the changes actually go?
We define all the state we're going to track up front here: https://github.com/google/python-fire/blob/d44d33d4ac9389854b046ca0270c112693b309e6/fire/docstrings.py#L166-L179
Most of the changes would likely take place in _consume_google_args_line.
It's not obvious to me where to put the post-section processing (i.e. computing apparent_max_line_length, line1_intentionally_short, etc.); that's going to take a bit more thought. If you have ideas, would love to hear them!
Ah, might have spoke too soon about where to define the state. Looks like the state for Google-style args is actually defined here, not with the rest of the parser state: https://github.com/google/python-fire/blob/d44d33d4ac9389854b046ca0270c112693b309e6/fire/docstrings.py#L295-L298
Cool, I think I understand the heuristic. I was looking at _consume_google_args_line
and it seems like we are doing line_info.remaining.split(':', 1)
and then calling the _get_or_create_arg_by_name function where the arg states are defined.
Since we need the true whole line to capture line1_length, line2_first_word_length, and others; these fields should be captured from line_info.line
right?
I'm deciding on how to identify the lines. Right now, I am thinking I can
- use the if statement below to identify line1,
- capture everything we want about line1 from line_info.line, and then also save line1,
- Then check if the other lines have their
previous
key value == line1 since that line would be line2. - line3 would be
next
key value of line2
https://github.com/google/python-fire/blob/d44d33d4ac9389854b046ca0270c112693b309e6/fire/docstrings.py#L390-L394
I feel like this might not be the best way to identify lines. Thoughts?
It's not obvious to me where to put the post-section processing (i.e. computing apparent_max_line_length, line1_intentionally_short, etc.); that's going to take a bit more thought. If you have ideas, would love to hear them!
I was thinking we could do this at the end of the _consume_google_args_line
function and update the description (merge the lines or leave them as is.)
Please review PR #476.
I decided to define the states up front where you had initially mentioned. This is because _get_or_create_arg_by_name
(where the google args are defined) is called only when line1 is consumed and then arg class is saved to state.current_arg
.
I was thinking we could do this at the end of the _consume_google_args_line function and update the description (merge the lines or leave them as is.)
I was wrong about this. The post-processing is done after the lines are consumed.
I did refine the heuristic a little bit handling null values where I think they could occur (mainly if line2 or line3 are null). The code passes all the test cases.
I was running pylint and pytest and all my lines of code are too long for pylint 😅. I also encountered an issue while running pytest with asyncio. I can fix that issue as well. It shouldn't be that hard.
Please review PR #476.
Took a skim and left two comments. Will come back to it later.
Thank you! I am working on it; I have a question for you tho.
function_maker_handler_event_factory1: The function-maker-handler event factory
if we take the above example, the line length is 81 so rounding up to 90 would make the arg_length (39) not meet our criteria of arg_length being longer than 50% of line_length. Is that what we want? I'm second-guessing rounding up now. maybe we should round closer to 80 or 120.
I do not know if this is just a case I made up since most likely authors will follow the docstring guidelines
Another thing I noticed:
Args:
function_maker_handler_event_factory: The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
param2_maker_handler_event_factoryas: The second parameter. This has a lot of
responsibility making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
When I add an enter or new line between the 2 parameters, I get the below as the output:
POSITIONAL ARGUMENTS
**FUNCTION_MAKER_HANDLER_EVENT_FACTORY**
The function-maker-handler event factory responsible for making the function-maker-handler event. Use this whenever you need a function-maker-handler event made for you programmatically.
**PARAM2_MAKER_HANDLER_EVENT_FACTORYAS**
The second parameter. This has a lot of responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
It merges all the lines for the first argument. I am not sure why this is happening. Also, idk if this is against the google docstring rules?
if we take the above example, the line length is 81 so rounding up to 90 would make the arg_length (39) not meet our criteria of arg_length being longer than 50% of line_length.
Good point. Maybe we adjust the threshold to 40%. I don't think it's critical; the resulting text is going to be formatted a bit wonkily either way. The important thing is the threshold is less than 70-80% because that's where it becomes really clear that merging is better than not.
It merges all the lines for the first argument. I am not sure why this is happening.
Sounds like a bug that will require a closer look at your code to investigate. (Not doing that just this second but can take a look later if still needed then.)
Also, idk if this is against the google docstring rules?
iirc I don't think new lines are recommended in the google style guide, but in my opinion we should aim to handle this reasonably anyway. If it was working prior to this change, we should aim to ensure it stays working. If it already wasn't working, we can fix it as a separate change, it doesn't have to be part of this change.
Cool will make the threshold 40%.
If it was working prior to this change, we should aim to ensure it stays working. If it already wasn't working, we can fix it as a separate change, it doesn't have to be part of this change.
Before this change, all the lines were just joined with ' '.join
so it would not be working properly. This is a new bug that comes with this change and I can look into this when I have some extra time. Do you want me to create this as an issue once we release this change?
I also encountered an issue while running pytest with asyncio. I can fix that issue as well. It shouldn't be that hard.
The other error I encountered with pytest was that @asyncio.coroutine is depreciated. We can use the async
keyword before def instead but this is only compatible with python 3.5+ and Fire supports Python 2.9. This retains to issue #427 and possibly fixed with PR #440
I was thinking we can just do an if sys.version == 2.9
then use @asyncio.coroutine else use the async keyword.
I do not have much idea about asyncio and how it is being utilized for Fire (besides that it is used for testing 😂), but I came across this as one of the main differences between them.
Merged https://github.com/google/python-fire/pull/440 (🔲 going forward we will want to change the naming though / possibly reintroduce w/ the guard as you suggest)
Awesome! I am adding some test cases and I noticed that when the max length
is 90 and the line1 + first_word_line2
is 83, it happens to take it as intentionally short. I have used the previously defined roundup()
to give some grace for these cases.
Another test case led to adding a check for if line3 is an arg or not (when the description is 2 lines or less.) to further refine the code.
Please review #476