DOC: fix docstring validation errors for pandas.Series

Open natmokval opened this issue 1 year ago • 18 comments

Follow-up on issues #56804, #59458 and #58063.

pandas has a script for validating docstrings:

https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/ci/code_checks.sh#L155-L187

Currently, some methods fail the docstring validation check. The task here is:

  • take 2-4 methods
  • run: scripts/validate_docstrings.py <method-name>
  • fix the docstrings according to whatever error is reported
  • remove those methods from code_checks.sh script
  • commit, push, open pull request
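
The "remove those methods from code_checks.sh" step can be sketched in Python. This is only a minimal sketch: it assumes each ignore entry sits on its own line in the form -i "<method> <codes>" \, and the helper name remove_ignore_entries is hypothetical, not something that exists in the pandas repository.

```python
import re

# Matches one ignore entry such as: -i "pandas.Series.prod RT03" \
ENTRY_RE = re.compile(r'^\s*-i\s+"(?P<name>\S+)\s+[A-Z0-9,]+"')


def remove_ignore_entries(script_text, methods):
    """Return script_text without the ignore lines for the given methods."""
    targets = set(methods)
    kept = []
    for line in script_text.splitlines(keepends=True):
        match = ENTRY_RE.match(line)
        if match and match.group("name") in targets:
            continue  # drop the entry for a method whose docstring was fixed
        kept.append(line)
    return "".join(kept)


# Illustrative excerpt, not the full contents of ci/code_checks.sh.
sample = "\n".join([
    '        -i "pandas.Series.prod RT03" \\',
    '        -i "pandas.Series.product RT03" \\',
    '        -i "pandas.Series.pop SA01" \\',
]) + "\n"

print(remove_ignore_entries(sample, ["pandas.Series.prod"]), end="")
```

Matching on the full method name (rather than a substring) keeps pandas.Series.product in place when only pandas.Series.prod is removed.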

Example:

scripts/validate_docstrings.py pandas.Series.prod

pandas.Series.prod fails with the ES01 and RT03 errors:

################################################################################
################################## Validation ##################################
################################################################################

2 Errors found for `pandas.Series.prod`:
        ES01    No extended summary found
        RT03    Return value has no description
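
For reference, here is a minimal sketch of a numpydoc-style docstring that contains the pieces these error codes ask for: an extended summary (ES01), a described return value (RT03), and a See Also section (SA01). The function product_of is hypothetical and not part of pandas; it only illustrates the layout.

```python
import math


def product_of(values):
    """
    Return the product of the values.

    This paragraph is the extended summary. It follows the one-line
    summary after a blank line and describes the behaviour in more detail.

    Parameters
    ----------
    values : iterable of float
        Numbers to multiply together.

    Returns
    -------
    float
        The product of all elements, or 1.0 for an empty input.

    See Also
    --------
    math.prod : Standard-library product of an iterable.

    Examples
    --------
    >>> product_of([2, 3, 4])
    24.0
    """
    return float(math.prod(values))
```

The indented line under the return type is the description that RT03 reports as missing when it is absent.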

Please don't comment "take", as multiple people can work on this issue. You also don't need to ask for permission to work on it; just comment with the methods you are going to work on.

If you're a new contributor, please check the contributing guide.

natmokval avatar Aug 24 '24 09:08 natmokval

I'll take these:

 -i "pandas.Series.sparse.fill_value SA01" \ 
 -i "pandas.Series.sparse.from_coo PR07,SA01" \ 
 -i "pandas.Series.sparse.npoints SA01" \ 
 -i "pandas.Series.sparse.sp_values SA01" \ 
 -i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \ 

ivonastojanovic avatar Aug 24 '24 12:08 ivonastojanovic

I'll take these:

 -i "pandas.Series.str.wrap RT03,SA01" \ 
 -i "pandas.Series.str.zfill RT03" \ 

wenchen-cai avatar Aug 24 '24 17:08 wenchen-cai

Working on these:

 -i "pandas.Series.str.match RT03" \ 
 -i "pandas.Series.str.normalize RT03,SA01" \ 
 -i "pandas.Series.str.repeat SA01" \ 
 -i "pandas.Series.str.replace SA01" \ 

ivonastojanovic avatar Aug 24 '24 17:08 ivonastojanovic

I'll take these:

-i "pandas.Series.struct.dtypes SA01" \ 
-i "pandas.Series.to_markdown SA01" \ 

githubalexliu avatar Aug 25 '24 03:08 githubalexliu

Here's a filtered list of pandas.Series docstring issues that still need to be addressed:

        ...
        -i "pandas.Series.dt.as_unit PR01,PR02" \
        ...
        -i "pandas.Series.dt.round PR01,PR02" \
        ...
        -i "pandas.Series.dt.unit GL08" \
        ...
        -i "pandas.Series.pad PR01,SA01" \
        ...

I went ahead and removed methods that were already claimed/addressed by open + merged PRs. (Last updated 9/2/2024)

hlakams avatar Aug 25 '24 11:08 hlakams

I'll take these:

 -i "pandas.Series.pop SA01" \
 -i "pandas.Series.list.__getitem__ SA01" \
 -i "pandas.Series.list.flatten SA01" \
 -i "pandas.Series.list.len SA01" \
 -i "pandas.Series.reorder_levels RT03,SA01" \
 -i "pandas.Series.sparse.density SA01" \
 -i "pandas.Series.gt SA01" \
 -i "pandas.Series.lt SA01" \
 -i "pandas.Series.ne SA01" \
 -i "pandas.Series.prod RT03" \
 -i "pandas.Series.product RT03" \

hlakams avatar Aug 25 '24 12:08 hlakams

I will take

-i "pandas.Series.dt.strftime PR01,PR02" \
-i "pandas.Series.dt.to_period PR01,PR02" \
-i "pandas.Series.dt.total_seconds PR01" \
-i "pandas.Series.dt.tz_convert PR01,PR02" \
-i "pandas.Series.dt.tz_localize PR01,PR02" \
-i "pandas.Series.dt.unit GL08" \

Pranav-Wadhwa avatar Aug 26 '24 21:08 Pranav-Wadhwa

I'll take

 -i "pandas.Series.std PR01,RT03,SA01" \ 
 -i "pandas.Series.sem PR01,RT03,SA01" \

james-magee avatar Aug 27 '24 02:08 james-magee

I followed the instructions and encountered this issue: I added 'See Also' to the function fill_value(self) in ./pandas/core/arrays/sparse/array.py. After running the command python3 scripts/validate_docstrings.py pandas.Series.sparse.fill_value, I received the message:

thang123456@MSI:/mnt/c/Users/ADMIN/Desktop/pandas/pandas$ python3 scripts/validate_docstrings.py pandas.Series.sparse.fill_value

################################################################################
################# Docstring (pandas.Series.sparse.fill_value) #################
################################################################################

Elements in `data` that are `fill_value` are not stored.

For memory savings, this should be the most common value in the array.

Examples
--------
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
>>> ser.sparse.fill_value
0
>>> spa_dtype = pd.SparseDtype(dtype=np.int32, fill_value=2)
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype=spa_dtype)
>>> ser.sparse.fill_value
2

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found for `pandas.Series.sparse.fill_value`:
        SA01    See Also section not found

I checked very carefully but still couldn't fix the error. Can someone help me understand what is going wrong?

Tmthang1601 avatar Aug 27 '24 08:08 Tmthang1601

I will take:

-i "pandas.Series.dt.floor PR01,PR02" \
-i "pandas.Series.dt.ceil PR01,PR02" \

Gesare5 avatar Aug 27 '24 18:08 Gesare5

I'll take these:

-i "pandas.Series.sparse PR01,SA01" \
-i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \

pol-rius avatar Aug 27 '24 21:08 pol-rius

I'll take these:

-i "pandas.Series.dt.normalize PR01" \
-i "pandas.Series.dt.qyear GL08" \

githubalexliu avatar Aug 27 '24 22:08 githubalexliu

@Tmthang1601 The pandas prefix is not needed for SparseDtype and SparseArray. Remove that prefix and the validation command should pass.

See Also
--------
SparseDtype : Dtype for sparse array.
SparseArray : Array of sparse data.

hlakams avatar Aug 28 '24 00:08 hlakams

@hlakams
Originally there was no "See Also" section (SparseDtype : Dtype for sparse array. SparseArray : Array of sparse data.) in the docstring of the fill_value function. I added it myself to try to clear the error. I didn't expect that removing it would make the error go away; I tried anyway, and of course it didn't.

Tmthang1601 avatar Aug 28 '24 01:08 Tmthang1601

@Tmthang1601 Can you push up your changes in a new PR?

hlakams avatar Aug 28 '24 01:08 hlakams

@hlakams According to the instructions, you need to complete 2 to 4 methods and run the script successfully before pushing to a new PR, but I'm having trouble.

Tmthang1601 avatar Aug 28 '24 01:08 Tmthang1601

@Tmthang1601 I'm not sure what the issue is, but try replacing lines 620:639 from https://github.com/pandas-dev/pandas/issues/59592#issuecomment-2311939867 with the following docstring:

        """
        Elements in `data` that are `fill_value` are not stored.

        For memory savings, this should be the most common value in the array.

        See Also
        --------
        SparseDtype : Dtype for sparse array.
        SparseArray : Array of sparse data.

        Examples
        --------
        >>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
        >>> ser.sparse.fill_value
        0
        >>> spa_dtype = pd.SparseDtype(dtype=np.int32, fill_value=2)
        >>> ser = pd.Series([0, 0, 2, 2, 2], dtype=spa_dtype)
        >>> ser.sparse.fill_value
        2
        """

Once this change from https://github.com/pandas-dev/pandas/issues/59592#issuecomment-2313880479 is committed, run pre-commit (assuming it was configured correctly) and address any lint errors. You should then be able to push to your fork.

hlakams avatar Aug 28 '24 02:08 hlakams
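
When a section such as See Also is reported as missing, it can help to check which section headers are actually present in the docstring Python sees. The sketch below is a simplified stand-in and not the logic of scripts/validate_docstrings.py: it treats any line followed by a line of only hyphens as a section title. FakeSparseAccessor and numpydoc_sections are hypothetical names used for illustration.

```python
import inspect


def numpydoc_sections(doc):
    """Return section titles: a non-empty line followed by a line of hyphens."""
    lines = inspect.cleandoc(doc).splitlines()
    sections = []
    for title, underline in zip(lines, lines[1:]):
        if title.strip() and set(underline.strip()) == {"-"}:
            sections.append(title.strip())
    return sections


class FakeSparseAccessor:
    """Hypothetical stand-in, not the pandas accessor class."""

    @property
    def fill_value(self):
        """
        Elements in `data` that are `fill_value` are not stored.

        For memory savings, this should be the most common value in the array.

        See Also
        --------
        SparseDtype : Dtype for sparse array.
        SparseArray : Array of sparse data.

        Examples
        --------
        >>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
        >>> ser.sparse.fill_value
        0
        """
        return 0


# inspect.getdoc reads the docstring from the object as imported at runtime.
doc = inspect.getdoc(FakeSparseAccessor.fill_value)
print(numpydoc_sections(doc))  # ['See Also', 'Examples']
```

If the printed list lacks a section that was just added to the source file, the interpreter is probably not picking up the edited file.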

I will take these:

        -i "pandas.Series.dt.day_name PR01,PR02" \
        -i "pandas.Series.dt.month_name PR01,PR02" \

yinglyu avatar Sep 03 '24 03:09 yinglyu

I will take this:

-i "pandas.Series.update PR07,SA01" \

doshi-kevin avatar Sep 03 '24 18:09 doshi-kevin

I'll work on this:

-i "pandas.Series.str.swapcase RT03" \

blackhole-hoop avatar Sep 06 '24 19:09 blackhole-hoop

I'll work on these:

-i "pandas.Series.dt.nanoseconds SA01" \
-i "pandas.Series.dt.seconds SA01" \

chalky25 avatar Sep 06 '24 19:09 chalky25

I'll work on this:

-i "pandas.Series.str.swapcase RT03" \

It seems that pandas.Series.str.swapcase has already been done.

chalky25 avatar Sep 06 '24 19:09 chalky25

Sorry, I am a first-time contributor. May I know how to check whether something is done or not? I searched for the keyword "swapcase" on this page and didn't see that anyone was working on this. @chalky25

blackhole-hoop avatar Sep 06 '24 20:09 blackhole-hoop

Welcome to contributing, @blackhole-hoop. I also started three days ago.

I'm also not sure who fixed it or how it got fixed, because none of the merged commits mention it.

That said, I just tested it using the following command:

scripts/validate_docstrings.py pandas.Series.str.swapcase

and I got the following result:


################################################################################
#################### Docstring (pandas.Series.str.swapcase) ####################
################################################################################

Convert strings in the Series/Index to be swapcased.

Equivalent to :meth:`str.swapcase`.

Returns
-------
Series or Index of objects
    A Series or Index where the strings are modified by :meth:`str.swapcase`.

See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
    remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
    remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
    uppercase.
Series.str.casefold: Removes all case distinctions in the string.

Examples
--------
>>> s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object

>>> s.str.lower()
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object

>>> s.str.upper()
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object

>>> s.str.title()
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object

>>> s.str.capitalize()
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object

>>> s.str.swapcase()
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.str.swapcase" correct. :)

In other words, you use the script given by the original poster to check the docstring.

ammar-qazi avatar Sep 06 '24 21:09 ammar-qazi

I will work on these:

  • "pandas.Series.str.lower RT03" \
  • "pandas.Series.str.center RT03,SA01" \
  • "pandas.Series.str.title RT03" \
  • "pandas.Series.str.lstrip RT03" \

pratik305 avatar Sep 13 '24 04:09 pratik305

I just started contributing and found that some methods have already been fixed without being mentioned here. I also tried running the script on methods whose fixes were already merged, and they still show errors, for example: python scripts/validate_docstrings.py pandas.Series.str.swapcase

Result

################################################################################
#################### Docstring (pandas.Series.str.swapcase) ####################
################################################################################

Convert strings in the Series/Index to be swapcased.

Equivalent to :meth:`str.swapcase`.

Returns
-------
Series or Index of object

See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
    remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
    remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
    uppercase.
Series.str.casefold: Removes all case distinctions in the string.

Examples
--------
>>> s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s
0                 lower
1              CAPITALS
2    this is a sentence
3              SwApCaSe
dtype: object

>>> s.str.lower()
0                 lower
1              capitals
2    this is a sentence
3              swapcase
dtype: object

>>> s.str.upper()
0                 LOWER
1              CAPITALS
2    THIS IS A SENTENCE
3              SWAPCASE
dtype: object

>>> s.str.title()
0                 Lower
1              Capitals
2    This Is A Sentence
3              Swapcase
dtype: object

>>> s.str.capitalize()
0                 Lower
1              Capitals
2    This is a sentence
3              Swapcase
dtype: object

>>> s.str.swapcase()
0                 LOWER
1              capitals
2    THIS IS A SENTENCE
3              sWaPcAsE
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found for `pandas.Series.str.swapcase`:
        RT03    Return value has no description

Also, the code_checks.sh file has no pandas.Series.str-related lines. Are all the str-related docstrings fixed?

pratik305 avatar Sep 13 '24 07:09 pratik305
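
The two swapcase outputs in this thread differ in the Returns section: one has an indented description line under the return type and the other does not. A minimal sketch of that distinction follows. It is a simplified illustration, not how scripts/validate_docstrings.py implements RT03, and returns_has_description is a hypothetical helper name.

```python
import inspect


def returns_has_description(doc):
    """True if the Returns section has an indented line after the type line."""
    lines = inspect.cleandoc(doc).splitlines()
    for i in range(len(lines) - 1):
        if lines[i].strip() == "Returns" and set(lines[i + 1].strip()) == {"-"}:
            body = []
            for line in lines[i + 2:]:
                if not line.strip():
                    break  # in this simplified view a blank line ends the section
                body.append(line)
            # body[0] is the return type; description lines are indented below it.
            return any(line.startswith((" ", "\t")) for line in body[1:])
    return False


with_description = """
Returns
-------
Series or Index of objects
    A Series or Index where the strings are modified by :meth:`str.swapcase`.

See Also
--------
Series.str.lower : Converts all characters to lowercase.
"""

without_description = """
Returns
-------
Series or Index of object

See Also
--------
Series.str.lower : Converts all characters to lowercase.
"""

print(returns_has_description(with_description))     # True
print(returns_has_description(without_description))  # False
```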

I'll take these:

 -i "pandas.Series.str.rjust RT03,SA01" \ 
 -i "pandas.Series.str.rpartition RT03" \ 
 -i "pandas.Series.str.rstrip RT03" \ 

mysticshirou avatar Sep 17 '24 17:09 mysticshirou

I want to work on these issues:

-i "pandas.Series.sparse.sp_values SA01,ES01" \
-i "pandas.Series.str.match ES01" \

syeda-fajar avatar Sep 22 '24 12:09 syeda-fajar

Hello! I am new to the pandas community. It seems like most of these are already taken. Is there any way to filter which methods have been run already?

dhelms33 avatar Sep 26 '24 00:09 dhelms33
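
One way to see which methods are still listed is to parse the ignore entries out of ci/code_checks.sh. The sketch below assumes entries have the form -i "<method> <codes>" as quoted throughout this thread; remaining_entries is a hypothetical helper, and the sample text is an illustrative excerpt rather than the real file. It only shows what is still in the ignore list, not which methods have been claimed in comments or open PRs.

```python
import re

# One ignore entry: -i "pandas.Series.pad PR01,SA01"
IGNORE_RE = re.compile(r'-i\s+"(?P<name>\S+)\s+(?P<codes>[A-Z0-9,]+)"')


def remaining_entries(script_text, prefix="pandas.Series."):
    """Map each still-ignored method under `prefix` to its list of error codes."""
    entries = {}
    for match in IGNORE_RE.finditer(script_text):
        name = match.group("name")
        if name.startswith(prefix):
            entries[name] = match.group("codes").split(",")
    return entries


# Illustrative excerpt; for the real list, read the file from a pandas checkout,
# e.g. pathlib.Path("ci/code_checks.sh").read_text().
sample = "\n".join([
    '        -i "pandas.Series.dt.unit GL08" \\',
    '        -i "pandas.Series.pad PR01,SA01" \\',
    '        -i "pandas.Timestamp.tz_localize SA01" \\',
])

for name, codes in remaining_entries(sample).items():
    print(name, codes)
```

Running scripts/validate_docstrings.py on a method from that list then confirms whether its errors are still reported on the current main branch.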

/Assign

techie505 avatar Oct 22 '24 05:10 techie505