DOC: fix docstring validation errors for pandas.Series
Follow-up on issues #56804, #59458 and #58063.
pandas has a script for validating docstrings:
https://github.com/pandas-dev/pandas/blob/0cdc6a48302ba1592b8825868de403ff9b0ea2a5/ci/code_checks.sh#L155-L187
Currently, some methods fail the docstring validation check. The task here is:
- take 2-4 methods
- run: scripts/validate_docstrings.py <method-name>
- fix the docstrings according to whatever error is reported
- remove those methods from the code_checks.sh script
- commit, push, open pull request
Example:
scripts/validate_docstrings.py pandas.Series.prod
pandas.Timestamp.tz_localize fails with the SA01 error
################################################################################
################################## Validation ##################################
################################################################################
2 Errors found for `pandas.Series.prod`:
ES01 No extended summary found
RT03 Return value has no description
Please don't comment "take", as multiple people can work on this issue. You also don't need to ask for permission to work on this; just comment with the methods you are going to work on.
If you're a new contributor, please check the contributing guide.
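For the "remove those methods" step, here is a minimal sketch of what the edit amounts to. It assumes the ignore entries look like the `-i "<method> <codes>" \` lines quoted in this thread; the helper name `drop_entries` is made up for illustration and is not part of the pandas tooling. Editing the file by hand works just as well.

```python
import re

# Matches one ignore entry such as:  -i "pandas.Series.prod RT03" \
# (format assumed from the lines quoted in this thread)
ENTRY = re.compile(r'^\s*-i\s+"(?P<name>\S+)\s+(?P<codes>[A-Z0-9,]+)"')


def drop_entries(script_text: str, fixed: set[str]) -> str:
    """Return script_text without the ignore lines for the methods in `fixed`."""
    kept = []
    for line in script_text.splitlines():
        m = ENTRY.match(line)
        if m and m.group("name") in fixed:
            continue  # this docstring now passes, so its ignore entry is dropped
        kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    '-i "pandas.Series.pop SA01" \\',
    '-i "pandas.Series.prod RT03" \\',
    '-i "pandas.Series.product RT03" \\',
])
result = drop_entries(sample, {"pandas.Series.prod"})
print(result)
```

The name is compared exactly, so removing `pandas.Series.prod` leaves the `pandas.Series.product` entry alone.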
I'll take these:
-i "pandas.Series.sparse.fill_value SA01" \
-i "pandas.Series.sparse.from_coo PR07,SA01" \
-i "pandas.Series.sparse.npoints SA01" \
-i "pandas.Series.sparse.sp_values SA01" \
-i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \
I'll take these:
-i "pandas.Series.str.wrap RT03,SA01" \
-i "pandas.Series.str.zfill RT03" \
Working on these:
-i "pandas.Series.str.match RT03" \
-i "pandas.Series.str.normalize RT03,SA01" \
-i "pandas.Series.str.repeat SA01" \
-i "pandas.Series.str.replace SA01" \
I'll take these:
-i "pandas.Series.struct.dtypes SA01" \
-i "pandas.Series.to_markdown SA01" \
Here's a filtered list of pandas.Series docstring issues that still need to be addressed:
...
-i "pandas.Series.dt.as_unit PR01,PR02" \
...
-i "pandas.Series.dt.round PR01,PR02" \
...
-i "pandas.Series.dt.unit GL08" \
...
-i "pandas.Series.pad PR01,SA01" \
...
I went ahead and removed methods that were already claimed/addressed by open + merged PRs. (Last updated 9/2/2024)
I'll take these:
-i "pandas.Series.pop SA01" \
-i "pandas.Series.list.__getitem__ SA01" \
-i "pandas.Series.list.flatten SA01" \
-i "pandas.Series.list.len SA01" \
-i "pandas.Series.reorder_levels RT03,SA01" \
-i "pandas.Series.sparse.density SA01" \
-i "pandas.Series.gt SA01" \
-i "pandas.Series.lt SA01" \
-i "pandas.Series.ne SA01" \
-i "pandas.Series.prod RT03" \
-i "pandas.Series.product RT03" \
I will take
-i "pandas.Series.dt.strftime PR01,PR02" \
-i "pandas.Series.dt.to_period PR01,PR02" \
-i "pandas.Series.dt.total_seconds PR01" \
-i "pandas.Series.dt.tz_convert PR01,PR02" \
-i "pandas.Series.dt.tz_localize PR01,PR02" \
-i "pandas.Series.dt.unit GL08" \
I'll take
-i "pandas.Series.std PR01,RT03,SA01" \
-i "pandas.Series.sem PR01,RT03,SA01" \
I followed the instructions and encountered this issue: I added 'See Also' to the function fill_value(self) in ./pandas/core/arrays/sparse/array.py. After running the command python3 scripts/validate_docstrings.py pandas.Series.sparse.fill_value, I received the message:
thang123456@MSI:/mnt/c/Users/ADMIN/Desktop/pandas/pandas$ python3 scripts/validate_docstrings.py pandas.Series.sparse.fill_value
################################################################################
################# Docstring (pandas.Series.sparse.fill_value) #################
################################################################################
Elements in data that are fill_value are not stored.
For memory savings, this should be the most common value in the array.
Examples
--------
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
>>> ser.sparse.fill_value
0
>>> spa_dtype = pd.SparseDtype(dtype=np.int32, fill_value=2)
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype=spa_dtype)
>>> ser.sparse.fill_value
2
################################################################################
################################## Validation ##################################
################################################################################
1 Errors found for pandas.Series.sparse.fill_value:
SA01 See Also section not found
I checked very carefully but still couldn't fix the error. Can someone help me understand what is going wrong?
I will take:
-i "pandas.Series.dt.floor PR01,PR02" \
-i "pandas.Series.dt.ceil PR01,PR02" \
I'll take these:
-i "pandas.Series.sparse PR01,SA01" \
-i "pandas.Series.sparse.to_coo PR07,RT03,SA01" \
I'll take these:
-i "pandas.Series.dt.normalize PR01" \
-i "pandas.Series.dt.qyear GL08" \
@Tmthang1601 The pandas prefix is not needed for SparseDtype and SparseArray. Remove that prefix and the validation command should pass.
See Also
--------
SparseDtype : Dtype for sparse array.
SparseArray : Array of sparse data.
@hlakams
Originally, the docstring of the fill_value function had no "See Also" section listing SparseDtype and SparseArray. I added it only to try to clear the error. I didn't expect the error to go away after removing it again; I tried anyway, and of course it didn't go away.
@Tmthang1601 Can you push up your changes in a new PR?
@hlakams According to the instructions, you need to complete 2 to 4 methods and run the script successfully before pushing to a new PR, but I'm having trouble.
@Tmthang1601 I'm not sure what the issue is, but try replacing lines 620:639 from https://github.com/pandas-dev/pandas/issues/59592#issuecomment-2311939867 with the following docstring:
"""
Elements in `data` that are `fill_value` are not stored.
For memory savings, this should be the most common value in the array.
See Also
--------
SparseDtype : Dtype for sparse array.
SparseArray : Array of sparse data.
Examples
--------
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype="Sparse[int]")
>>> ser.sparse.fill_value
0
>>> spa_dtype = pd.SparseDtype(dtype=np.int32, fill_value=2)
>>> ser = pd.Series([0, 0, 2, 2, 2], dtype=spa_dtype)
>>> ser.sparse.fill_value
2
"""
Run pre-commit once the change from https://github.com/pandas-dev/pandas/issues/59592#issuecomment-2313880479 is committed (assuming it was configured correctly), address any lint errors, and you should be able to push up to your fork.
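For anyone hitting the same SA01 message, a crude way to sanity-check a docstring before running the real script is to look for the section title followed by a dashed underline. This is only an approximation of what the validator looks for, not its actual logic, and `has_section` is a hypothetical helper written for this comment.

```python
def has_section(docstring: str, title: str) -> bool:
    """True if `title` sits on its own line, directly above a dashed underline."""
    lines = [ln.strip() for ln in docstring.splitlines()]
    for heading, underline in zip(lines, lines[1:]):
        if (
            heading == title
            and underline
            and set(underline) == {"-"}
            and len(underline) >= len(title)
        ):
            return True
    return False


doc = """
Elements in `data` that are `fill_value` are not stored.

For memory savings, this should be the most common value in the array.

See Also
--------
SparseDtype : Dtype for sparse array.
SparseArray : Array of sparse data.
"""

print(has_section(doc, "See Also"))
print(has_section(doc, "Returns"))
```

If this kind of check passes on the text you edited but the script still reports SA01, the script may not be seeing your edit at all, for example because it is importing a different copy of pandas than the one you changed.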
I will take these:
-i "pandas.Series.dt.day_name PR01,PR02" \
-i "pandas.Series.dt.month_name PR01,PR02" \
I will take this:
-i "pandas.Series.update PR07,SA01" \
I'll work on this:
-i "pandas.Series.str.swapcase RT03" \
I'll work on these:
-i "pandas.Series.dt.nanoseconds SA01" \
-i "pandas.Series.dt.seconds SA01"
I'll work on this:
-i "pandas.Series.str.swapcase RT03" \
it seems that pandas.Series.str.swapcase has already been done.
Sorry, I am a first time contributor. May I know how to check whether something is done or not? I searched for the keyword "swapcase" on this page and didn't see anyone was working on this. @chalky25
Welcome to contributing, @blackhole-hoop. I also started three days ago.
I'm also not sure who fixed it or how it got fixed — because none of the merged commits mention it.
That said, I just tested it using the following command:
scripts/validate_docstrings.py pandas.Series.str.swapcase
And, I got the following in the result.
################################################################################
#################### Docstring (pandas.Series.str.swapcase) ####################
################################################################################
Convert strings in the Series/Index to be swapcased.
Equivalent to :meth:`str.swapcase`.
Returns
-------
Series or Index of objects
A Series or Index where the strings are modified by :meth:`str.swapcase`.
See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
uppercase.
Series.str.casefold: Removes all case distinctions in the string.
Examples
--------
>>> s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
>>> s.str.lower()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
>>> s.str.upper()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
>>> s.str.title()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
>>> s.str.capitalize()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
>>> s.str.swapcase()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
################################################################################
################################## Validation ##################################
################################################################################
Docstring for "pandas.Series.str.swapcase" correct. :)
In other words, you use the script given by the original poster to check the docstring.
I will work on these:
- "pandas.Series.str.lower RT03" \
- "pandas.Series.str.center RT03,SA01" \
- "pandas.Series.str.title RT03" \
- "pandas.Series.str.lstrip RT03" \
I just started contributing and found that some of these were already solved without being mentioned here.
I also tried running the script on some methods whose fixes were already merged, and they still show errors, for example:
python scripts/validate_docstrings.py pandas.Series.str.swapcase
Result
################################################################################
#################### Docstring (pandas.Series.str.swapcase) ####################
################################################################################
Convert strings in the Series/Index to be swapcased.
Equivalent to :meth:`str.swapcase`.
Returns
-------
Series or Index of object
See Also
--------
Series.str.lower : Converts all characters to lowercase.
Series.str.upper : Converts all characters to uppercase.
Series.str.title : Converts first character of each word to uppercase and
remaining to lowercase.
Series.str.capitalize : Converts first character to uppercase and
remaining to lowercase.
Series.str.swapcase : Converts uppercase to lowercase and lowercase to
uppercase.
Series.str.casefold: Removes all case distinctions in the string.
Examples
--------
>>> s = pd.Series(['lower', 'CAPITALS', 'this is a sentence', 'SwApCaSe'])
>>> s
0 lower
1 CAPITALS
2 this is a sentence
3 SwApCaSe
dtype: object
>>> s.str.lower()
0 lower
1 capitals
2 this is a sentence
3 swapcase
dtype: object
>>> s.str.upper()
0 LOWER
1 CAPITALS
2 THIS IS A SENTENCE
3 SWAPCASE
dtype: object
>>> s.str.title()
0 Lower
1 Capitals
2 This Is A Sentence
3 Swapcase
dtype: object
>>> s.str.capitalize()
0 Lower
1 Capitals
2 This is a sentence
3 Swapcase
dtype: object
>>> s.str.swapcase()
0 LOWER
1 capitals
2 THIS IS A SENTENCE
3 sWaPcAsE
dtype: object
################################################################################
################################## Validation ##################################
################################################################################
1 Errors found for `pandas.Series.str.swapcase`:
RT03 Return value has no description
Also, in the code_checks.sh file there is no pandas.Series.str-related line. Are all the str-related docstrings fixed?
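The two swapcase outputs posted in this thread differ in the Returns section: one has a description line under the return type and the other does not, which is what RT03 reports. Below is a rough sketch of that distinction, assuming the usual numpydoc layout where the description is indented under the type line. `returns_described` is a hypothetical helper, not the validator's own code.

```python
def indent_of(line: str) -> int:
    """Number of leading whitespace characters."""
    return len(line) - len(line.lstrip())


def returns_described(docstring: str) -> bool:
    """True if the first Returns entry has an indented description line."""
    lines = docstring.splitlines()
    stripped = [ln.strip() for ln in lines]
    if "Returns" not in stripped:
        return False
    start = stripped.index("Returns")
    body = lines[start + 2:]  # skip the heading and its dashed underline
    if len(body) < 2 or not body[0].strip():
        return False
    type_line, next_line = body[0], body[1]
    return bool(next_line.strip()) and indent_of(next_line) > indent_of(type_line)


without_description = """
Convert strings in the Series/Index to be swapcased.

Returns
-------
Series or Index of object

See Also
--------
Series.str.lower : Converts all characters to lowercase.
"""

with_description = """
Convert strings in the Series/Index to be swapcased.

Returns
-------
Series or Index of objects
    A Series or Index where the strings are modified by :meth:`str.swapcase`.
"""

print(returns_described(without_description))
print(returns_described(with_description))
```

If your local output looks like the first docstring while someone else gets the second, it may be worth checking that your checkout is up to date with the main branch before picking a method.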
I'll take these:
-i "pandas.Series.str.rjust RT03,SA01" \
-i "pandas.Series.str.rpartition RT03" \
-i "pandas.Series.str.rstrip RT03" \
I want to work on these issues:
-i "pandas.Series.sparse.sp_values SA01,ES01" \
-i "pandas.Series.str.match ES01" \
Hello! I am new to the pandas community. It seems like most of these are already taken. Is there any way to filter which methods have been run already?
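One possible way to narrow things down, sketched under some assumptions: take the entries still listed in the ignore list, then subtract every method that someone has already posted in this thread. The helper names below are hypothetical, and the entry format is assumed from the `-i "<method> <codes>" \` lines quoted above. A claim in the thread is not the same as a merged PR, so the result is only a starting point.

```python
import re

# One ignore entry, e.g.  -i "pandas.Series.dt.unit GL08" \
ENTRY = re.compile(r'-i\s+"(\S+)\s+([A-Z0-9,]+)"')


def parse_entries(text: str) -> dict[str, set[str]]:
    """Map each method name to the set of error codes listed for it."""
    found: dict[str, set[str]] = {}
    for name, codes in ENTRY.findall(text):
        found.setdefault(name, set()).update(codes.split(","))
    return found


def unclaimed(ignore_text: str, thread_text: str) -> dict[str, set[str]]:
    """Entries still in the ignore list that nobody has posted in the thread."""
    claimed = parse_entries(thread_text)
    return {
        name: codes
        for name, codes in parse_entries(ignore_text).items()
        if name not in claimed
    }


# Sample data copied from entries quoted in this thread.
ignore_text = """
-i "pandas.Series.dt.as_unit PR01,PR02" \\
-i "pandas.Series.dt.round PR01,PR02" \\
-i "pandas.Series.dt.unit GL08" \\
-i "pandas.Series.pad PR01,SA01" \\
"""
thread_text = """
I will take
-i "pandas.Series.dt.unit GL08" \\
"""

left = unclaimed(ignore_text, thread_text)
for name in sorted(left):
    print(name, sorted(left[name]))
```

Running scripts/validate_docstrings.py on a candidate, as described earlier in the thread, is still the final check that it actually needs work.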
/Assign