DOC: Enforce Numpy Docstring Validation | pandas.DataFrame
DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063
Pandas has a script for validating docstrings in code_checks.sh. Currently, some methods fail some of these checks.
pandas.DataFrame
https://github.com/pandas-dev/pandas/blob/c468028f5c2398c04d355cef7a8b6a3952620de2/ci/code_checks.sh#L82-L134
The task is:
- take 1-5 methods
- run:
  scripts/validate_docstrings.py --format=actions <method-name>
  example command: scripts/validate_docstrings.py --format=actions pandas.Categorical.__array__
  example output:
  ################################################################################
  ################################## Validation ##################################
  ################################################################################
  2 Errors found for `pandas.Categorical.__array__`:
  ES01 No extended summary found
  SA01 See Also section not found
- check whether docstring validation passes for those methods and, if necessary, fix the docstrings according to whatever errors are reported. Note: We've chosen to ignore ES01 errors; these are not required to be fixed.
- remove those methods from code_checks.sh if all errors are cleared and the docstring is correct; otherwise, remove only the specific error that was fixed from the list of errors for that method.
- commit, push, open pull request
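For anyone who wants to triage the validator output before editing, here is a minimal sketch in Python. It assumes the output has the same shape as the example above (one error code and message per line); the helper name `errors_to_fix` and the regular expression are illustrative and not part of the pandas scripts.

```python
import re

# Sample text copied from the example output in this issue.
SAMPLE_OUTPUT = """\
################################################################################
################################## Validation ##################################
################################################################################
2 Errors found for `pandas.Categorical.__array__`:
ES01 No extended summary found
SA01 See Also section not found
"""

# Codes the issue says do not need to be fixed.
IGNORED_CODES = {"ES01"}

# Assumption: an error line is two capital letters, two digits, then a message.
ERROR_LINE = re.compile(r"^\s*([A-Z]{2}\d{2})\s+(.*)$")


def errors_to_fix(output, ignored=IGNORED_CODES):
    """Return (code, message) pairs from validator output, skipping ignored codes."""
    found = []
    for line in output.splitlines():
        match = ERROR_LINE.match(line)
        if match and match.group(1) not in ignored:
            found.append((match.group(1), match.group(2).strip()))
    return found


print(errors_to_fix(SAMPLE_OUTPUT))
```

For the sample above, only the SA01 entry is left, since ES01 is skipped.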
Please don't comment "take", as multiple people can work on this issue. You also don't need to ask for permission to work on this; just comment on which methods you are going to work on : )
If you're a new contributor, please check the contributing guide
thanks @datapythonista for the inspiration for this issue!
opened a fix for pandas.DataFrame.where
Going to remove pandas.DataFrame.swapaxes, pandas.DataFrame.pad and pandas.DataFrame.backfill, which are all deprecated
Will work on pandas.DataFrame.unstack, pandas.DataFrame.value_counts and pandas.DataFrame.tz_localize
Continue with pandas.DataFrame.to_period, pandas.DataFrame.to_timestamp, pandas.DataFrame.tz_convert
Working on pandas.DataFrame.assign, pandas.DataFrame.bfill, and pandas.DataFrame.ffill
Will work on pandas.DataFrame.get and pandas.DataFrame.dtypes
Will work on pandas.DataFrame.copy, pandas.DataFrame.first_valid_index, pandas.DataFrame.last_valid_index, and pandas.DataFrame.keys
Working on pandas.DataFrame.sparse, pandas.DataFrame.sparse.density, pandas.DataFrame.sparse.from_spmatrix, pandas.DataFrame.sparse.to_coo, and pandas.DataFrame.sparse.to_dense
Working on:
pandas.DataFrame.columns
pandas.DataFrame.pop
working on:
pandas.DataFrame.to_feather
working on
pandas.DataFrame.mean
pandas.DataFrame.median
pandas.DataFrame.plot
pandas.DataFrame.pop
working on
pandas.DataFrame.__iter__
pandas.DataFrame.columns
pandas.DataFrame.droplevel
working on
pandas.DataFrame.max
pandas.DataFrame.min
Working on
-i "pandas.DataFrame.hist RT03" \
-i "pandas.DataFrame.infer_objects RT03" \
-i "pandas.DataFrame.reorder_levels SA01" \
-i "pandas.DataFrame.to_parquet RT03" \
Awesome! Thanks for all the work on these, I've been a bit busy lately but I'll open some new issues for the remaining ones
@jordan-d-murphy @mroeschke We need to reopen this issue. The following are still remaining:
-i "pandas.DataFrame.__dataframe__ SA01" \
-i "pandas.DataFrame.at_time PR01" \
-i "pandas.DataFrame.kurt RT03,SA01" \
-i "pandas.DataFrame.kurtosis RT03,SA01" \
-i "pandas.DataFrame.max RT03" \
-i "pandas.DataFrame.mean RT03,SA01" \
-i "pandas.DataFrame.median RT03,SA01" \
-i "pandas.DataFrame.min RT03" \
-i "pandas.DataFrame.plot PR02,SA01" \
-i "pandas.DataFrame.prod RT03" \
-i "pandas.DataFrame.product RT03" \
-i "pandas.DataFrame.sem PR01,RT03,SA01" \
-i "pandas.DataFrame.skew RT03,SA01" \
-i "pandas.DataFrame.sparse PR01" \
-i "pandas.DataFrame.std PR01,RT03,SA01" \
-i "pandas.DataFrame.sum RT03" \
-i "pandas.DataFrame.swaplevel SA01" \
-i "pandas.DataFrame.to_markdown SA01" \
-i "pandas.DataFrame.var PR01,RT03,SA01" \
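To keep track of which codes are left per method, entries shaped like the ones above can be parsed into a mapping. This is only a sketch: the helper names (`parse_entries`, `drop_fixed`, `format_entry`) are hypothetical, and it assumes every entry follows the `-i "<method> <codes>" \` layout shown in this thread.

```python
import re

# Matches entries shaped like:  -i "pandas.DataFrame.sem PR01,RT03,SA01" \
ENTRY = re.compile(r'^\s*-i\s+"(\S+)\s+([A-Z0-9,]+)"\s*\\?\s*$')


def parse_entries(text):
    """Map each method name to the list of error codes still ignored for it."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries[match.group(1)] = match.group(2).split(",")
    return entries


def drop_fixed(entries, method, fixed):
    """Return a copy with the fixed codes removed; drop the method if none remain."""
    updated = dict(entries)
    remaining = [code for code in entries.get(method, []) if code not in fixed]
    if remaining:
        updated[method] = remaining
    else:
        updated.pop(method, None)
    return updated


def format_entry(method, codes):
    """Render one entry in the same shape as the lines in this thread."""
    joined = ",".join(codes)
    return '-i "' + method + " " + joined + '" \\'


# Two entries copied from the list above (the trailing backslash is escaped here).
SAMPLE = "\n".join([
    '-i "pandas.DataFrame.sem PR01,RT03,SA01" \\',
    '-i "pandas.DataFrame.sum RT03" \\',
])

entries = parse_entries(SAMPLE)
after_fix = drop_fixed(entries, "pandas.DataFrame.sem", {"SA01"})
print(format_entry("pandas.DataFrame.sem", after_fix["pandas.DataFrame.sem"]))
```

This mirrors the rule from the issue description: a method disappears from the list only once all of its codes are cleared; otherwise only the fixed code is removed.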
I am working on
-i "pandas.DataFrame.__dataframe__ SA01" \
-i "pandas.DataFrame.at_time PR01" \
-i "pandas.DataFrame.kurt RT03,SA01" \
-i "pandas.DataFrame.kurtosis RT03,SA01" \
I am working on the following
-i "pandas.DataFrame.prod RT03" \
-i "pandas.DataFrame.product RT03" \
-i "pandas.DataFrame.sem PR01,RT03,SA01" \
-i "pandas.DataFrame.skew RT03,SA01" \
-i "pandas.DataFrame.sparse PR01" \
working on
-i "pandas.DataFrame.std PR01,RT03,SA01" \
-i "pandas.DataFrame.sum RT03" \
-i "pandas.DataFrame.swaplevel SA01" \
-i "pandas.DataFrame.to_markdown SA01" \
-i "pandas.DataFrame.var PR01,RT03,SA01" \
I will check out:
-i "pandas.DataFrame.max RT03" \
-i "pandas.DataFrame.mean RT03,SA01" \
-i "pandas.DataFrame.median RT03,SA01" \
-i "pandas.DataFrame.min RT03" \
I'll check out:
-i "pandas.DataFrame.plot PR02,SA01" \
@mroeschke, @Aloqeely is there anything else I need to add?
Hey, is this issue open? I would like to contribute as a beginner. Thank you
Yes it's still open. Good luck!
Hey, I'd like to contribute to this issue if it's still open
Yes! You can see ci/code_checks.sh for all the docstrings that need to be fixed.
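For newcomers wondering what a fix looks like: as far as I understand the numpydoc checks, PR01 means a parameter is undocumented, RT03 means the return value has no description, and SA01 means the See Also section is missing. Below is a small sketch of a docstring with all three sections filled in. The function `column_range` is made up for illustration and is not a pandas method.

```python
def column_range(values):
    """
    Return the difference between the largest and smallest value.

    Parameters
    ----------
    values : list of float
        Non-empty sequence of numbers to summarise.

    Returns
    -------
    float
        The maximum of ``values`` minus the minimum of ``values``.

    See Also
    --------
    max : Return the largest item of an iterable.
    min : Return the smallest item of an iterable.

    Examples
    --------
    >>> column_range([1.0, 4.0, 2.5])
    3.0
    """
    return max(values) - min(values)
```

The real pandas docstrings are longer and often built from shared templates, so the actual edit may need to go into the template rather than the method itself.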
I will work on
-i "pandas.DataFrame.value_counts RT03" \
-i "pandas.DataFrame.var PR01,RT03,SA01" \
-i "pandas.DataFrame.where RT03" \
-i "pandas.DataFrame.backfill PR01,SA01" \
-i "pandas.DataFrame.bfill SA01" \
Is it only the methods from the original issue post that need to be checked, or any methods in ci/code_checks.sh? Also, when I check a few of the methods, I get additional errors that aren't listed in the documentation. Do I add those error codes too?
You can fix any method in that file.
Also when i check a few of the methods, I get additional errors that aren't listed in the documentation?
Not sure what you mean by additional errors.
Does the validate_docstrings.py validation not work for anyone else? I'm getting an error on line 217, stating that there aren't enough values to unpack (expected 4, got 1), for all of the docstrings I've checked.
It works for me. Can you post the command you ran and the error message?