
DOC: Enforce Numpy Docstring Validation | pandas.DataFrame

Open jordan-d-murphy opened this issue 10 months ago • 30 comments

DOC: Enforce Numpy Docstring Validation (Parent Issue) #58063

pandas has a script for validating docstrings, which is run from ci/code_checks.sh. Currently, some methods fail some of these checks.

pandas.DataFrame

https://github.com/pandas-dev/pandas/blob/c468028f5c2398c04d355cef7a8b6a3952620de2/ci/code_checks.sh#L82-L134

The task is:

  1. take 1-5 methods

  2. run: scripts/validate_docstrings.py --format=actions <method-name>

Example command: scripts/validate_docstrings.py --format=actions pandas.Categorical.__array__

Example output:

################################################################################
################################## Validation ##################################
################################################################################

2 Errors found for `pandas.Categorical.__array__`:
	ES01	No extended summary found
	SA01	See Also section not found

  3. check whether docstring validation passes for those methods and, if necessary, fix the docstrings according to the errors reported. Note: We've chosen to ignore ES01 errors; these are not required to be fixed.

  4. remove those methods from code_checks.sh if all errors are cleared and the docstring is correct; otherwise, remove only the specific error that was fixed from the list of errors for that method.

  5. commit, push, open pull request
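
Step 4 can be sketched in Python as follows. This is only an illustration, assuming the ignore entries look like the `-i "name CODES" \` lines quoted in this thread; `drop_fixed_codes` is a hypothetical helper and not part of the pandas tooling.

```python
import re

# Assumed entry shape:  -i "pandas.DataFrame.sem PR01,RT03,SA01" \
ENTRY = re.compile(r'^(?P<indent>\s*)-i "(?P<name>\S+) (?P<codes>[A-Z0-9,]+)" \\\s*$')


def drop_fixed_codes(lines, method, fixed):
    """Return a copy of `lines` with the `fixed` codes removed from `method`'s entry.

    The entry is dropped entirely when no error codes remain for it.
    """
    result = []
    for line in lines:
        match = ENTRY.match(line)
        if match is None or match["name"] != method:
            # Not the entry we are editing: keep the line untouched.
            result.append(line)
            continue
        remaining = [c for c in match["codes"].split(",") if c not in fixed]
        if remaining:
            codes = ",".join(remaining)
            result.append(match["indent"] + f'-i "{method} {codes}" \\')
        # Otherwise every listed error was fixed, so the entry disappears.
    return result


entries = [
    '        -i "pandas.DataFrame.sem PR01,RT03,SA01" \\',
    '        -i "pandas.DataFrame.sum RT03" \\',
]
# Only RT03 was fixed for sem, so PR01 and SA01 stay listed.
partially_fixed = drop_fixed_codes(entries, "pandas.DataFrame.sem", {"RT03"})
# The only error for sum was fixed, so its entry is removed.
fully_fixed = drop_fixed_codes(entries, "pandas.DataFrame.sum", {"RT03"})
```

In practice the edit is small enough to make by hand; the sketch only makes the rule explicit.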

Please don't comment "take", as multiple people can work on this issue. You also don't need to ask for permission to work on this; just comment with the methods you are going to work on : )

If you're a new contributor, please check the contributing guide.

thanks @datapythonista for the inspiration for this issue!

jordan-d-murphy Mar 29 '24 07:03

opened a fix for pandas.DataFrame.where

YashpalAhlawat Mar 29 '24 18:03

Going to remove pandas.DataFrame.swapaxes, pandas.DataFrame.pad and pandas.DataFrame.backfill which are all deprecated

Aloqeely Mar 31 '24 00:03

Will work on pandas.DataFrame.unstack, pandas.DataFrame.value_counts and pandas.DataFrame.tz_localize

bergnerjonas Mar 31 '24 21:03

Continuing with pandas.DataFrame.to_period, pandas.DataFrame.to_timestamp, and pandas.DataFrame.tz_convert

bergnerjonas Apr 01 '24 10:04

Working on pandas.DataFrame.assign, pandas.DataFrame.bfill, and pandas.DataFrame.ffill

shriyakalakata Apr 02 '24 03:04

Will work on pandas.DataFrame.get and pandas.DataFrame.dtypes

shriyakalakata Apr 17 '24 18:04

Will work on pandas.DataFrame.copy, pandas.DataFrame.first_valid_index, pandas.DataFrame.last_valid_index, and pandas.DataFrame.keys

shriyakalakata Apr 17 '24 20:04

Working on pandas.DataFrame.sparse, pandas.DataFrame.sparse.density, pandas.DataFrame.sparse.from_spmatrix, pandas.DataFrame.sparse.to_coo, and pandas.DataFrame.sparse.to_dense

gboeker Apr 18 '24 16:04

Working on:

pandas.DataFrame.columns
pandas.DataFrame.pop

KeiOshima Apr 19 '24 05:04

working on:

pandas.DataFrame.to_feather 

KeiOshima Apr 21 '24 18:04

working on

pandas.DataFrame.mean
pandas.DataFrame.median
pandas.DataFrame.plot
pandas.DataFrame.pop

gboeker Apr 21 '24 19:04

working on

pandas.DataFrame.__iter__
pandas.DataFrame.columns
pandas.DataFrame.droplevel

gboeker Apr 21 '24 21:04

working on

pandas.DataFrame.max
pandas.DataFrame.min

KeiOshima Apr 23 '24 19:04

Working on

-i "pandas.DataFrame.hist RT03" \
-i "pandas.DataFrame.infer_objects RT03" \
-i "pandas.DataFrame.reorder_levels SA01" \
-i "pandas.DataFrame.to_parquet RT03" \

shriyakalakata Apr 25 '24 18:04

Awesome! Thanks for all the work on these. I've been a bit busy lately, but I'll open some new issues for the remaining ones.

jordan-d-murphy Apr 25 '24 21:04

@jordan-d-murphy @mroeschke We need to reopen this issue. The following are still remaining:

        -i "pandas.DataFrame.__dataframe__ SA01" \
        -i "pandas.DataFrame.at_time PR01" \
        -i "pandas.DataFrame.kurt RT03,SA01" \
        -i "pandas.DataFrame.kurtosis RT03,SA01" \
        -i "pandas.DataFrame.max RT03" \
        -i "pandas.DataFrame.mean RT03,SA01" \
        -i "pandas.DataFrame.median RT03,SA01" \
        -i "pandas.DataFrame.min RT03" \
        -i "pandas.DataFrame.plot PR02,SA01" \
        -i "pandas.DataFrame.prod RT03" \
        -i "pandas.DataFrame.product RT03" \
        -i "pandas.DataFrame.sem PR01,RT03,SA01" \
        -i "pandas.DataFrame.skew RT03,SA01" \
        -i "pandas.DataFrame.sparse PR01" \
        -i "pandas.DataFrame.std PR01,RT03,SA01" \
        -i "pandas.DataFrame.sum RT03" \
        -i "pandas.DataFrame.swaplevel SA01" \
        -i "pandas.DataFrame.to_markdown SA01" \
        -i "pandas.DataFrame.var PR01,RT03,SA01" \
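
To split the remaining work by error type, entries like the ones above can be grouped by code. A minimal sketch, assuming the `-i "name CODES" \` format shown in this thread; `methods_by_code` is a hypothetical helper, and `sample` is just a subset of the list above.

```python
import re
from collections import defaultdict

# Captures the method name and its comma-separated error codes.
ENTRY = re.compile(r'-i "(\S+) ([A-Z0-9,]+)"')


def methods_by_code(text):
    """Map each error code to the methods that still list it, in file order."""
    grouped = defaultdict(list)
    for name, codes in ENTRY.findall(text):
        for code in codes.split(","):
            grouped[code].append(name)
    return dict(grouped)


sample = r'''
        -i "pandas.DataFrame.at_time PR01" \
        -i "pandas.DataFrame.kurt RT03,SA01" \
        -i "pandas.DataFrame.max RT03" \
'''

for code, names in sorted(methods_by_code(sample).items()):
    print(code, ", ".join(names))
```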

tuhinsharma121 Apr 30 '24 06:04

I am working on

        -i "pandas.DataFrame.__dataframe__ SA01" \
        -i "pandas.DataFrame.at_time PR01" \
        -i "pandas.DataFrame.kurt RT03,SA01" \
        -i "pandas.DataFrame.kurtosis RT03,SA01" \

tuhinsharma121 Apr 30 '24 06:04

I am working on the following

        -i "pandas.DataFrame.prod RT03" \
        -i "pandas.DataFrame.product RT03" \
        -i "pandas.DataFrame.sem PR01,RT03,SA01" \
        -i "pandas.DataFrame.skew RT03,SA01" \
        -i "pandas.DataFrame.sparse PR01" \

tuhinsharma121 Apr 30 '24 18:04

working on

        -i "pandas.DataFrame.std PR01,RT03,SA01" \
        -i "pandas.DataFrame.sum RT03" \
        -i "pandas.DataFrame.swaplevel SA01" \
        -i "pandas.DataFrame.to_markdown SA01" \
        -i "pandas.DataFrame.var PR01,RT03,SA01" \

tuhinsharma121 May 01 '24 13:05

I will check out:

    -i "pandas.DataFrame.max RT03" \
    -i "pandas.DataFrame.mean RT03,SA01" \
    -i "pandas.DataFrame.median RT03,SA01" \
    -i "pandas.DataFrame.min RT03" \

BDixon808 May 26 '24 00:05

I'll check out:

        -i "pandas.DataFrame.plot PR02,SA01" \

@mroeschke, @Aloqeely is there anything else I need to add?

anishfish2 May 27 '24 07:05

Hey, is this issue open? I would like to contribute as a beginner. Thank you

enesyesil Jun 21 '24 17:06

Yes it's still open. Good luck!

Aloqeely Jun 21 '24 17:06

Hey, I'd like to contribute to this issue if it's still open

shriyase Jun 21 '24 19:06

Yes! You can see ci/code_checks.sh for all the docstrings that need to be fixed.

Aloqeely Jun 21 '24 20:06

I will work on

-i "pandas.DataFrame.value_counts RT03" \
-i "pandas.DataFrame.var PR01,RT03,SA01" \
-i "pandas.DataFrame.where RT03" \
-i "pandas.DataFrame.backfill PR01,SA01" \
-i "pandas.DataFrame.bfill SA01" \

shriyase Jun 21 '24 20:06

Do only the methods from the original issue post need to be checked, or any methods in ci/code_checks.sh? Also, when I check a few of the methods, I get additional errors that aren't listed in the documentation. Do I add those error codes too?

shriyase Jun 22 '24 01:06

You can fix any method in that file.

Also, when I check a few of the methods, I get additional errors that aren't listed in the documentation.

Not sure what you mean by additional errors.

Aloqeely Jun 22 '24 08:06

Does the validate_docstrings.py validation not work for anyone else? I'm getting an error on line 217, stating that there aren't enough values to unpack (expected 4, got 1), for all of the docstrings I've checked.

CollinClifford Jun 23 '24 14:06

It works for me. Can you post the command you ran and the error message?

Aloqeely Jun 24 '24 21:06