pandas
DOC: Add docstrings for MultiIndex.levels and MultiIndex.codes
xref #55148
It seems those docstrings are empty, so we should create them.
See the attributes section here: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html
The docstring for `MultiIndex.levels` should make clear that levels are preserved even if the DataFrame using the index doesn't contain all of them. See this page in the docs: https://pandas.pydata.org/docs/user_guide/advanced.html#defined-levels and this comment: https://github.com/pandas-dev/pandas/pull/55433#pullrequestreview-1663040010
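To illustrate the behavior the docstring should describe, here is a minimal sketch. The frame, the column name `value`, and the variable names are made up for illustration; only `MultiIndex.from_product`, `DataFrame.iloc`, `MultiIndex.levels`, and `MultiIndex.remove_unused_levels` are pandas API.

```python
import pandas as pd

# Hypothetical two-level index used only for illustration.
index = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2]], names=["first", "second"]
)
df = pd.DataFrame({"value": range(4)}, index=index)

# Keep only the rows whose first level is "A".
subset = df.iloc[:2]

# The levels still list "B", even though no row uses it any more.
levels_after_slice = list(subset.index.levels[0])

# remove_unused_levels() returns an index whose levels match the data.
trimmed = subset.index.remove_unused_levels()
levels_after_trim = list(trimmed.levels[0])

print(levels_after_slice)
print(levels_after_trim)
```

After slicing, the first level should still contain both `A` and `B`; only the trimmed index drops the unused `B`.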
This is only the second time I've opened a PR in an open source project, so I misunderstood what you meant. I'll work on the issue again.
That's normal. The original issue was difficult to follow because it was a discussion, but I think the new issue explains better what needs to be done. If you have any questions or need help, we are here to help. Thank you!
We now have the docstring for MultiIndex.levels, but the one for MultiIndex.codes is still missing. Labelling this as a good first issue in case anyone wants to help.
take
Hi mileslow, do you still need time for this task, or do you mind if I work on it?
@Rollingterminator1 go for it.
Hi, is this issue already taken care of?
Hi, does this issue still need to be worked on?
1. Ensure Correct Data Types:
Make sure that your categorical columns are indeed of the "category" type. You can convert a column to a categorical type using astype:
df['categorical_column'] = df['categorical_column'].astype('category')
2. Check for Null Values:
Ensure that there are no null values in the categorical columns, as this can sometimes affect grouping.
df['categorical_column'].isnull().sum()
If there are null values, you might need to handle them appropriately before performing group operations.
3. Understand Grouping Requirements:
Make sure you understand the requirements of your grouping operation. For example, if you are trying to group by intervals, ensure that your categorical column is defined with the appropriate intervals.
pd.cut(df['numeric_column'], bins=[0, 10, 20, 30])
4. Use Groupby Correctly:
When using groupby, ensure you are providing the correct column name or a list of column names. For example:
grouped_data = df.groupby('categorical_column')['numeric_column'].sum()
Or, for multiple grouping columns:
grouped_data = df.groupby(['categorical_column1', 'categorical_column2'])['numeric_column'].sum()
5. Check Pandas Version:
Ensure that you are using a recent version of pandas. Bugs are often fixed in newer releases. You can check your pandas version with:
import pandas as pd
print(pd.__version__)
If you're using an older version, consider upgrading:
pip install --upgrade pandas
6. Minimal, Complete, and Verifiable Example:
If the issue persists, try to create a minimal, complete, and verifiable example that reproduces the problem. This makes it easier for others to help diagnose and fix the issue.
If you can provide more details or a sample of your code and data, I might be able to give more specific advice. Additionally, checking the pandas documentation or community forums can sometimes provide insights into common issues or bug reports.
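Steps 1 to 4 above can be put together in one small, self-contained sketch. The data is invented for illustration, the column names simply reuse the placeholders from the steps, and `observed=True` is an optional choice made here so that only categories present in the data appear in the result.

```python
import pandas as pd

# Made-up data; one row has a missing category on purpose.
df = pd.DataFrame(
    {
        "categorical_column": ["a", "b", "a", None],
        "numeric_column": [5, 15, 25, 8],
    }
)

# Step 1: convert to the "category" dtype.
df["categorical_column"] = df["categorical_column"].astype("category")

# Step 2: count null values in the grouping column.
n_missing = int(df["categorical_column"].isnull().sum())

# Step 3: bin a numeric column into intervals (the result is categorical too).
df["bin"] = pd.cut(df["numeric_column"], bins=[0, 10, 20, 30])

# Step 4: group and aggregate. Rows with a null key are dropped by default.
grouped_data = df.groupby("categorical_column", observed=True)[
    "numeric_column"
].sum()

print(n_missing)
print(grouped_data)
```

Note that the row with the missing category does not contribute to any group, which is one reason to check for nulls before grouping.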
class MultiIndex:
    """
    A multi-level, or hierarchical, index object for pandas DataFrame.

    ...

    Attributes
    ----------
    levels : list
        List of Index objects containing the unique values for each level of the MultiIndex.
    codes : list
        List of arrays containing the codes that indicate the position of each element in the levels.

    ...

    Examples
    --------
    >>> arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
    >>> tuples = list(zip(*arrays))
    >>> index = pd.MultiIndex.from_tuples(tuples, names=('first', 'second'))
    >>> index
    MultiIndex([('A', 1),
                ('A', 2),
                ('B', 1),
                ('B', 2)],
               names=['first', 'second'])
    >>> index.levels
    FrozenList([['A', 'B'], [1, 2]])
    >>> index.codes
    FrozenList([[0, 0, 1, 1], [0, 1, 0, 1]])
    """

    def __init__(self, levels, codes):
        """
        Parameters
        ----------
        levels : list
            List of Index objects containing the unique values for each level of the MultiIndex.
        codes : list
            List of arrays containing the codes that indicate the position of each element in the levels.
        """
        self.levels = levels
        self.codes = codes
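As a rough sketch of how `codes` relate to `levels`, each code is an integer position into the corresponding level, so taking the level's values at those positions should rebuild the per-row labels. The index below is the one from the proposed docstring example; the variable names are only illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [1, 2, 1, 2]], names=["first", "second"]
)

rebuilt = []
for level, codes in zip(index.levels, index.codes):
    # Each code is a position into `level`; take() looks those positions up.
    rebuilt.append(level.take(codes))

# The rebuilt values should match what get_level_values() reports.
matches = [
    values.equals(index.get_level_values(i))
    for i, values in enumerate(rebuilt)
]
print(matches)
```

This is the same relationship the `codes` docstring describes in words: the codes say where in each level every row's label lives.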
take
take
Hi, I would like to contribute.
Hi, it looks like this has been inactive for a while, so I'd like to try it.
take
Ah, it looks like there is already a docstring for MultiIndex.codes in the main branch, so this seems to have been fixed already.
https://github.com/pandas-dev/pandas/blob/b1525c4a3788d161653b04a71a84e44847bedc1b/pandas/core/indexes/multi.py#L1080-L1102
take
Looks like #57601 fixed this. Can we close this?
@datapythonista can we close this? It looks like it was solved by #57601.
Is the issue still open?
The docstrings have been added, but there are many more issues labeled 'Docs' that we would appreciate your help with.