
Error: `DataFrame` has no attribute `dtype`

Open naterush opened this issue 2 years ago • 5 comments

Describe the bug

We've seen the error that a df has no attribute dtype (raised in the code that writes the dataframe to JSON), but it is unclear what sort of dataframe causes it.

We need to find a dataframe that reproduces the error so that we can fix it.

naterush avatar May 02 '22 20:05 naterush

This dataframe triggers it: pd.DataFrame({1: [2, 3], True: [False, True]}, index=[1, 2])

I think it happens when the column headers include both 1 and True. Indexing with either one resolves to the same column, but for some reason if you generate this dataframe another way (e.g. through a promote row to header), then you get this error!
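A minimal sketch of why 1 and True collide, using only plain Python and the dataframe literal above (nothing here is Mito-specific). It suggests that the literal itself ends up with a single column, which may be why building the dataframe directly does not error while promoting a row does:

```python
import pandas as pd

# In Python, True is an int subclass equal to 1, and both hash to the same value.
print(1 == True, hash(1) == hash(True))

# Because of that, a dict literal cannot hold both keys: the second value
# overwrites the first, and the key stays 1.
columns = {1: [2, 3], True: [False, True]}
print(len(columns))  # 1

# So the dataframe built from the literal has one column, not two.
literal = pd.DataFrame(columns, index=[1, 2])
print(len(literal.columns))  # 1
```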

Copying full test case here:

#!/usr/bin/env python
# coding: utf-8

# Copyright (c) Saga Inc.
# Distributed under the terms of the GPL License.
"""
Contains tests for Promote Row To Header
"""

import pandas as pd
import pytest
from mitosheet.tests.test_utils import create_mito_wrapper_dfs

PROMOTE_ROW_TO_HEADER_TESTS = [
    (
        [
            pd.DataFrame({'A': [1, 2, 3]})
        ],
        0, 
        0, 
        [
            pd.DataFrame({1: [2, 3]}, index=[1, 2])
        ]
    ),
    (
        [
            pd.DataFrame({'A': [1, 2, 3]})
        ],
        0, 
        1, 
        [
            pd.DataFrame({2: [1, 3]}, index=[0, 2])
        ]
    ),
    (
        [
            pd.DataFrame({'A': [1, 2, 3], 'B': ["A", "B", "C"]})
        ],
        0, 
        0, 
        [
            pd.DataFrame({1: [2, 3], 'A': ["B", "C"]}, index=[1, 2])
        ]
    ),
    (
        [
            pd.DataFrame({'A': [1, 2, 3], 'B': [True, False, True]})
        ],
        0, 
        0, 
        [
            pd.DataFrame({1: [2, 3], True: [False, True]}, index=[1, 2])
        ]
    ),
]
@pytest.mark.parametrize("input_dfs, sheet_index, row_index, output_dfs", PROMOTE_ROW_TO_HEADER_TESTS)
def test_promote_row_to_header(input_dfs, sheet_index, row_index, output_dfs):
    mito = create_mito_wrapper_dfs(*input_dfs)

    mito.promote_row_to_header(sheet_index, row_index)

    assert len(mito.dfs) == len(output_dfs)
    for actual, expected in zip(mito.dfs, output_dfs):
        assert actual.equals(expected)

naterush avatar May 25 '22 04:05 naterush

This also occurs if you try to promote a row to a header, and that row has two of the same values in it.

For example, take this dataframe df = pd.DataFrame({'A': [1,2], 'B': [1,2]}) and try to promote the first row to the header. It will error.

Then update A1 -> 0, and try to promote it again. This time it will work.
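Here is a rough sketch of what seems to be going on, in plain pandas. The `promote_first_row` helper is a hypothetical stand-in for promote row to header, not Mito's actual implementation. With duplicate headers, selecting a column by label returns a DataFrame instead of a Series, and a DataFrame has `dtypes` but no `dtype`:

```python
import pandas as pd


def promote_first_row(df):
    # Hypothetical stand-in for promote row to header: use the values of
    # the first row as column labels and drop that row.
    promoted = df.iloc[1:].copy()
    promoted.columns = df.iloc[0].tolist()
    return promoted


# The first row is [1, 1], so both columns end up labeled 1.
df = pd.DataFrame({'A': [1, 2], 'B': [1, 2]})
promoted = promote_first_row(df)

# With duplicate labels, label-based selection returns every matching column.
selected = promoted[1]
print(type(selected).__name__)  # DataFrame

try:
    selected.dtype
except AttributeError as error:
    print(error)

# After updating A1 to 0 the first row is [0, 1], the labels are unique,
# and selection returns a Series, which does have a dtype.
df.loc[0, 'A'] = 0
fixed = promote_first_row(df)
print(type(fixed[0]).__name__)  # Series
```

If this is the cause, any code that assumes `df[column_header]` is a Series would hit the same error whenever two headers are equal.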

aarondr77 avatar Jun 06 '22 18:06 aarondr77

Nice! This confirms the bug. Nice work investigating to get there!

naterush avatar Jun 06 '22 18:06 naterush

Hmmm... actually, there is still some bug on commit be9debb87d4e3e0e21ae9bbcc793331ac7bfdb24. Here is the dataset: indicators.csv

https://user-images.githubusercontent.com/18709905/172882457-f1b45db4-b3d8-4024-9fec-890f0000dcfd.mov

aarondr77 avatar Jun 09 '22 15:06 aarondr77

Note: The above video might not actually be an example of the AttributeError: 'DataFrame' object has no attribute 'dtype' error that we don't yet fully understand. It might instead have been the result of a bug in the promote row to header implementation that was fixed in 37fc712c1450d2d7e488ff32ad61db7b4eaceeeb.

aarondr77 avatar Jun 18 '22 00:06 aarondr77