
`to_pandas()` in Excel resource returning an empty dataframe

Open gabrielbdornas opened this issue 2 years ago • 0 comments

Overview

I'm trying to use `to_pandas()` to create a pandas DataFrame from a resource that is an Excel file.

from frictionless import Package

package = Package("datapackage.yaml")
mega = package.get_resource("mega-sena")
df = mega.to_pandas()

print(df)

Unfortunately, running the script prints an empty DataFrame:

Empty DataFrame
Columns: [Concurso, Data do Sorteio, Bola1, Bola2, Bola3, Bola4, Bola5, Bola6, Ganhadores 6 acertos, Cidade / UF, Rateio 6 acertos, Ganhadores 5 acertos, Rateio 5 acertos, Ganhadores 4 acertos, Rateio 4 acertos, Acumulado 6 acertos, Arrecadação Total, Estimativa prêmio, Acumulado Sorteio Especial Mega da Virada, Observação]
Index: []

A sample that reproduces the error can be found in this reprex commit[^1].
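To narrow down where the rows disappear, one option might be to compare how many rows the resource yields when read directly with how many end up in the DataFrame. The sketch below is only a diagnostic aid under assumptions: `diagnose_resource` is a hypothetical helper (not part of frictionless), and it assumes the resource exposes `read_rows()` and `to_pandas()`, as a frictionless `Resource` does.

```python
def diagnose_resource(resource):
    """Compare rows read directly with rows that reach the DataFrame.

    Hypothetical helper: `resource` is assumed to expose `read_rows()`
    and `to_pandas()`, as a frictionless Resource does.
    """
    rows_read = len(resource.read_rows())      # rows parsed from the file
    rows_in_frame = len(resource.to_pandas())  # rows after pandas conversion
    return {
        "rows_read": rows_read,
        "rows_in_frame": rows_in_frame,
        # True only when the file was parsed but the DataFrame came out empty
        "lost_in_conversion": rows_read > 0 and rows_in_frame == 0,
    }


# Usage with the package from this issue (requires frictionless installed):
#   from frictionless import Package
#   package = Package("datapackage.yaml")
#   print(diagnose_resource(package.get_resource("mega-sena")))
```

If `rows_read` is already zero, the problem is more likely in how the Excel file is parsed than in the pandas conversion itself.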

Creating a DataFrame from a .csv file (with the same content as the Excel file) works:

# Create a CSV file with the Excel content
import pandas as pd

df = pd.read_excel("download/Mega-Sena.xlsx", index_col=0)
df.to_csv("data/Mega-Sena.csv")

# Create DataFrame from csv file
from frictionless import Package

package = Package("datapackage_csv.yaml")
mega = package.get_resource("mega-sena")
df = mega.to_pandas()

print(df)

The result:

      Concurso Data do Sorteio  ...  Acumulado Sorteio Especial Mega da Virada  Observação
0            1      11/03/1996  ...                                     R$0,00        None
1            2      18/03/1996  ...                                     R$0,00        None
2            3      25/03/1996  ...                                     R$0,00        None
3            4      01/04/1996  ...                                     R$0,00        None
4            5      08/04/1996  ...                                     R$0,00        None
...        ...             ...  ...                                        ...         ...
2611      2612      19/07/2023  ...                            R$61.356.654,15        None
2612      2613      22/07/2023  ...                            R$62.837.684,98        None
2613      2614      25/07/2023  ...                            R$64.067.866,59        None
2614      2615      27/07/2023  ...                            R$64.873.493,13        None
2615      2616      29/07/2023  ...                            R$65.936.608,29        None

A sample of the workaround can be found in this reprex commit.

[^1]: To run it, install the packages listed in the requirements.txt file and run `python scripts/pandas_excel.py`.

gabrielbdornas, Aug 04 '23 12:08