matminer
load_dataframe_from_json fails on MultiIndex
I just noticed there's a problem with load_dataframe_from_json
when trying to load multi-index dataframes.
from matminer.utils.io import load_dataframe_from_json, store_dataframe_as_json
import numpy as np
import pandas as pd
arr = np.arange(20).reshape(5, 4)
df = pd.DataFrame(arr, columns=list("abcd"))
store_dataframe_as_json(df, "df.json")
df = load_dataframe_from_json("df.json")
# all good here
df = pd.DataFrame(arr, columns=list("abcd")).set_index(["a", "b"])
store_dataframe_as_json(df, "df.json")
df = load_dataframe_from_json("df.json")
>>> ValueError: Shape of passed values is (5, 2), indices imply (2, 2)
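That's because pandas doesn't interpret a list of lists passed as index as one key per row. As far as I can tell it treats each inner list as a separate level instead, which is why the index implies a length of 2. Instead, you have to create a MultiIndex object first and pass that in: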
idx = [[i, i + 1] for i in range(5)]
pd.DataFrame(arr, columns=list("abcd"), index=idx)
>>> ValueError: Shape of passed values is (5, 4), indices imply (2, 4)
idx = pd.MultiIndex.from_tuples(((i, i + 1) for i in range(5)))
pd.DataFrame(arr, columns=list("abcd"), index=idx)
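To make the (2, 4) in the error above easier to see, here is a small sketch comparing the two ways pandas can read the same list of lists. It only uses MultiIndex.from_tuples and MultiIndex.from_arrays; that the DataFrame constructor takes the level-wise route for a plain list of lists is my reading of the error, not something confirmed in this thread.

```python
import pandas as pd

# Five rows with two index values each, as they come back from JSON.
idx = [[i, i + 1] for i in range(5)]

# Row-wise reading: each inner list is the key of one row.
by_rows = pd.MultiIndex.from_tuples([tuple(row) for row in idx])

# Level-wise reading: each inner list holds the values of one level.
by_levels = pd.MultiIndex.from_arrays(idx)

print(len(by_rows), by_rows.nlevels)      # 5 2
print(len(by_levels), by_levels.nlevels)  # 2 5
```

The level-wise reading yields an index of length 2, which matches the "indices imply (2, 4)" part of the message.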
So one possible fix would be:
 if isinstance(dataframe_data, dict):
     if set(dataframe_data.keys()) == {"data", "columns", "index"}:
+        if type(dataframe_data['index'][0]) == list:
+            dataframe_data['index'] = pandas.MultiIndex.from_tuples(dataframe_data['index'])
         return pandas.DataFrame(**dataframe_data)
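For anyone who wants to try the idea outside of matminer, below is a minimal standalone sketch of the same check. The helper name frame_from_split_dict and the hand-written payload are hypothetical and only stand in for what the JSON decoder hands back; this is not the matminer implementation. It also guards against an empty index, which the [0] lookup in the diff above would not survive.

```python
import pandas as pd


def frame_from_split_dict(payload):
    """Hypothetical helper: build a DataFrame from a dict with the keys
    'data', 'columns' and 'index' (the three keys checked in the diff)."""
    payload = dict(payload)  # shallow copy so the caller's dict is not mutated
    index = payload.get("index")
    # JSON has no tuple type, so MultiIndex rows come back as lists.
    # An empty index is falsy and is passed through unchanged.
    if index and all(isinstance(row, (list, tuple)) for row in index):
        payload["index"] = pd.MultiIndex.from_tuples([tuple(row) for row in index])
    return pd.DataFrame(**payload)


# Hand-written stand-in for the decoded JSON of the failing example above
# (columns "a" and "b" moved into the index).
payload = {
    "columns": ["c", "d"],
    "index": [[0, 1], [4, 5], [8, 9], [12, 13], [16, 17]],
    "data": [[2, 3], [6, 7], [10, 11], [14, 15], [18, 19]],
}

df = frame_from_split_dict(payload)
flat = frame_from_split_dict({"columns": ["a"], "index": [0, 1], "data": [[1], [2]]})
```

Note that in this sketch the level names ("a", "b") are not restored, since the payload as written carries only data, columns and index.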