HPAT unable to handle non-existent dataframe index name.
HPAT crashes if it is given a Pandas DataFrame without an explicit index. Many users create DataFrames without such indexes (as shown in the Pandas docs), so it would be useful to fix this bug.
@akharche perhaps the changes to indexing caused this behavior?
import pandas as pd
import numpy as np
import hpat
df = pd.DataFrame({'0':[100,200,300,400,200,100]})
df2 = pd.DataFrame([100,200,300,400,200,100])
df3 = pd.DataFrame({'A':[100,200,300,400,200,100]})
@hpat.jit
def test_func(data_frame):
return data_frame
test_func(df) # works with warnings
test_func(df2) # fails and crashes kernel
test_func(df3) # works with warnings
Thank you for the example. I believe this issue is not connected to a non-existent DataFrame index (i.e., the additional "Index" column).
Currently HPAT is limited in handling DataFrames created this way: df2 = pd.DataFrame([100,200,300,400,200,100]). HPAT expects an explicit column name, e.g. df2 = pd.DataFrame([100,200,300,400,200,100], columns=['A']), or a dictionary as in your other examples.
This limitation is mentioned in the HPAT docs.
We will add the ability to handle such cases; a sketch of the supported constructions follows.
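For reference, a minimal sketch of the constructions HPAT currently accepts, reusing test_func from the example above (the expected-behavior comments are assumptions based on this thread, not guaranteed output):

import pandas as pd
import hpat

@hpat.jit
def test_func(data_frame):
    return data_frame

# Workaround: name the column explicitly instead of relying on the default 0
df2_named = pd.DataFrame([100, 200, 300, 400, 200, 100], columns=['A'])
test_func(df2_named)  # expected to work, like the dict-based examples

# Dict-based construction, as in the original report
df3 = pd.DataFrame({'A': [100, 200, 300, 400, 200, 100]})
test_func(df3)  # works (with warnings)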
An additional thought, then, since we are adding this ability:
Is there a way we can elegantly handle unsupported features in the future? For example: emit a warning, stop compiling, and leave the function as Python object code?
It is of course possible to not compile anything and leave the entire function uncompiled. Partially falling back to object mode is less straightforward. It is conceptually possible for certain functions, e.g. those which are trivially data-parallel, but there is no default fallback for others, like reductions, joins, etc. The situation is different if we separated distribution from compilation: falling back to object mode in a (hypothetical) non-distributed mode should always be possible.
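To illustrate the warn-and-fall-back idea, here is a sketch of a whole-function fallback (jit_with_fallback is a hypothetical helper, not part of HPAT or Numba, and it falls back for the entire function rather than partially):

import warnings
import numba

def jit_with_fallback(func):
    # Hypothetical decorator: try nopython compilation first and, if the
    # function uses an unsupported feature, warn and run plain Python instead.
    compiled = numba.njit(func)
    def wrapper(*args, **kwargs):
        try:
            # Numba compiles lazily on the first call, so typing errors
            # for unsupported features surface here.
            return compiled(*args, **kwargs)
        except Exception as exc:  # in practice, Numba's TypingError
            warnings.warn("compilation failed (%s); falling back to "
                          "uncompiled Python object code" % exc)
            return func(*args, **kwargs)
    return wrapper

@jit_with_fallback
def test_func(x):
    return x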