HPAT unable to handle non-existent dataframe index name.

Open triskadecaepyon opened this issue 5 years ago • 3 comments

HPAT crashes if it gets a Pandas dataframe with a non-existent dataframe index name. Many users create dataframes without such names (as shown in the Pandas docs), so it would be useful to fix this bug.

@akharche perhaps the changes to indexing made this behavior happen?

import pandas as pd
import numpy as np
import hpat

df = pd.DataFrame({'0':[100,200,300,400,200,100]})
df2 = pd.DataFrame([100,200,300,400,200,100])
df3 = pd.DataFrame({'A':[100,200,300,400,200,100]})

@hpat.jit
def test_func(data_frame):
    return data_frame

test_func(df) # works with warnings
test_func(df2) # fails and crashes kernel
test_func(df3) # works with warnings

triskadecaepyon avatar Sep 03 '19 17:09 triskadecaepyon

Thank you for the example. I don't think this issue is connected to a non-existent dataframe index (I mean the additional "Index" column). HPAT is currently limited in handling DataFrames created without column names, such as df2 = pd.DataFrame([100,200,300,400,200,100]). HPAT expects either an explicit column name, as in df2 = pd.DataFrame([100,200,300,400,200,100], columns=['A']), or a dictionary as in your other examples. This is mentioned in the HPAT docs.

We will add the ability to handle such cases.
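For anyone hitting this in the meantime, here is a minimal pandas-only sketch of the difference described above. It only illustrates how pandas labels the columns; the variable names `named` and `renamed` are made up for illustration, and whether HPAT accepts the renamed frame is an assumption based on the dictionary example with the '0' key, not something verified here.

```python
import pandas as pd

values = [100, 200, 300, 400, 200, 100]

# No column names given: pandas labels the single column with the integer 0.
df2 = pd.DataFrame(values)

# Option 1: pass an explicit string column name at construction time.
named = pd.DataFrame(values, columns=["A"])

# Option 2: convert the existing integer labels to strings.
# rename() accepts a callable, so str turns 0 into "0",
# matching the labels of pd.DataFrame({'0': [...]}).
renamed = df2.rename(columns=str)
```

Both options leave the data untouched and change only the column labels.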

akharche avatar Sep 05 '19 10:09 akharche

So an additional thought then since we are adding this ability:

Is there a way we can elegantly handle unsupported features in the future? For example: throw a warning, stop compiling, and leave the function as Python object code?

triskadecaepyon avatar Sep 06 '19 09:09 triskadecaepyon

It is of course possible to just not compile anything and leave the entire function uncompiled. It's less straightforward to partially fall back to object mode. It is conceptually possible for certain functions, e.g. those which are trivially data parallel. There is no default fallback for others, like reductions, joins, etc. The situation would be different if we separated distribution from compilation: going to object mode in a (hypothetical) non-distributed mode should always be possible.
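The simplest option above, leaving the entire function uncompiled, could be sketched roughly as follows. This is not HPAT's API: `jit_with_fallback` and `picky_compiler` are hypothetical names, and `picky_compiler` is a stand-in for a real JIT decorator so the example runs without HPAT. It assumes the compiler reports unsupported input by raising a Python exception; it cannot help with a hard crash like the kernel crash reported for df2.

```python
import functools
import warnings


def jit_with_fallback(compile_fn):
    """Hypothetical helper: try compile_fn, else run the plain Python function."""
    def decorator(py_func):
        try:
            compiled = compile_fn(py_func)
        except Exception as exc:
            warnings.warn(f"{py_func.__name__}: compilation failed ({exc!r}); "
                          "running uncompiled")
            compiled = None

        @functools.wraps(py_func)
        def wrapper(*args, **kwargs):
            if compiled is not None:
                try:
                    return compiled(*args, **kwargs)
                except Exception as exc:
                    # Lazy JIT compilers often fail only at the first call,
                    # once the argument types are known.
                    warnings.warn(f"{py_func.__name__}: compiled call failed "
                                  f"({exc!r}); running uncompiled")
            return py_func(*args, **kwargs)

        return wrapper
    return decorator


def picky_compiler(func):
    """Stand-in for a real JIT decorator: rejects dict arguments at call time."""
    def compiled(*args, **kwargs):
        if any(isinstance(arg, dict) for arg in args):
            raise TypeError("unsupported argument type: dict")
        return func(*args, **kwargs)
    return compiled


@jit_with_fallback(picky_compiler)
def total(values):
    return sum(values)
```

Calling `total([1, 2, 3])` goes through the stand-in compiled path, while `total({1: "a", 2: "b"})` emits a warning and runs the original function. Note that this reruns the whole function from the start, so it is only safe when the function has no side effects before the failure, and it does nothing for the distributed cases (reductions, joins) mentioned above.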

fschlimb avatar Sep 06 '19 09:09 fschlimb