
More efficient alternative in 04_ApplyStudents_Alcohol_Consumption

Open rahimnathwani opened this issue 6 years ago • 2 comments

In step 10, we want to multiply all numerical values by 10.

The provided solution is: df.applymap(times10).head(10)

But this is slow, because it calls a regular Python function on every element in the DataFrame.

A better approach is to check each column's dtype and then use pandas' built-in multiplication on the whole column:

for colname, coltype in df.dtypes.items():
    if coltype.name == 'int64':
        df[colname] = df[colname] * 10

I used %%timeit to test the two solutions. On this small dataset, my solution is about 5x as fast (1.1 ms vs 5.8 ms). The difference should grow with a larger dataset.

rahimnathwani avatar Mar 06 '19 02:03 rahimnathwani
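
For anyone who wants to reproduce the comparison outside a notebook, here is a rough sketch using the standard `timeit` module. The DataFrame is synthetic and its column names are made up, and `times10` is only an assumed version of the exercise's helper (scale integers, pass everything else through), so the absolute numbers will not match the ones quoted above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the exercise data (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(15, 23, size=1000).astype("int64"),
    "absences": rng.integers(0, 30, size=1000).astype("int64"),
    "school": ["GP"] * 1000,
})


def times10(x):
    # Assumed shape of the exercise's helper: scale integers, pass others through.
    if isinstance(x, (int, np.integer)) and not isinstance(x, bool):
        return 10 * x
    return x


def elementwise(frame):
    # DataFrame.map replaced applymap in pandas 2.1; fall back for older versions.
    mapper = frame.map if hasattr(frame, "map") else frame.applymap
    return mapper(times10)


def columnwise(frame):
    # Work on a copy so repeated timing runs start from the same data.
    out = frame.copy()
    for colname, coltype in out.dtypes.items():
        if coltype.name == "int64":
            out[colname] = out[colname] * 10
    return out


slow = timeit.timeit(lambda: elementwise(df), number=20)
fast = timeit.timeit(lambda: columnwise(df), number=20)
print(f"elementwise: {slow:.4f}s, columnwise: {fast:.4f}s")
```

The column-wise version copies the frame on every run, which slightly penalizes it; that keeps the comparison conservative.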

What if it's not an int64, though? This might work better:

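import numpy as np

# Select every numeric column (not just int64) and scale it in df itself,
# instead of assigning into the copy that select_dtypes returns
num_cols = df.select_dtypes(include=[np.number]).columns
df[num_cols] = df[num_cols] * 10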

pcarlitz avatar Jun 11 '19 03:06 pcarlitz
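
To illustrate the dtype concern, here is a small sketch on a made-up DataFrame (the column names are hypothetical, not from the exercise data). It compares which columns an `int64`-only check picks up with what `select_dtypes(include=[np.number])` returns, then scales the numeric columns in one assignment.

```python
import numpy as np
import pandas as pd

# Made-up frame mixing several numeric dtypes with a text column.
df = pd.DataFrame({
    "count": np.array([1, 2, 3], dtype="int64"),
    "small": np.array([1, 2, 3], dtype="int32"),
    "ratio": np.array([0.5, 1.5, 2.5], dtype="float64"),
    "label": ["a", "b", "c"],
})

# Columns found by checking for int64 only.
int64_only = [c for c, t in df.dtypes.items() if t.name == "int64"]

# Columns found by asking pandas for every numeric dtype.
numeric = df.select_dtypes(include=[np.number]).columns.tolist()

# Scale all numeric columns at once, leaving the text column untouched.
scaled = df.copy()
scaled[numeric] = scaled[numeric] * 10

print("int64 only:", int64_only)
print("all numeric:", numeric)
print(scaled)
```

Whether float columns should be scaled at all depends on what step 10 intends; this only shows that the two checks select different columns.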

@pcarlitz, have you measured the performance? I am in favor of the fastest solution.

guipsamora avatar Oct 13 '19 14:10 guipsamora