Jonas Haag comments

Results: 800 comments of Jonas Haag

Numba compilation

It’s part of a deep learning data augmentation pipeline

Numba compilation

Yeah SciPy has a lot of stuff, but for example I wanted to quickly test some low/high shelf filtering and SciPy doesn't have it -- or at least I can't...
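For a quick experiment of this kind, a low shelf can be assembled by hand from the widely used Audio EQ Cookbook biquad formulas. The sketch below is only an assumption about what such a test might look like; it is not how Yodel or SciPy implement anything, and the names `low_shelf_coefficients` and `gain_at` are made up for illustration.

```python
import cmath
import math


def low_shelf_coefficients(sample_rate, cutoff_hz, gain_db, slope=1.0):
    """Return normalized biquad coefficients (b, a) for a low-shelf filter.

    Formulas follow the Audio EQ Cookbook; `slope` is its shelf slope S.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt(
        (amp + 1.0 / amp) * (1.0 / slope - 1.0) + 2.0
    )
    k = 2.0 * math.sqrt(amp) * alpha

    b0 = amp * ((amp + 1) - (amp - 1) * cos_w0 + k)
    b1 = 2 * amp * ((amp - 1) - (amp + 1) * cos_w0)
    b2 = amp * ((amp + 1) - (amp - 1) * cos_w0 - k)
    a0 = (amp + 1) + (amp - 1) * cos_w0 + k
    a1 = -2 * ((amp - 1) + (amp + 1) * cos_w0)
    a2 = (amp + 1) + (amp - 1) * cos_w0 - k

    # Normalize so that a[0] == 1, the usual convention for (b, a) pairs.
    b = [b0 / a0, b1 / a0, b2 / a0]
    a = [1.0, a1 / a0, a2 / a0]
    return b, a


def gain_at(b, a, freq_hz, sample_rate):
    """Magnitude of the filter's frequency response at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(numerator / denominator)


# A +6 dB shelf below roughly 200 Hz at a 44.1 kHz sample rate.
b, a = low_shelf_coefficients(44100, 200.0, 6.0)
```

The returned `(b, a)` pair has the shape that `scipy.signal.lfilter(b, a, x)` expects, so the filter can be applied to a signal without further conversion.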

Numba compilation

I still think Yodel is incredibly valuable, even if you understandably don't plan to work on it any further, simply for its simplicity, ease of use, and educational value.

Jonas' optimization ideas

STATUS: This is done for lightgbm (#15), and for sklearn we're not doing it (#19). We could try Parquet for storing the arrays. It has great support for sparse arrays...

Jonas' optimization ideas

STATUS: We don't need this for lightgbm since it uses Parquet, and the sklearn code currently has no boolean arrays. We can use NumPy's `pack` functionality to represent boolean arrays...
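A minimal sketch of the bit-packing idea, assuming the `pack` functionality mentioned above refers to `np.packbits` and `np.unpackbits`. The array `mask` is made-up example data, not taken from the project.

```python
import numpy as np

# Hypothetical boolean array; NumPy stores each flag in one full byte.
mask = np.array([True, False, True, True, False, False, True, False, True, True])

# Pack 8 flags into each uint8; the last byte is zero-padded.
packed = np.packbits(mask)

# Unpack again; `count` trims the padding bits added by packbits.
restored = np.unpackbits(packed, count=mask.size).astype(bool)

print(mask.nbytes, packed.nbytes)
```

The original length has to be stored next to the packed bytes, because the padding in the last byte is otherwise indistinguishable from real `False` values.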

Jonas' optimization ideas

In a real-world model I just benchmarked, we have ALL `children_left` like this:
```
[1, 2, 3, ..., 42, -1, -1, ..., N]
```
i.e. it is equivalent to `range(1,...
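The comment is cut off here, so what it goes on to propose is unknown. As an illustration only, one generic way to exploit such runs of consecutive integers is delta encoding, sketched below on made-up data. The array contents and the helper names are assumptions, not taken from the project.

```python
import zlib
from array import array
from itertools import accumulate

# Hypothetical stand-in for `children_left`: a run 1..1000, then leaf markers (-1).
children_left = list(range(1, 1001)) + [-1] * 1000


def delta_encode(values):
    """Store each value as its difference from the previous one (first: from 0)."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


deltas = delta_encode(children_left)

# Compare how well a general-purpose compressor handles both representations.
raw_size = len(zlib.compress(array("i", children_left).tobytes()))
delta_size = len(zlib.compress(array("i", deltas).tobytes()))
print(raw_size, delta_size)
```

A run like `1, 2, 3, ...` turns into a constant stream of `1`s, which a general-purpose compressor can shrink much further than the original increasing sequence.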

Jonas' optimization ideas

STATUS: Parquet seems to handle this just fine, not sure about lzma. We found in the lgbm data a lot of values like `1e-35`. Are they NaN? If so we...

Jonas' optimization ideas

Combine sklearn trees into a single array to profit from potentially better Parquet compression. E.g. if your random forest has 100 trees, concat each of the 100 tree arrays, like...
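A rough sketch of the concatenation idea, using three made-up arrays in place of the per-tree arrays of a real forest. Whether the project stores split points like this is an assumption; the point is only that one flat array plus the boundaries is enough to get the individual trees back.

```python
import numpy as np

# Hypothetical stand-ins for one per-tree array (e.g. thresholds) of three trees.
trees = [
    np.array([0.5, 1.5, -2.0]),
    np.array([3.0]),
    np.array([0.25, 0.75]),
]

# One flat array for the whole forest ...
flat = np.concatenate(trees)

# ... plus the split points needed to undo the concatenation.
lengths = np.array([len(tree) for tree in trees])
boundaries = np.cumsum(lengths)[:-1]

restored = np.split(flat, boundaries)
```

Only `flat` and `boundaries` need to be written out; a column format then sees one long column instead of many short ones.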

Jonas' optimization ideas

Use [Pseudodecimal Encoding from btrblocks](https://github.com/maxi-k/btrblocks)
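The core idea of a pseudodecimal scheme is to store a double as an integer of significant digits plus a power-of-ten exponent whenever that reproduces the value exactly. The following is a heavily simplified sketch of that idea, not the btrblocks implementation; `encode`, `decode`, and `MAX_EXPONENT` are hypothetical names.

```python
import math

# Assumed cut-off: 10**22 is the largest power of ten a double represents exactly.
MAX_EXPONENT = 22


def encode(value):
    """Return (digits, exponent) with digits / 10**exponent == value, or None.

    None marks values that have to be stored as plain doubles (exceptions).
    """
    if not math.isfinite(value):
        return None
    for exponent in range(MAX_EXPONENT + 1):
        digits = round(value * 10 ** exponent)
        if digits / 10 ** exponent == value:
            return digits, exponent
    return None


def decode(digits, exponent):
    """Rebuild the double from its integer digits and decimal exponent."""
    return digits / 10 ** exponent


print(encode(3.25), encode(0.1), encode(-7.0))
```

Values that come from short decimal literals end up as two small integers, which integer compression schemes handle far better than raw 8-byte doubles.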

parquet-compression for lgbm

Fun fact, float printing performance seems to scale with the number of digits
```py
l = [i/17. for i in range(1_000)]
In [2]: %timeit ", ".join(str(x) for x in l)...
```
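To reproduce this kind of measurement outside IPython, something along these lines should work. The comparison list `few_digits` and the helper names are assumptions added for illustration; the original snippet is truncated and its timings are not shown here.

```python
import timeit


def join_floats(values):
    """Format floats the same way as the snippet above."""
    return ", ".join(str(x) for x in values)


# Dividing by 17 yields long decimal expansions, dividing by 16 short ones.
many_digits = [i / 17.0 for i in range(1_000)]
few_digits = [i / 16.0 for i in range(1_000)]


def time_join(values, number=200):
    """Total seconds for `number` runs of join_floats(values)."""
    return timeit.timeit(lambda: join_floats(values), number=number)


if __name__ == "__main__":
    print("many digits:", time_join(many_digits))
    print("few digits: ", time_join(few_digits))
```

Both lists have the same length, so any difference in timing comes from formatting the individual values rather than from the join itself.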