Jonas Haag

800 comments by Jonas Haag

It’s part of a deep learning data augmentation pipeline

Yeah SciPy has a lot of stuff, but for example I wanted to quickly test some low/high shelf filtering and SciPy doesn't have it -- or at least I can't...

I still think Yodel is incredibly valuable, even if you understandably don't plan to work on it any further, simply for its simplicity, ease of use, and educational value.

STATUS: This is done for lightgbm (#15), and for sklearn we're not doing it (#19). We could try Parquet for storing the arrays. It has great support for sparse arrays...

STATUS: We don't need this for lightgbm since it uses Parquet, and the sklearn code currently has no boolean arrays. We can use NumPy’s `pack` functionality to represent boolean arrays...
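
A minimal sketch of the bit-packing idea, assuming the `pack` functionality mentioned above refers to `np.packbits` / `np.unpackbits`. The helper names `pack_bools` and `unpack_bools` are hypothetical and only for illustration. Eight booleans end up in one byte, and the original length has to be stored separately because the last byte is zero-padded.

```python
import numpy as np


def pack_bools(mask):
    """Pack a boolean array into bytes (hypothetical helper).

    Returns the packed uint8 array and the original element count,
    which is needed to strip the zero padding when unpacking.
    """
    mask = np.asarray(mask, dtype=bool).ravel()
    return np.packbits(mask), mask.size


def unpack_bools(packed, n):
    """Inverse of pack_bools: restore the first n booleans."""
    return np.unpackbits(packed, count=n).astype(bool)


mask = np.array([True, False, True, True, False, False, False, True, True, False])
packed, n = pack_bools(mask)
restored = unpack_bools(packed, n)

print(mask.nbytes, "bytes unpacked vs", packed.nbytes, "bytes packed")
```

Multi-dimensional arrays would additionally need their shape stored; the sketch flattens the input and ignores that case.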

In a real-world model I just benchmarked, we have ALL `children_left` like this: ``` [1, 2, 3, ..., 42, -1, -1, ..., N] ``` i.e. it is equivalent to `range(1,...

STATUS: Parquet seems to handle this just fine, not sure about lzma. We found in the lgbm data a lot of values like `1e-35`. Are they NaN? If so we...

Combine sklearn trees into a single array to profit from potentially better Parquet compression. E.g., if your random forest has 100 trees, concat each of the 100 tree arrays, like...
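
A rough sketch of what the concatenation could look like, under the assumption that each tree contributes one 1-D array per attribute (say, `children_left`) and that storing the per-tree lengths alongside is enough to split them again. `concat_arrays` and `split_arrays` are hypothetical names, and the toy arrays stand in for real tree data.

```python
import numpy as np


def concat_arrays(arrays):
    """Join per-tree 1-D arrays into one flat array plus their lengths."""
    lengths = np.array([len(a) for a in arrays], dtype=np.int64)
    return np.concatenate(arrays), lengths


def split_arrays(flat, lengths):
    """Recover the per-tree arrays from the flat array and the lengths."""
    # The split points are the cumulative lengths, excluding the final total.
    return np.split(flat, np.cumsum(lengths)[:-1])


# Toy stand-ins for the `children_left` arrays of three trees.
trees = [
    np.array([1, 2, -1, -1]),
    np.array([1, -1]),
    np.array([1, 2, 3, -1, -1, -1]),
]

flat, lengths = concat_arrays(trees)
parts = split_arrays(flat, lengths)
```

With this layout a forest of 100 trees becomes one long column per attribute plus one short `lengths` column, instead of 100 small arrays per attribute. Whether that actually compresses better would need to be measured.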

Use [Pseudodecimal Encoding from btrblocks](https://github.com/maxi-k/btrblocks)

Fun fact, float printing performance seems to scale with the number of digits ```py l = [i/17. for i in range(1_000)] In [2]: %timeit ", ".join(str(x) for x in l)...