Wes McKinney

Results: 203 comments by Wes McKinney

+1, the more easily reproducible the performance numbers the better (e.g. providing a Dockerfile would be ideal). This way users can validate performance on various hardware configurations. I also recommend...
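For illustration, a minimal benchmark Dockerfile along those lines might look like this (base image, file names, and the `run_benchmarks.py` script are all hypothetical placeholders, not from any actual project):

```dockerfile
# Hypothetical reproducible-benchmark image; pinning the base image and
# dependencies makes numbers comparable across hardware configurations.
FROM python:3.11-slim
WORKDIR /bench
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "run_benchmarks.py"]
```

Anyone can then `docker build` and `docker run` the same environment on their own machine and compare results.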

While you're at it, it would be nice to plot a course to `conda install weld` and get all the Python things in a single `import weld` statement. This probably...

You can look at what we did in Apache Arrow with manylinux1: https://github.com/wesm/arrow/blob/master/python/manylinux1/build_arrow.sh and https://github.com/wesm/arrow/blob/master/python/setup.py#L210 so all the shared libs (built with CMake) get bundled in the wheel. Probably possible...

conda is the easiest way since you can package `libweld` (the shared libraries) and `weld-python` (the Python package and C extensions) as separate components.
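Roughly, that split would mean two conda recipes, with the Python package depending on the library package (these `meta.yaml` fragments are hypothetical sketches, not actual Weld recipes; names and versions are invented):

```yaml
# Hypothetical meta.yaml for the shared-library package
package:
  name: libweld
  version: "0.1.0"
requirements:
  build:
    - cmake
---
# Hypothetical meta.yaml for the Python bindings, pinned to libweld
package:
  name: weld-python
  version: "0.1.0"
requirements:
  run:
    - libweld ==0.1.0
    - python
```

This way the C++ library can be rebuilt or upgraded independently of the Python bindings.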

There seems to be some GitHub snafu right now, so all the Apache git mirrors on GitHub are down at the moment.

I'm very interested in the subgraph compiler problem for Arrow. It might be that we need to define a slightly higher level Arrow analytics IR that lowers to Weld DSL...

The notion of "arbitrary input data formats" is potentially a rathole. Beyond non-nullable tensor-like memory (i.e. the NumPy ndarray model), packed record / row-oriented tables (similar to Spark's Tungsten "off...
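To make the row-versus-column distinction concrete, here is a small standard-library illustration (the record layout is invented for the example) of the same records in a packed row-oriented buffer versus one contiguous buffer per column:

```python
import struct
from array import array

# Row-oriented: each record packed back-to-back as (int64 id, float64 value),
# similar in spirit to an "off-heap" packed record format.
row_fmt = struct.Struct("<qd")
rows = b"".join(row_fmt.pack(i, i * 1.5) for i in range(3))

# Column-oriented (the NumPy ndarray model): one contiguous buffer per field.
ids = array("q", range(3))
values = array("d", (i * 1.5 for i in range(3)))

# Reading field "value" of record 1 from each layout:
_, v_row = row_fmt.unpack_from(rows, 1 * row_fmt.size)  # strided access
v_col = values[1]                                       # direct index
assert v_row == v_col == 1.5
```

Scanning a single field touches every record's bytes in the row layout but only one contiguous buffer in the columnar layout, which is much of why supporting "arbitrary input formats" efficiently is so hard.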

@sursu indeed one of my primary motivations in developing the Apache Arrow project (which has more or less been my primary focus since sometime in 2015) is to develop next-generation...

We could definitely have a mutating append and write into resizable buffers (with growth factor 1.5 or 2). Something we can experiment with.
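As a rough sketch of what such a mutating append might look like (illustrative Python only, not the Arrow implementation; the class and method names are invented):

```python
class GrowableBuffer:
    """Append-only byte buffer that grows by a fixed factor (here 2),
    giving amortized O(1) appends at the cost of some slack capacity."""

    def __init__(self, initial_capacity=64, growth_factor=2):
        self._buf = bytearray(initial_capacity)
        self._size = 0
        self._growth = growth_factor

    def append(self, data: bytes) -> None:
        needed = self._size + len(data)
        if needed > len(self._buf):
            # Grow geometrically until the new data fits (one reallocation).
            new_cap = len(self._buf)
            while new_cap < needed:
                new_cap = int(new_cap * self._growth)
            self._buf.extend(bytearray(new_cap - len(self._buf)))
        self._buf[self._size:needed] = data
        self._size = needed

    def view(self) -> bytes:
        """Return only the written portion, not the slack capacity."""
        return bytes(self._buf[:self._size])
```

With factor 2 each element is copied at most twice on average; 1.5 wastes less memory but reallocates more often.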