Trill Columnar data format efficiency: create extra columns for needed expressions

Open cybertyche opened this issue 6 years ago • 2 comments

For a data structure with fields a, b, and c, if the downstream query operators never refer to field a directly but only to a.d.z, a["bacon"], or some other constant expression over a, it may make sense to have a column representing a.d.z or a["bacon"] instead of a column for a. This change would require altering the data type structure of the generated columnar batch, and it would change how the generated operators over those columns reference fields.
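
To make the idea concrete, here is a minimal sketch in Python rather than Trill's C#. It is not Trill's generated code: the row shape, the `projections` table, and the `to_columnar_batch` helper are all hypothetical names invented for illustration. It only shows a batch that stores columns for a.d.z and a["bacon"] instead of a column for a.

```python
# Hypothetical rows with fields a, b, c, where `a` is a nested structure.
rows = [
    {"a": {"d": {"z": 1}, "bacon": "crispy"}, "b": 10, "c": "x"},
    {"a": {"d": {"z": 2}, "bacon": "chewy"}, "b": 20, "c": "y"},
    {"a": {"d": {"z": 3}, "bacon": "crispy"}, "b": 30, "c": "z"},
]

# Assumed: the only expressions the downstream operators ever evaluate.
# Note there is no entry for `a` itself.
projections = {
    "a.d.z": lambda r: r["a"]["d"]["z"],
    'a["bacon"]': lambda r: r["a"]["bacon"],
    "b": lambda r: r["b"],
    "c": lambda r: r["c"],
}


def to_columnar_batch(rows, projections):
    """Evaluate each projection once per row and store the results as columns."""
    return {name: [expr(row) for row in rows] for name, expr in projections.items()}


batch = to_columnar_batch(rows, projections)

# A downstream filter-and-sum reads the derived columns directly
# and never touches the nested `a` values.
total = sum(
    z
    for z, bacon in zip(batch["a.d.z"], batch['a["bacon"]'])
    if bacon == "crispy"
)
```

The trade-off in this sketch is that the batch layout now depends on the query: adding an operator that needs a different expression over a means adding a projection and regenerating the batch type.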

Dec 10 '18 00:12 cybertyche

There are multiple discussions around this topic, I think. I've linked the other places at https://github.com/dotnet/corefx/issues/26845 and https://github.com/dotnet/machinelearning/issues/69. It appears the industry is converging on Apache Arrow (https://arrow.apache.org/) as the columnar format, and an initial C# implementation landed just recently (https://github.com/apache/arrow/tree/master/csharp). It might make sense to coordinate around this a bit to make a good case for .NET at large (as a side note, heterogeneous computing, Arrow, machine learning parameter tuning, and other things were tangentially discussed at https://github.com/dotnet/orleans). :)

For readers coming from other links, Trill has a few other related issues: https://github.com/Microsoft/Trill/issues/7 https://github.com/Microsoft/Trill/issues/6

Dec 29 '18 14:12 veikkoeeva

That's a fantastic idea. If data is already arriving natively in Arrow format, then making Trill operate directly and efficiently on it would be fantastic.

Dec 30 '18 21:12 cybertyche
