ballerina-spec
Support column-major order for table values
By supporting column-major tables, we can improve the performance of certain operations, and the new table can serve as a data structure similar to a Pandas DataFrame.
Ballerina tables are currently designed to be in row-major order, but there is an opportunity to support column-major order using tuples. Currently, the row-type-parameter in the table type descriptor must be a subtype of map<any|error>. If we change this to allow [any|error...], we can treat such a table as column-major. However, this affects several areas of the current table design, such as key and the langlib functions.
Another option is to use an additional syntax modifier (a keyword) or metadata such as an annotation to indicate that the table is column-major. At runtime, we can then model the table as a tuple of lists instead of an array of records.
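To make the two storage models concrete, here is a minimal sketch in Python (chosen only as a neutral illustration language, in the spirit of the Pandas comparison above). The field names, the sample data, and the helpers `to_columns` and `to_rows` are all hypothetical, and a dict of lists stands in for the "tuple of lists" so the columns stay labeled.

```python
# Row-major: one record (dict) per row, like an array of records.
rows = [
    {"id": 1, "name": "Ann", "salary": 100},
    {"id": 2, "name": "Bob", "salary": 250},
    {"id": 3, "name": "Cy", "salary": 175},
]


def to_columns(rows):
    """Transpose a list of uniform row dicts into a dict of column lists."""
    if not rows:
        return {}
    return {field: [row[field] for row in rows] for field in rows[0]}


def to_rows(columns):
    """Transpose a dict of equal-length column lists back into row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Column-major: one list per column.
columns = to_columns(rows)

# A per-column aggregate has to visit every record in the row-major layout,
# but only reads a single contiguous list in the column-major layout.
total_row_major = sum(row["salary"] for row in rows)
total_col_major = sum(columns["salary"])
```

Both layouts hold the same information (`to_rows(columns)` reproduces `rows`), so the choice is purely about which access pattern should be cheap.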
Yeah I don't think we need to change the design dramatically to do this.
If we can add an annotation or a keyword to change the storage model, and we add a few langlib functions to get/add/delete columns, that will do IMO. Of course, adding or deleting a column can't break the type, so that's only possible for fields that are not mandatory in the record type. If the record is closed, then you cannot add/delete columns.
In reply to Hasitha Aravinda's post of Fri, Mar 3, 2023, 1:35 PM (https://github.com/ballerina-platform/ballerina-spec/issues/1223).
-- Sanjiva Weerawarana
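The constraint on adding and deleting columns in the reply above can be sketched as follows. This is a Python model, not Ballerina, and `ColumnTable`, its methods, and the `required`/`closed` parameters are hypothetical names invented for the illustration. It encodes the rule exactly as stated in the reply: no column changes at all for a closed row type, and no deletion of mandatory fields.

```python
class ColumnTable:
    """Hypothetical column store that guards its row type on column changes."""

    def __init__(self, columns, required, closed):
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        missing = set(required) - set(columns)
        if missing:
            raise ValueError(f"missing mandatory columns: {sorted(missing)}")
        self.columns = {name: list(values) for name, values in columns.items()}
        self.required = frozenset(required)  # mandatory fields of the row type
        self.closed = closed                 # True models a closed record type

    def row_count(self):
        return len(next(iter(self.columns.values()), []))

    def add_column(self, name, values):
        if self.closed:
            raise TypeError("closed row type: columns cannot be added")
        if name in self.columns:
            raise ValueError(f"column {name!r} already exists")
        if self.columns and len(values) != self.row_count():
            raise ValueError("new column must have one value per row")
        self.columns[name] = list(values)

    def delete_column(self, name):
        if self.closed:
            raise TypeError("closed row type: columns cannot be deleted")
        if name in self.required:
            raise TypeError(f"column {name!r} is mandatory in the row type")
        del self.columns[name]


# Open row type: a non-mandatory column can come and go.
open_table = ColumnTable(
    {"id": [1, 2], "name": ["Ann", "Bob"]}, required={"id", "name"}, closed=False
)
open_table.add_column("nickname", ["A", None])
open_table.delete_column("nickname")
```

Attempting `open_table.delete_column("id")`, or any column change on a table built with `closed=True`, raises `TypeError` in this sketch, which mirrors the idea that such an operation would break the table's type.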
It's fundamental that a Ballerina table is a 1-dimensional container of records. Furthermore, the record you get out is === to the record you put in (as it would be with a list of records).
Suppose we have a table type table<record {| readonly int x; string y; |}> key(x). If we want to construct this from columns (and store it as columns), we can do so by using a function that takes a value of type record {| int[] x; string[] y; |}. When the table is mutable, it would need to copy these arrays, so for efficiency it should also be possible to use a record of streams. Conceptually this would create a record for each row, but the implementation can avoid materializing them unless and until required. For example, if the table is queried using foreach var {x, y} in t, then the implementation does not need to create the row records.
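A rough sketch of that idea, written in Python rather than Ballerina: the table copies the incoming column lists, iterates over field values without building rows, and only creates a row record when one is asked for by key. `LazyColumnTable` and its methods are hypothetical names, and caching the materialized row is one assumed way to keep the "same record out every time" identity guarantee.

```python
class LazyColumnTable:
    """Hypothetical table keyed by x, stored as columns, with lazy row records."""

    def __init__(self, x, y):
        if len(x) != len(y):
            raise ValueError("columns must have the same length")
        if len(set(x)) != len(x):
            raise ValueError("duplicate key in column x")
        # Copy the lists so later changes to the caller's lists
        # cannot alter the table.
        self._x = list(x)
        self._y = list(y)
        self._index = {key: i for i, key in enumerate(self._x)}
        self._rows = {}  # position -> row record, filled in on demand

    def __iter__(self):
        """Yield (x, y) pairs straight from the columns; no row is created."""
        return zip(self._x, self._y)

    def get(self, key):
        """Return the row for a key, materializing it on first access only."""
        i = self._index[key]
        row = self._rows.get(i)
        if row is None:
            row = {"x": self._x[i], "y": self._y[i]}
            self._rows[i] = row
        return row

    def materialized_count(self):
        return len(self._rows)


xs, ys = [1, 2, 3], ["a", "b", "c"]
t = LazyColumnTable(xs, ys)
pairs = [(x, y) for x, y in t]   # destructuring iteration, no rows built
first = t.get(2)                 # first keyed access builds exactly one row
second = t.get(2)                # later accesses return the identical object
```

The sketch leaves out one thing a real implementation would have to handle: once a mutable row has been materialized, updates to it and to the column storage must be kept in sync.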
The challenge is finding a way to write the type of the function. I think something like this can be made to work:
@typeParam { transform: "memberList", fromType: MapType }
type MapListType map<(any|error)[]>;
public function fromColumnLists(MapListType columnLists, typedesc<table<MapType>> t = <>) returns t|error = external;
This adds a field to the typeParam annotation. It's saying that MapListType is constructed from MapType using the "memberList" transform, which would be defined to mean transforming each member type T into T[].
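The effect of the proposed "memberList" transform can be illustrated outside Ballerina's type system. In the Python sketch below, a row type is modeled as a plain dict from field name to member type. `member_list` and `conforms` are hypothetical helpers, and the whole thing is only an analogy for what the annotation would express statically.

```python
from typing import get_args


def member_list(map_type):
    """Turn each member type T of a row type into list[T]."""
    return {field: list[member] for field, member in map_type.items()}


def conforms(columns, column_lists_type):
    """Check at runtime that a dict of columns matches the transformed type."""
    if columns.keys() != column_lists_type.keys():
        return False
    for field, list_type in column_lists_type.items():
        (member,) = get_args(list_type)  # list[int] -> int
        values = columns[field]
        if not isinstance(values, list):
            return False
        if not all(isinstance(value, member) for value in values):
            return False
    return True


# Row type {x: int, y: str} becomes column-lists type {x: list[int], y: list[str]}.
row_type = {"x": int, "y": str}
column_lists_type = member_list(row_type)

ok = conforms({"x": [1, 2], "y": ["a", "b"]}, column_lists_type)
bad = conforms({"x": [1, "2"], "y": ["a", "b"]}, column_lists_type)
```

Here `ok` is true and `bad` is false, since the second value contains a string in the `x` column. In the proposal the same relationship would be checked by the compiler from the annotation rather than at runtime.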