Improve datatable behavior when working with keyed frames
While improving the random attacker, we found that datatable's behavior needs to be adjusted for operations on keyed frames. Keying a frame is expensive because it requires sorting one or several columns, so keys should be preserved as much as possible.
In particular, key columns can be easily kept after the following operations:
- [ ] selecting a subset containing a strictly increasing sequence of rows:
  - [ ] a slice with a positive step;
  - [ ] an expression or a boolean column filter;
  - [ ] a list or an integer row selector, provided that the values are in strictly increasing order.
- [x] deleting rows from a frame;
- [x] reducing the number of rows in a frame;
- [x] deleting columns from a frame, except when a column is a part of a multi-column key;
- [ ] selecting a column subset that contains all the keyed columns, provided they are selected first in the list and in the same order as in the source frame;
- [x] selecting a column through the single column selector;
- [ ] joining two frames if the lhs frame is keyed (natural, left outer or inner joins only);
- [ ] sorting, but only when it is done over all the key columns in ascending order; in that case sorting becomes trivial;
- [ ] in addition, a grouping operation may automatically create a key, provided that all columns computed in the expression are reducers.
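
The row-selection rule above can be sketched in plain Python. This is only an illustration of the check, not datatable's implementation: the helper name `row_selector_keeps_key` is hypothetical, and it assumes selectors are given as a `slice`, a list of booleans, or a list of integers.

```python
def row_selector_keeps_key(selector, nrows):
    """Hypothetical helper: True if `selector` picks rows of an
    `nrows`-row frame in strictly increasing order, so an existing
    key would stay valid without re-sorting."""
    if isinstance(selector, slice):
        # slice.indices() resolves None and negative bounds for this length
        _, _, step = selector.indices(nrows)
        return step > 0
    if all(isinstance(v, bool) for v in selector):
        # A boolean mask only drops rows, it never reorders them
        return True
    # Integer list: resolve negative indices, then require strict increase
    indices = [i + nrows if i < 0 else i for i in selector]
    return all(a < b for a, b in zip(indices, indices[1:]))


print(row_selector_keeps_key(slice(None, None, 2), 10))   # positive step
print(row_selector_keeps_key(slice(None, None, -1), 10))  # reversed rows
print(row_selector_keeps_key([0, 3, 7], 10))               # increasing list
print(row_selector_keeps_key([3, 0], 10))                  # out of order
```

A repeated index such as `[1, 1]` is rejected as well, since duplicated rows would violate key uniqueness.
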
The following operations on keyed frames should throw an exception:
- [x] rbinding it with anything other than a zero-row frame;
- [x] deleting columns that are a part of a multi-column key, unless the frame contains zero rows or all key columns are removed at once;
- [x] replacing values in a key column;
- [x] assigning values in a key column;
- [x] increasing the number of rows;
- [x] deleting a rectangular subset of values, as this effectively replaces deleted values with NAs.
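
As a rough model of the column-deletion rule, the sketch below uses plain Python rather than datatable itself. The function name `key_after_column_delete` and the choice of `ValueError` are hypothetical. It also assumes that a zero-row frame keeps whatever key columns remain after a partial deletion, which the list above does not specify.

```python
def key_after_column_delete(key, deleted, nrows):
    """Hypothetical helper: return the key (tuple of column names) that
    remains after deleting `deleted` columns, or raise if the deletion
    would break a multi-column key."""
    key = tuple(key)
    deleted = set(deleted)
    remaining = tuple(c for c in key if c not in deleted)
    if len(remaining) == len(key):
        # No key column was touched, so the key is preserved as-is
        return key
    if not remaining or nrows == 0:
        # All key columns removed at once, or the frame has no rows.
        # Assumption: an empty frame keeps the surviving key columns.
        return remaining
    raise ValueError(
        "cannot delete part of a multi-column key: %r" % sorted(deleted & set(key))
    )


print(key_after_column_delete(("id", "date"), ["value"], nrows=5))
print(key_after_column_delete(("id", "date"), ["id", "date"], nrows=5))
```

Deleting only `"id"` from a non-empty frame keyed on `("id", "date")` raises here, because the remaining `"date"` column alone is not guaranteed to be unique.
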