Better support for in-place operations on tables

Open • rofinn opened this issue 6 years ago • 12 comments

Specifically, it'd be nice if I could use some traits to determine whether I can mutate the underlying data during row or column iteration (e.g., mutating values in a DataFrameRow).

rofinn, Aug 01 '19

Ok, I've been noodling on this for.......6 days (haha, actually longer, because people have brought it up on Slack and elsewhere). @rofinn, can you talk a little more about the use case you have in mind for this? I have some ideas, but most of mine end in "oh, this actually wouldn't be useful for the most part", and I want to hear a solid case where someone wants to use it and how it would help. I can try to put some of my thoughts together, but in the meantime I thought I'd ask for more info from your side.

quinnj, Aug 07 '19

My use case is in Impute.jl where I'm trying to mutate data in-place if possible by applying some operation over each column.

```julia
using Tables

# `Imputor` and the per-column `impute!(::AbstractVector, ::Imputor)` method
# are defined elsewhere in Impute.jl.
function impute!(table, imp::Imputor)
    Tables.istable(table) || throw(MethodError(impute!, (table, imp)))

    # Extract a columns object that we should be able to use to mutate the data.
    # NOTE: Mutation is not guaranteed for all table types, but it avoids copying the data.
    cols = Tables.columns(table)

    # Tables.jl >= 1.0 accessors for any columns object.
    for cname in Tables.columnnames(cols)
        impute!(Tables.getcolumn(cols, cname), imp)
    end

    return table
end
```

https://github.com/invenia/Impute.jl/blob/master/src/imputors.jl#L155

In this code, whether the passed-in table's data is actually mutated depends on the table type. It'd be nice if I could check that the object returned by Tables.columns lets me mutate the underlying data, and emit a warning if it doesn't.
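A minimal sketch of the check described above, using only Base Julia. It assumes a hypothetical trait, `columnmutability`, that is not part of Tables.jl, and treats a NamedTuple of Vectors (which Tables.jl accepts as a column table) as the one type known to support in-place mutation. `fill_missing!` and `impute_table!` are hypothetical stand-ins for an imputor and for Impute.jl's `impute!`.

```julia
# Hypothetical trait (not part of Tables.jl): do in-place edits to a table's
# columns write through to the table itself?
abstract type ColumnMutability end
struct MutableColumns <: ColumnMutability end
struct ImmutableColumns <: ColumnMutability end

# Conservative default: assume columns cannot be mutated in place.
columnmutability(::Type) = ImmutableColumns()
# A NamedTuple of Vectors holds its columns directly, so edits stick.
columnmutability(::Type{NamedTuple{N, T}}) where {N, T<:Tuple{Vararg{Vector}}} = MutableColumns()

# Hypothetical stand-in for a per-column imputor: replace `missing` with `value`.
function fill_missing!(col::AbstractVector, value)
    for i in eachindex(col)
        if ismissing(col[i])
            col[i] = value
        end
    end
    return col
end

# Mutate in place when the trait allows it; otherwise warn and return a copy.
function impute_table!(table::NamedTuple, value)
    if columnmutability(typeof(table)) isa MutableColumns
        foreach(col -> fill_missing!(col, value), values(table))
        return table
    else
        @warn "Columns of $(typeof(table)) cannot be mutated in place; returning a copy"
        return map(col -> fill_missing!(collect(col), value), table)
    end
end
```

Because the trait is dispatched on the type, the warning decision costs nothing at run time, and new table types opt in by adding one method rather than by being probed.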

rofinn, Aug 07 '19

Hi,

For my use case (trying to make Selections.jl easily available to the ecosystem), I'd like to have select() and select!() functions. Both could de-select columns, and for mutable data sources I'd like to provide the in-place variant for efficiency. This would require some way of signaling mutability (Tables.ismutable?), a way of deleting columns (Tables.deleteat!), and a way of reordering them (like permutecols!). What @rofinn describes also seems useful to me.
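A rough sketch of how a mutability trait could gate `select!` while `select` always copies. Everything here is hypothetical: `ColumnStore` is a toy column table, `ismutabletable` stands in for the proposed `Tables.ismutable`, and `select`/`select!` are local functions, not the DataFrames.jl or Selections.jl versions.

```julia
# Hypothetical minimal column store whose set of columns can change in place.
struct ColumnStore
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end

# Hypothetical trait in the spirit of the proposed `Tables.ismutable`.
ismutabletable(::Any) = false
ismutabletable(::ColumnStore) = true

# In-place `select!`: keep only `cols`, in the given order
# (covers both de-selection and reordering, like permutecols!).
function select!(t, cols::Symbol...)
    ismutabletable(t) ||
        throw(ArgumentError("$(typeof(t)) does not support in-place column selection"))
    idx = [findfirst(==(c), t.names) for c in cols]
    any(isnothing, idx) && throw(ArgumentError("unknown column among $(cols)"))
    kept = AbstractVector[t.columns[i] for i in idx]
    empty!(t.names)
    append!(t.names, cols)
    empty!(t.columns)
    append!(t.columns, kept)
    return t
end

# Non-mutating `select`: always returns a new table and leaves the input untouched.
select(nt::NamedTuple, cols::Symbol...) = NamedTuple{cols}(nt)
select(t::ColumnStore, cols::Symbol...) =
    select!(ColumnStore(copy(t.names), AbstractVector[copy(c) for c in t.columns]), cols...)
```

Immutable sources such as a NamedTuple still get `select`, but `select!` fails loudly instead of silently copying.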

Drvi, Sep 08 '19

With https://github.com/JuliaData/Tables.jl/pull/131, we're committing to enhancing the Tables.jl interface a bit, but also trying to keep it very minimal, to encourage adoption. As I've thought about this and a few other related issues, I think it would make sense to have a MutableTables.jl package (or maybe called InMemoryTables.jl). It turns out there are a lot of things like this that people want to do, but that really apply to a stricter subset of "table types" that allow mutation and can be manipulated (or indexed, or sorted, etc.). So in my mind, it's possible we could define something in Tables.jl, but it feels a bit off because Tables.jl is trying to be so generic (though admittedly not as generic as TableTraits.jl). That's why I think it'd be useful to have a separate package that could use Tables.jl but also define additional interface requirements for various table manipulations. Thoughts @Drvi, @bkamins, @nalimilan, @davidanthoff, @rofinn, @iamed2, @andyferris?

quinnj, Feb 08 '20

I think there are three levels of mutability here, and we should be explicit about which level we target:

  1. allowing some values in the table to change without resizing it (setindex!, sort!, ...)
  2. allowing the number of rows to change (while keeping the schema fixed)
  3. allowing the number, names, and eltypes of columns to change
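One way to make the three levels concrete is one capability trait per level. The sketch below assumes hypothetical trait names (`canupdatevalues`, `canresizerows`, `canchangeschema`) and a hypothetical `pushrow!`; none of these exist in Tables.jl. It uses two Base types to show that the levels differ in practice: a NamedTuple of Vectors supports levels 1 and 2 only, while a `Dict` of columns supports all three.

```julia
# Hypothetical capability traits, one per level; none of these exist in Tables.jl.
canupdatevalues(::Any) = false   # level 1: setindex!, sort!, ... without resizing
canresizerows(::Any)   = false   # level 2: add/remove rows, schema fixed
canchangeschema(::Any) = false   # level 3: add/remove/rename/retype columns

# A NamedTuple of Vectors: values and row count can change, but the column
# names and eltypes are fixed by its (immutable) type.
const VectorNT = NamedTuple{N, T} where {N, T<:Tuple{Vararg{Vector}}}
canupdatevalues(::VectorNT) = true
canresizerows(::VectorNT)   = true

# A Dict of columns supports all three levels.
const ColumnDict = AbstractDict{Symbol, <:AbstractVector}
canupdatevalues(::ColumnDict) = true
canresizerows(::ColumnDict)   = true
canchangeschema(::ColumnDict) = true

capabilities(t) = (canupdatevalues(t), canresizerows(t), canchangeschema(t))

# Level-2 operation: append one row, refusing tables that cannot resize.
# Note: not atomic; a failure midway leaves columns of unequal length.
function pushrow!(t, row::NamedTuple)
    canresizerows(t) || throw(ArgumentError("$(typeof(t)) cannot change its number of rows"))
    for (name, col) in pairs(t)
        push!(col, row[name])
    end
    return t
end
```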

bkamins, Feb 08 '20

I think a separate package, which enhances the interface, would be the best approach for now, since it would allow experimentation with the mutable interface without affecting the API defined in this package (cf #133).

tpapp, Feb 08 '20

@bkamins I would be tempted to try to make these three separate/orthogonal interfaces for performing different mutations, rather than “levels” or layers with some on top of the others.

E.g. I’m imagining you could have 3 without 2 (a data frame of static arrays) or 2 without 1 (functional programmers like to think of “append-only” databases).
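To illustrate "2 without 1", here is a small sketch of an append-only table in Base Julia: rows can be pushed, but no `setindex!` method exists, so stored values cannot be overwritten through the table. `AppendOnlyTable` is a hypothetical type, and the wrapped NamedTuple of Vectors is assumed to be private to it.

```julia
# Hypothetical append-only table: rows can be added (level 2) but stored
# values can never be overwritten (no level 1), since no setindex! is defined.
struct AppendOnlyTable{NT<:NamedTuple}
    columns::NT   # NamedTuple of Vectors; not meant to be touched directly
end

Base.length(t::AppendOnlyTable) = isempty(t.columns) ? 0 : length(first(t.columns))

# Read access only.
Base.getindex(t::AppendOnlyTable, row::Integer, col::Symbol) = t.columns[col][row]

# Append a row whose field names (and order) match the table's columns.
function Base.push!(t::AppendOnlyTable, row::NamedTuple)
    keys(row) == keys(t.columns) ||
        throw(ArgumentError("row does not match the table's schema"))
    foreach((col, v) -> push!(col, v), values(t.columns), values(row))
    return t
end
```

Attempting `t[1, :id] = 10` raises a `MethodError`, which is the point: the capability is absent from the type rather than checked at run time.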

andyferris, Feb 08 '20

Sure, they are largely orthogonal. I have this order in the back of my head because it is natural in DataFrames.jl, but for other data structures it is clearly the way you say 😄.

A particular case is that 3 assumes a column may be "replaced", which is not the same as 1, which mostly assumes updating a column in place. (However, for some data structures 1 would imply replacement: to setindex! you would have to replace the column because it is immutable, but 1 would guarantee that the eltype does not change after replacement.)
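A small Base Julia sketch of the distinction above, on a hypothetical `Dict`-of-columns table. `setvalue!` (level 1) mutates a `Vector` column in place but must replace an immutable `NTuple` column, keeping its eltype; `replacecolumn!` (level 3) replaces a column wholesale and may change its eltype. Both function names are illustrative, not an existing API.

```julia
# Level 1 on a Dict-of-columns table. Mutable columns (Vectors) are updated in
# place; immutable columns (NTuples here) must be replaced, but the
# replacement keeps the old eltype, which is the guarantee level 1 makes.
function setvalue!(t::AbstractDict{Symbol}, name::Symbol, row::Integer, v)
    col = t[name]
    if col isa Vector
        col[row] = v   # same column object; setindex! converts to eltype(col)
    elseif col isa NTuple
        1 <= row <= length(col) || throw(BoundsError(col, row))
        T = eltype(col)
        t[name] = ntuple(i -> i == row ? convert(T, v) : col[i], length(col))
    else
        throw(ArgumentError("unsupported column type $(typeof(col))"))
    end
    return t
end

# Level 3: replace a column wholesale; its eltype (or length) may change.
replacecolumn!(t::AbstractDict{Symbol}, name::Symbol, newcol) = (t[name] = newcol; t)
```

From the outside both paths of `setvalue!` look the same, but anyone holding a reference to the old tuple column would not see the update, which is the difference that is visible "when you have access to the column references".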

bkamins, Feb 08 '20

Yes it’s very interesting how mutating a column behaves somewhat the same as mutating the rows. Of course, you can tell the difference when you have access to the column references.

The way I always imagined this playing out is (a) have two APIs/traits for mutation and insertion into data structures (and “upsert” for data structures that support both, this is the way it is done in Dictionaries.jl), and (b) have table modelled as a nested data structure (a relation is a collection of rows). All the different cases you mention simply fall out naturally.
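A minimal sketch of separate mutation and insertion capabilities, plus an "upsert" built from both, with a relation modelled as a keyed collection of rows (a `Dict` from key to NamedTuple row). The trait names `issettable` and `isinsertable` are chosen in the spirit of Dictionaries.jl but are defined locally here; `update!`, `insertrow!`, and `upsert!` are hypothetical helpers. Because the `Dict`'s value type fixes every row's schema, level-3 changes are simply not expressible, while levels 1 and 2 map onto "set" and "insert".

```julia
# Capability traits (defined locally for this sketch; hypothetical names).
issettable(::Any) = false     # existing entries can be overwritten
isinsertable(::Any) = false   # new entries can be added
issettable(::AbstractDict) = true
isinsertable(::AbstractDict) = true

# Mutation: the key must already exist.
function update!(rel, key, row)
    issettable(rel) || throw(ArgumentError("relation is not settable"))
    haskey(rel, key) || throw(KeyError(key))
    rel[key] = row
    return rel
end

# Insertion: the key must not exist yet.
function insertrow!(rel, key, row)
    isinsertable(rel) || throw(ArgumentError("relation is not insertable"))
    haskey(rel, key) && throw(ArgumentError("key $key is already present"))
    rel[key] = row
    return rel
end

# "Upsert": only usable when both capabilities are present.
upsert!(rel, key, row) = haskey(rel, key) ? update!(rel, key, row) : insertrow!(rel, key, row)

# Row type for a relation keyed by id; the NamedTuple type fixes the schema.
const Row = NamedTuple{(:name, :age), Tuple{String, Int}}
```

A settable-but-not-insertable or insertable-only relation would just define one of the two traits as `true`, and `upsert!` would then fail on the missing capability.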

andyferris, Feb 08 '20

On a related note, should Tables.jl have a fallback similar to the arrays interface, where callers are guaranteed to get back a mutable table? That would simplify the code posted above, at the cost of potentially inconsistent return types.
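A sketch of what such a fallback might look like for NamedTuple column tables, using only Base. `asmutable` is a hypothetical name, not a Tables.jl function. It returns the input unchanged when its columns are already `Vector`s and otherwise materializes each column into a new `Vector`.

```julia
# Hypothetical fallback: always hand back a table whose columns are mutable Vectors.
# Already mutable: return the same object, so edits reach the caller's data.
asmutable(table::NamedTuple{N, T}) where {N, T<:Tuple{Vararg{Vector}}} = table
# Otherwise: materialize each column (ranges, tuples, ...) into a new Vector.
asmutable(table::NamedTuple) = map(collect, table)
```

The trade-off mentioned above shows up directly: the result is sometimes the caller's own table and sometimes a fresh copy, so code like `impute!` would have to return and document the value of `asmutable(table)` rather than the original `table`.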

rofinn, Feb 26 '20

I wonder if there has been any progress on experimenting with a trait system for mutable tables? At this point, Tables.jl is the de facto standard for tables in Julia, and we are reaching applications where mutability and a basic setindex! would be great.

juliohm, Jul 30 '23