ModelParameters.jl icon indicating copy to clipboard operation
ModelParameters.jl copied to clipboard

Improve indexing to support table behavior

Open bgroenks96 opened this issue 3 years ago • 26 comments

This is a major overhaul of the model interface to better support Table-like indexing behavior, most importantly through setindex.

The basic idea is that AbstractModels should allow Cartesian indexing, i.e. m[i,j] for both setindex! as well as getindex in order to really act like a table.

This PR refactors the existing implementations of getindex and setindex! to accomplish this, and in the process, expands the functionality of setindex! (and Base.setindex for the immutable case) to permit row indices.

The implementations of update/update! are simplified to be simple wrappers around setindex. In addition, update now provides an additional dispatch which permits the application of filtering rules to generate row indices via do syntax:

updated_m = update(m, p -> p.somefield == somevalue) do p
    # generate values here;
    # returning a scalar will apply the new values to the :val field;
    # returning a row vector, NamedTuple, or any Tables.jl compatible type will update all columns in the table.
    p.val * 2.0
end

This allows the user to apply filters to only update subsets of parameters. This is also possible via setindex! using row indices:

mask = map(p -> p.somefield == somevalue, params(m))
m[mask, :val] = x # x must match in size

~This is a draft PR at the moment. More clean up needs to be done and docstrings added at the minimum. There are also some behavioral issues that need to be addressed, like how to handle broadcasting.~

Current limitations:

  • Multi-column indexing for getindex is not supported, since it's not clear what the return value should be (named tuples?). It is supported by setindex, however.
  • getindex/setindex on subsets of rows are not in general type stable or allocation free. I think this is acceptable for now given the likely use cases.

Resolves #41 Resolves #21

bgroenks96 avatar Feb 04 '22 17:02 bgroenks96

Codecov Report

Merging #44 (561654d) into master (9af3c3a) will increase coverage by 3.02%. The diff coverage is 86.27%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #44      +/-   ##
==========================================
+ Coverage   79.88%   82.91%   +3.02%     
==========================================
  Files           4        4              
  Lines         174      199      +25     
==========================================
+ Hits          139      165      +26     
+ Misses         35       34       -1     
Impacted Files Coverage Δ
src/tables.jl 77.77% <0.00%> (ø)
src/param.jl 85.00% <50.00%> (-1.85%) :arrow_down:
src/model.jl 82.55% <87.87%> (+4.77%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9af3c3a...561654d. Read the comment docs.

codecov-commenter avatar Feb 04 '22 17:02 codecov-commenter

@rafaqz This is ready for feedback.

bgroenks96 avatar Feb 06 '22 21:02 bgroenks96

The allocation test that is failing on CI passes on my system (Julia 1.6), so it must be a compiler version issue.

bgroenks96 avatar Feb 06 '22 21:02 bgroenks96

@rafaqz Do you have any thoughts on what to do about this version-dependent test failure?

bgroenks96 avatar Feb 07 '22 21:02 bgroenks96

So its allocating on 1.3 but not 1.6?

You can put that test in a version specific conditional, or we can stop supporting 1.3

rafaqz avatar Feb 08 '22 10:02 rafaqz

I like the version-specific conditional idea. Let's see if it works now.

bgroenks96 avatar Feb 08 '22 10:02 bgroenks96

Also, since this changes indexing behavior (although should be non-breaking for most things...?), we should probably bump the minor version.

bgroenks96 avatar Feb 08 '22 10:02 bgroenks96

Finally all green! Let me know when you've looked at it, @rafaqz . I'm not 100% happy with the way it turned out... I think the dispatch graph for setindex/getindex could probably be a bit cleaner. But it's probably OK for now so long as it works.

bgroenks96 avatar Feb 08 '22 12:02 bgroenks96

Ok I'll try to have a look tonight.

rafaqz avatar Feb 08 '22 13:02 rafaqz

@rafaqz Do you have any other blocking concerns or can we go ahead and merge this? We could make an issue about potential performance concerns.

bgroenks96 avatar Feb 10 '22 17:02 bgroenks96

Normally we would resolve those two open questions first? One is performance and the other is syntax, which would mean a second breaking change if we change that later.

rafaqz avatar Feb 10 '22 18:02 rafaqz

At the moment, there are no breaking changes. The syntax question is really whether or not it's worth it to make such a breaking change in order to be consistent with other table interfaces. The performance issue is low priority, at least for my use case.

bgroenks96 avatar Feb 10 '22 19:02 bgroenks96

The performance issue will take 5 minutes to fix...

But the general idea is I wouldn't ask that question in a review if I didn't want to discuss it before merging.

rafaqz avatar Feb 10 '22 20:02 rafaqz

It doesn't look that easy to me, but maybe I am missing something. In any case, if you already know how to improve it, please feel free to suggest a solution.

bgroenks96 avatar Feb 10 '22 20:02 bgroenks96

Just use map instead of a loop and the mixed type tuple should be type stable

rafaqz avatar Feb 10 '22 20:02 rafaqz

No. There is a loop there for a reason. As I pointed out in the other comment, it iteratively reconstructs m for each column. It's not a map-like operation. There is almost certainly a better way to do it, but I am quite sure it's not that simple. To use map, you would need to come up with a way to merge all of the reconstructed Models (for each column).

bgroenks96 avatar Feb 10 '22 20:02 bgroenks96

That's a fold.... foldl should also be type stable - if you use StaticModel

rafaqz avatar Feb 10 '22 21:02 rafaqz

Ok, I'll try foldl.

bgroenks96 avatar Feb 10 '22 21:02 bgroenks96

It makes it type stable but does not have any substantial impact on performance, at least on this test case that I tried:

m = Model(s1)
xs = Tables.columns([(val=5.0,bounds=nothing),(val=4.0,bounds=nothing)])
idxs = [1,2]
cols = (:val,:bounds)
@btime Base.setindex($m, $xs, $idxs, $cols)
# before: 26.140 μs (404 allocations: 23.39 KiB)
@btime Base.setindex($m, $xs, $idxs, $cols)
# after: 25.525 μs (404 allocations: 23.39 KiB)

Using StaticModel shaves off a few allocations but still doesn't make a huge difference:

24.250 μs (395 allocations: 22.61 KiB)

bgroenks96 avatar Feb 10 '22 21:02 bgroenks96

It's also worth noting that indexing tables with Tuple is not standard practice. I don't think Matrix or DataFrames support it.

bgroenks96 avatar Feb 10 '22 21:02 bgroenks96

Ah right, maybe we should remove it instead.

rafaqz avatar Feb 11 '22 16:02 rafaqz

Also how is the ttfx for this stuff? Seems that type stability and allocations from it might also take a while to compile.

rafaqz avatar Feb 11 '22 16:02 rafaqz

I just started using this branch in my "real world" use case and it's.... not good. Like almost a full minute to compile on a deeply nested model with about 50 parameters. This is with several calls to setindex/update.

bgroenks96 avatar Feb 11 '22 17:02 bgroenks96

Yeah... the type stability is probably more important for that than for runtime speed. Have a look with @snoopi_deep and Cthulhu.jl, there might be some easy wins.

rafaqz avatar Feb 11 '22 17:02 rafaqz

It's probably going to be reconstruct :(

bgroenks96 avatar Feb 11 '22 17:02 bgroenks96

I guess we should come back to this at some point.... I am no longer convinced that there is a strong enough use case for type-stable table updates that is worth the insanely high compilation cost. Maybe we should just accept type instability here?

bgroenks96 avatar Aug 17 '22 14:08 bgroenks96