ModelParameters.jl
ModelParameters.jl copied to clipboard
Improve indexing to support table behavior
This is a major overhaul of the model interface to better support Table-like indexing behavior, most importantly through setindex
.
The basic idea is that AbstractModel
s should allow Cartesian indexing, i.e. m[i,j]
for both setindex!
as well as getindex
in order to really act like a table.
This PR refactors the existing implementations of getindex
and setindex!
to accomplish this, and in the process, expands the functionality of setindex!
(and Base.setindex
for the immutable case) to permit row indices.
The implementations of update
/update!
are simplified to be simple wrappers around setindex
. In addition, update
now provides an additional dispatch which permits the application of filtering rules to generate row indices via do
syntax:
updated_m = update(m, p -> p.somefield == somevalue) do p
# generate values here;
# returning a scalar will apply the new values to the :val field;
# returning a row vector, NamedTuple, or any Tables.jl compatible type will update all columns in the table.
p.val * 2.0
end
This allows the user to apply filters to only update subsets of parameters. This is also possible via setindex!
using row indices:
mask = map(p -> p.somefield == somevalue, params(m))
m[mask, :val] = x # x must match in size
~This is a draft PR at the moment. More clean up needs to be done and docstrings added at the minimum. There are also some behavioral issues that need to be addressed, like how to handle broadcasting.~
Current limitations:
- Multi-column indexing for
getindex
is not supported, since it's not clear what the return value should be (named tuples?). It is supported bysetindex
, however. -
getindex
/setindex
on subsets of rows are not in general type stable or allocation free. I think this is acceptable for now given the likely use cases.
Resolves #41 Resolves #21
Codecov Report
Merging #44 (561654d) into master (9af3c3a) will increase coverage by
3.02%
. The diff coverage is86.27%
.
@@ Coverage Diff @@
## master #44 +/- ##
==========================================
+ Coverage 79.88% 82.91% +3.02%
==========================================
Files 4 4
Lines 174 199 +25
==========================================
+ Hits 139 165 +26
+ Misses 35 34 -1
Impacted Files | Coverage Δ | |
---|---|---|
src/tables.jl | 77.77% <0.00%> (ø) |
|
src/param.jl | 85.00% <50.00%> (-1.85%) |
:arrow_down: |
src/model.jl | 82.55% <87.87%> (+4.77%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 9af3c3a...561654d. Read the comment docs.
@rafaqz This is ready for feedback.
The allocation test that is failing on CI passes on my system (Julia 1.6), so it must be a compiler version issue.
@rafaqz Do you have any thoughts on what to do about this version-dependent test failure?
So its allocating on 1.3 but not 1.6?
You can put that test in a version specific conditional, or we can stop supporting 1.3
I like the version-specific conditional idea. Let's see if it works now.
Also, since this changes indexing behavior (although should be non-breaking for most things...?), we should probably bump the minor version.
Finally all green! Let me know when you've looked at it, @rafaqz . I'm not 100% happy with the way it turned out... I think the dispatch graph for setindex
/getindex
could probably be a bit cleaner. But it's probably OK for now so long as it works.
Ok I'll try to have a look tonight.
@rafaqz Do you have any other blocking concerns or can we go ahead and merge this? We could make an issue about potential performance concerns.
Normally we would resolve those two open questions first? One is performance and the other is syntax, which would mean a second breaking change if we change that later.
At the moment, there are no breaking changes. The syntax question is really whether or not it's worth it to make such a breaking change in order to be consistent with other table interfaces. The performance issue is low priority, at least for my use case.
The performance issue will take 5 minutes to fix...
But the general idea is I wouldn't ask that question in a review if I didn't want to discuss it before merging.
It doesn't look that easy to me, but maybe I am missing something. In any case, if you already know how to improve it, please feel free to suggest a solution.
Just use map
instead of a loop and the mixed type tuple should be type stable
No. There is a loop there for a reason. As I pointed out in the other comment, it iteratively reconstructs m
for each column. It's not a map
-like operation. There is almost certainly a better way to do it, but I am quite sure it's not that simple. To use map
, you would need to come up with a way to merge all of the reconstructed Model
s (for each column).
That's a fold.... foldl
should also be type stable - if you use StaticModel
Ok, I'll try foldl
.
It makes it type stable but does not have any substantial impact on performance, at least on this test case that I tried:
m = Model(s1)
xs = Tables.columns([(val=5.0,bounds=nothing),(val=4.0,bounds=nothing)])
idxs = [1,2]
cols = (:val,:bounds)
@btime Base.setindex($m, $xs, $idxs, $cols)
# before: 26.140 μs (404 allocations: 23.39 KiB)
@btime Base.setindex($m, $xs, $idxs, $cols)
# after: 25.525 μs (404 allocations: 23.39 KiB)
Using StaticModel
shaves off a few allocations but still doesn't make a huge difference:
24.250 μs (395 allocations: 22.61 KiB)
It's also worth noting that indexing tables with Tuple
is not standard practice. I don't think Matrix
or DataFrames
support it.
Ah right, maybe we should remove it instead.
Also how is the ttfx for this stuff? Seems that type stability and allocations from it might also take a while to compile.
I just started using this branch in my "real world" use case and it's.... not good. Like almost a full minute to compile on a deeply nested model with about 50 parameters. This is with several calls to setindex
/update
.
Yeah... the type stability is probably more important for that than for runtime speed. Have a look with @snoopi_deep
and Cthulhu.jl, there might be some easy wins.
It's probably going to be reconstruct
:(
I guess we should come back to this at some point.... I am no longer convinced that there is a strong enough use case for type-stable table updates that is worth the insanely high compilation cost. Maybe we should just accept type instability here?