ConstructionBase.jl
ConstructionBase.jl copied to clipboard
Use static property names type
I was trying to put something together for Accessors.jl and found that some changes here can really improve performance downstream.
The strategy used here is to only depend on PropertyNames{fieldnames(obj)}() to explicitly inject compile time information.
Only using PropertyNames{names} to construct the body of generated methods is more reliable than generic ::Type{T}.
Using the example from this isssue demonstrates some of the performance differences here.
Where A and B are immutable types and B has a field with a union, this is what I get on the current master branch.
julia> @btime update($(Ref(A(1, 2)))[]);
1.471 ns (0 allocations: 0 bytes)
julia> @btime update($(Ref(B(1, 2)))[]);
387.926 ns (6 allocations: 176 bytes)
Problems with the union field are mostly absent using this PR.
julia> @btime update($(Ref(A(1, 2)))[]);
1.760 ns (0 allocations: 0 bytes)
julia> @btime update($(Ref(B(1, 2)))[]);
8.819 ns (0 allocations: 0 bytes)
Explicitly defining PropertyNames helps gives a tiny boost to performance.
ConstructionBase.PropertyNames(::Type{A}) = ConstructionBase.PropertyNames{(:x, :y)}()
ConstructionBase.PropertyNames(::Type{B}) = ConstructionBase.PropertyNames{(:x, :y)}()
julia> @btime update($(Ref(A(1, 2)))[]);
1.471 ns (0 allocations: 0 bytes)
julia> @btime update($(Ref(B(1, 2)))[]);
8.034 ns (0 allocations: 0 bytes)
``.`
This approach also allows manually unrolling operations so bigger collections get big performance boosts.
```julia
x = ntuple(identity, 40);
y = ntuple(identity, 30);
# master
@btime setproperties($x, $y); # 10.250 μs (10 allocations: 3.00 KiB)
# this PR
@btime setproperties($x, $y); # 3.984 ns (0 allocations: 0 bytes)
It might be worth noting that another internal change to setproperties was removing use of getproperties.
PropertyNames provides the information we really need from getproperties for reconstruction, which in turn helps us avoid needless getproperty calls.
I haven't changed any tests or documented anything here yet because I thought there might be some input on how well this really fits this package or if significant amount needs to be changed.
Wow @tokazama this looks super exciting! I will play around with this PR and review beginning of next week.
Sounds good. I was a bit worried because I ended up changing more than I originally planned to the source code, but none of the tests have been changed so it shouldn't break anything. Theoretically, we could say that the only methods related to this package that should ever be overloaded are constructorof, PropertyNames, and getproperty. I don't think there's a situation where users would get better performance by defining a unique getproperties or setproperties method given these other definitions, but I didn't want to overstep my bounds here.
I haven't taken the time to ensure that start up times are good but the use of @generated here is so minimal I don't think it will be an issue.
Theoretically, we could say that the only methods related to this package that should ever be overloaded are constructorof, PropertyNames, and getproperty
One usecase that comes up for me from time to time is like this. I have a file containing a huge vector of objects as "packed bits" and I want to mmap it. For this I do something like:
struct MyObject
data::NTuple{17, UInt8}
end
function ConstructionBase.getproperties(o::MyObject)
...
end
function ConstructionBase.setproperties(o::MyObject, patch::NamedTuple)
...
end
Theoretically, we could say that the only methods related to this package that should ever be overloaded are constructorof, PropertyNames, and getproperty
One usecase that comes up for me from time to time is like this. I have a file containing a huge vector of objects as "packed bits" and I want to mmap it. For this I do something like:
struct MyObject data::NTuple{17, UInt8} end function ConstructionBase.getproperties(o::MyObject) ... end function ConstructionBase.setproperties(o::MyObject, patch::NamedTuple) ... end
Are you manually providing different property names for different types represented along the bits of data?
In my use cases, the property names are statically known and hard coded in the getproperties / setproperties functions.
Here is a simple and complex example.
Is the problem there that you aren't constructing Latch with positional arguments?
Yes, I would prefer not to add a positional constructor like Latch(multicross,charge,creation,visited,brems). Generally we would like to use getproperties/setproperties for fancy custom properties and constructorof for construction from raw possibly private struct data. See also this discussion: https://github.com/JuliaObjects/ConstructionBase.jl/issues/53
With the current code it works like this
function setproperties_object(obj, patch)
nt = getproperties(obj)
nt_new = merge(nt, patch)
validate_setproperties_result(nt_new, nt, obj, patch)
constructorof(typeof(obj))(Tuple(nt_new)...)
end
Is the assumption that if one defines a custom getproperties then they also define a custom setproperties? It doesn't really change the core aspect of what's here but I want to make sure I understand so I can properly document things around what I've done.
It's a bit orthogonal to this PR, but would it make sense to have the following:
- Property based
construct_from_propertiesgetpropertiessetproperties
- Field based
constructorofgetfieldssetfields
And the field based ones would be completely determined by the type system
Is the assumption that if one defines a custom getproperties then they also define a custom setproperties?
Yes, I think one should always overload both or none.
It's a bit orthogonal to this PR, but would it make sense to have the following:
Yes I like that and it is pretty similar to what another discussion converged to https://github.com/JuliaObjects/ConstructionBase.jl/pull/54#issuecomment-1088765017
Ok back to this PR. Here is my understanding:
- This PR is not breaking and provides strictly better inference. Packages that overloaded
constructorofand/orsetproperties+getpropertieswill continue to work after this PR. - We could decide to add
PropertyNamesto the public API if we like or leave it private for now. I think I would prefer to first merge this without API changes/additions and if we want to makePropertyNamespublic do that in a follow-up PR.
Do you agree with that?
Sounds good to me. Would you like me to put a doc strings in here. I'm not sure how you want it structured
Docstrings for private functions are not mandatory, but feel free to provide some. I guess supporting julia 1.0 will be painful, but it would be nice if we can support 1.6. Also, some inference tests that pass with this PR but not without it would be nice.
We have some really long docstrings in .md files, but short docstrings can be placed in .jl files IMO.
Should PropertyNames be part of the public API because it can be passed to getproperties or or do we not want to actually support that feature?
I think we should not make getproperties(obj, ::PropertyNames) public. The reason is that it makes overloading of getproperties harder for users.
CI segfaults https://github.com/JuliaObjects/ConstructionBase.jl/runs/6290163099?check_suite_focus=true#step:5:68
Locally tests pass for me on 1.7
Passes locally on 1.7 and 1.8 for me.
Maybe the segfault is related to @pure? Maybe you can restore the old @generated definition of constructorof in case @assume_effects is not available?
Codecov Report
Merging #56 (9643a02) into master (2044dd5) will decrease coverage by
6.05%. The diff coverage is88.13%.
@@ Coverage Diff @@
## master #56 +/- ##
==========================================
- Coverage 98.05% 92.00% -6.06%
==========================================
Files 3 3
Lines 103 100 -3
==========================================
- Hits 101 92 -9
- Misses 2 8 +6
| Impacted Files | Coverage Δ | |
|---|---|---|
| src/ConstructionBase.jl | 88.05% <88.13%> (-9.13%) |
:arrow_down: |
| src/functions.jl | 100.00% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 2044dd5...9643a02. Read the comment docs.
Any updates on this @Tokazama? looks interesting, apologies for missing it in april
I kind of forgot about it. I think there are some inference failures on 1.6 still.
I'd be happy to come back to it if people are still interested though.
I think the inference issues addressed by this PR no longer exist. Feel free to reopen otherwise.