ConstructionBase.jl icon indicating copy to clipboard operation
ConstructionBase.jl copied to clipboard

Use static property names type

Open Tokazama opened this issue 3 years ago • 20 comments

I was trying to put something together for Accessors.jl and found that some changes here can really improve performance downstream.

The strategy used here is to only depend on PropertyNames{fieldnames(obj)}() to explicitly inject compile time information. Only using PropertyNames{names} to construct the body of generated methods is more reliable than generic ::Type{T}.

Using the example from this isssue demonstrates some of the performance differences here.

Where A and B are immutable types and B has a field with a union, this is what I get on the current master branch.

julia> @btime update($(Ref(A(1, 2)))[]);
  1.471 ns (0 allocations: 0 bytes)

julia> @btime update($(Ref(B(1, 2)))[]);
  387.926 ns (6 allocations: 176 bytes)

Problems with the union field are mostly absent using this PR.

julia> @btime update($(Ref(A(1, 2)))[]);
  1.760 ns (0 allocations: 0 bytes)

julia> @btime update($(Ref(B(1, 2)))[]);
  8.819 ns (0 allocations: 0 bytes)

Explicitly defining PropertyNames helps gives a tiny boost to performance.

ConstructionBase.PropertyNames(::Type{A}) = ConstructionBase.PropertyNames{(:x, :y)}()
ConstructionBase.PropertyNames(::Type{B}) = ConstructionBase.PropertyNames{(:x, :y)}()

julia> @btime update($(Ref(A(1, 2)))[]);
  1.471 ns (0 allocations: 0 bytes)

julia> @btime update($(Ref(B(1, 2)))[]);
  8.034 ns (0 allocations: 0 bytes)
``.`

This approach also allows manually unrolling operations so bigger collections get big performance boosts.
```julia
x = ntuple(identity, 40);
y = ntuple(identity, 30);

# master
@btime setproperties($x, $y); # 10.250 μs (10 allocations: 3.00 KiB)

# this PR
@btime setproperties($x, $y); # 3.984 ns (0 allocations: 0 bytes)

It might be worth noting that another internal change to setproperties was removing use of getproperties. PropertyNames provides the information we really need from getproperties for reconstruction, which in turn helps us avoid needless getproperty calls.

I haven't changed any tests or documented anything here yet because I thought there might be some input on how well this really fits this package or if significant amount needs to be changed.

Tokazama avatar Apr 29 '22 01:04 Tokazama

Wow @tokazama this looks super exciting! I will play around with this PR and review beginning of next week.

jw3126 avatar Apr 29 '22 09:04 jw3126

Sounds good. I was a bit worried because I ended up changing more than I originally planned to the source code, but none of the tests have been changed so it shouldn't break anything. Theoretically, we could say that the only methods related to this package that should ever be overloaded are constructorof, PropertyNames, and getproperty. I don't think there's a situation where users would get better performance by defining a unique getproperties or setproperties method given these other definitions, but I didn't want to overstep my bounds here.

I haven't taken the time to ensure that start up times are good but the use of @generated here is so minimal I don't think it will be an issue.

Tokazama avatar Apr 29 '22 10:04 Tokazama

Theoretically, we could say that the only methods related to this package that should ever be overloaded are constructorof, PropertyNames, and getproperty

One usecase that comes up for me from time to time is like this. I have a file containing a huge vector of objects as "packed bits" and I want to mmap it. For this I do something like:

struct MyObject
    data::NTuple{17, UInt8}
end
function ConstructionBase.getproperties(o::MyObject)
    ...
end
function ConstructionBase.setproperties(o::MyObject, patch::NamedTuple)
    ...
end

jw3126 avatar May 02 '22 07:05 jw3126

Theoretically, we could say that the only methods related to this package that should ever be overloaded are constructorof, PropertyNames, and getproperty

One usecase that comes up for me from time to time is like this. I have a file containing a huge vector of objects as "packed bits" and I want to mmap it. For this I do something like:

struct MyObject
    data::NTuple{17, UInt8}
end
function ConstructionBase.getproperties(o::MyObject)
    ...
end
function ConstructionBase.setproperties(o::MyObject, patch::NamedTuple)
    ...
end

Are you manually providing different property names for different types represented along the bits of data?

Tokazama avatar May 02 '22 09:05 Tokazama

In my use cases, the property names are statically known and hard coded in the getproperties / setproperties functions. Here is a simple and complex example.

jw3126 avatar May 02 '22 10:05 jw3126

Is the problem there that you aren't constructing Latch with positional arguments?

Tokazama avatar May 02 '22 10:05 Tokazama

Yes, I would prefer not to add a positional constructor like Latch(multicross,charge,creation,visited,brems). Generally we would like to use getproperties/setproperties for fancy custom properties and constructorof for construction from raw possibly private struct data. See also this discussion: https://github.com/JuliaObjects/ConstructionBase.jl/issues/53

jw3126 avatar May 02 '22 12:05 jw3126

With the current code it works like this

function setproperties_object(obj, patch)
    nt = getproperties(obj)
    nt_new = merge(nt, patch)
    validate_setproperties_result(nt_new, nt, obj, patch)
    constructorof(typeof(obj))(Tuple(nt_new)...)
end

Is the assumption that if one defines a custom getproperties then they also define a custom setproperties? It doesn't really change the core aspect of what's here but I want to make sure I understand so I can properly document things around what I've done.

It's a bit orthogonal to this PR, but would it make sense to have the following:

  • Property based
    • construct_from_properties
    • getproperties
    • setproperties
  • Field based
    • constructorof
    • getfields
    • setfields

And the field based ones would be completely determined by the type system

Tokazama avatar May 02 '22 16:05 Tokazama

Is the assumption that if one defines a custom getproperties then they also define a custom setproperties?

Yes, I think one should always overload both or none.

It's a bit orthogonal to this PR, but would it make sense to have the following:

Yes I like that and it is pretty similar to what another discussion converged to https://github.com/JuliaObjects/ConstructionBase.jl/pull/54#issuecomment-1088765017

Ok back to this PR. Here is my understanding:

  • This PR is not breaking and provides strictly better inference. Packages that overloaded constructorof and/or setproperties+getproperties will continue to work after this PR.
  • We could decide to add PropertyNames to the public API if we like or leave it private for now. I think I would prefer to first merge this without API changes/additions and if we want to make PropertyNames public do that in a follow-up PR.

Do you agree with that?

jw3126 avatar May 02 '22 17:05 jw3126

Sounds good to me. Would you like me to put a doc strings in here. I'm not sure how you want it structured

Tokazama avatar May 02 '22 18:05 Tokazama

Docstrings for private functions are not mandatory, but feel free to provide some. I guess supporting julia 1.0 will be painful, but it would be nice if we can support 1.6. Also, some inference tests that pass with this PR but not without it would be nice.

jw3126 avatar May 02 '22 18:05 jw3126

We have some really long docstrings in .md files, but short docstrings can be placed in .jl files IMO.

jw3126 avatar May 02 '22 18:05 jw3126

Should PropertyNames be part of the public API because it can be passed to getproperties or or do we not want to actually support that feature?

Tokazama avatar May 02 '22 21:05 Tokazama

I think we should not make getproperties(obj, ::PropertyNames) public. The reason is that it makes overloading of getproperties harder for users.

jw3126 avatar May 04 '22 07:05 jw3126

CI segfaults https://github.com/JuliaObjects/ConstructionBase.jl/runs/6290163099?check_suite_focus=true#step:5:68

Locally tests pass for me on 1.7

jw3126 avatar May 04 '22 13:05 jw3126

Passes locally on 1.7 and 1.8 for me.

Tokazama avatar May 04 '22 13:05 Tokazama

Maybe the segfault is related to @pure? Maybe you can restore the old @generated definition of constructorof in case @assume_effects is not available?

jw3126 avatar May 06 '22 07:05 jw3126

Codecov Report

Merging #56 (9643a02) into master (2044dd5) will decrease coverage by 6.05%. The diff coverage is 88.13%.

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
- Coverage   98.05%   92.00%   -6.06%     
==========================================
  Files           3        3              
  Lines         103      100       -3     
==========================================
- Hits          101       92       -9     
- Misses          2        8       +6     
Impacted Files Coverage Δ
src/ConstructionBase.jl 88.05% <88.13%> (-9.13%) :arrow_down:
src/functions.jl 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2044dd5...9643a02. Read the comment docs.

codecov-commenter avatar May 10 '22 13:05 codecov-commenter

Any updates on this @Tokazama? looks interesting, apologies for missing it in april

rafaqz avatar Oct 23 '22 13:10 rafaqz

I kind of forgot about it. I think there are some inference failures on 1.6 still.

I'd be happy to come back to it if people are still interested though.

Tokazama avatar Oct 23 '22 16:10 Tokazama

I think the inference issues addressed by this PR no longer exist. Feel free to reopen otherwise.

jw3126 avatar Jul 17 '23 18:07 jw3126