AtomsBase.jl
AtomsBase.jl copied to clipboard
Consider extending `==` on `Atom`, `FlexibleSystem` with `StructEquality.jl`?
I was adding support to this package in my code, where in the future, the conversions between FlexibleSystem and my Cell types used in CrystallographyBase.jl and Crystallography.jl is possible.
However, when I was testing this conversion, I found that Atom & FlexibleSystem contains mutable types like Dict, therefore their == are not automatically true since:
Value types are intended for compact, immutable objects. They are stored on the stack, passed by value, and the default hash and equality are based on the literal bits in memory. Record types are allocated on the heap, are passed by reference, and the default hash and equality are based on the pointer value (the data address). When you embed a record type in a value type, then the pointer to the record type becomes part of the value type, and so is included in equality and hash.
Which caused comparisons between these types feeling wierd:
julia> using WhyNotEqual
julia> a = Atom(:H, [0, 0, 1.0]u"bohr")
Atom(H, atomic_number = 1, atomic_mass = 1.008 u):
position : [0,0,1]u"a₀"
julia> b = Atom(:H, [0, 0, 1.0]u"bohr")
Atom(H, atomic_number = 1, atomic_mass = 1.008 u):
position : [0,0,1]u"a₀"
julia> a == b
false
julia> whynot(==, a, b)
DifferentButSameChildren: When applying `lens` to both objects, we get `obj1` and `obj2`.
obj1 and obj2 are different, but their children are all the same.
lens: identity
obj1: Atom(H, [ 0, 0, 1]u"a₀")
obj2: Atom(H, [ 0, 0, 1]u"a₀"
julia> box = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]u"Å" # Note the unit!;
julia> bc = [Periodic(), Periodic(), Periodic()];
julia> hydrogen = FlexibleSystem(
[Atom(:H, [0, 0, 1.0]u"bohr"), Atom(:H, [0, 0, 3.0]u"bohr")], box, bc
);
julia> hydrogen2 = FlexibleSystem(
[Atom(:H, [0, 0, 1.0]u"bohr"), Atom(:H, [0, 0, 3.0]u"bohr")], box, bc
);
julia> hydrogen == hydrogen2
false
julia> using WhyNotEqual
julia> whynot(==, hydrogen, hydrogen2)
DifferentButSameChildren: When applying `lens` to both objects, we get `obj1` and `obj2`.
obj1 and obj2 are different, but their children are all the same.
lens: (@optic _.particles[1])
obj1: Atom(H, [ 0, 0, 1]u"a₀")
obj2: Atom(H, [ 0, 0, 1]u"a₀")
As you can see, their fields are equal, but they are not equal.
This might be surprising when you want to compare them, and this is what I encountered earlier: I have to write more code to do simple tests.
One solution is to use StructEquality.@struct_hash_equal_isequal:
using StructEquality
@struct_hash_equal_isequal struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass}
position::SVector{D, L}
velocity::SVector{D, V}
atomic_symbol::Symbol
atomic_number::Int
atomic_mass::M
data::Dict{Symbol, Any} # Store arbitrary data about the atom.
end
@struct_hash_equal_isequal struct FlexibleSystem{D,S,L<:Unitful.Length} <: AbstractSystem{D}
particles::AbstractVector{S}
bounding_box::SVector{D,SVector{D,L}}
boundary_conditions::SVector{D,BoundaryCondition}
data::Dict{Symbol,Any} # Store arbitrary data about the atom.
end
Then the above tests will all be trues.
In fact, I use @struct_hash_equal_isequal in all my packages:
@struct_hash_equal_isequal struct SpglibCell{L,P,T,M} <: AbstractCell
lattice::Lattice{L}
positions::Vector{MVector{3,P}}
atoms::Vector{T}
magmoms::Vector{M}
end
Similar packages are:
I think this is a very reasonable suggestion. If other folks agree (e.g. @mfherbst, @cortner, @jgreener64), maybe you could make a PR to this effect?
I don't have a strong view. StructEquality looks useful but I have never used it myself so can't really comment how practical it is. I'm happy for this to go ahead .
Hi @rkurchin, sure. I could make a PR to test this change.
Sounds like a good quick solution to get a solid behaviour.
One edge case I can think of where one should think a second of the desired behaviour is the case where two atoms/systems only differ by the extra attributes, which are stored, e.g. one atom might store some extra information (e.g. a tag or a property relevant only to an implementing library), which from the AtomsBase point of view is not relevant. Should that then compare equal ? The safe choice is probably no, but I can see cases where one might argue yes as well. Anyone has strong feelings about this?
Sounds okay to me.
I would err on the side of safety, I would be surprised if two things were equal when something was different. The user shouldn't have to worry too much about which bits AtomsBase cares about.
@mfherbst Good point. We will store training data in structures. Not sure how else to do this. Can there be a data dict which is excluded in the == ?
Btw for testing equality in tests there is also the AtomsBaseTesting package with some utilities.
Could also ≈ be added in a reasonable way?
StructEquality.jl provides @struct_isapprox, so I think yes?
I'd love to have this PR, I'm current hand-writing equality and ≈ tests.
Sorry I was pretty busy before, but recently I think I will have time to do this.