AtomsBase.jl icon indicating copy to clipboard operation
AtomsBase.jl copied to clipboard

Consider extending `==` on `Atom`, `FlexibleSystem` with `StructEquality.jl`?

Open singularitti opened this issue 2 years ago • 11 comments

I was adding support to this package in my code, where in the future, the conversions between FlexibleSystem and my Cell types used in CrystallographyBase.jl and Crystallography.jl is possible.

However, when I was testing this conversion, I found that Atom & FlexibleSystem contains mutable types like Dict, therefore their == are not automatically true since:

Value types are intended for compact, immutable objects. They are stored on the stack, passed by value, and the default hash and equality are based on the literal bits in memory. Record types are allocated on the heap, are passed by reference, and the default hash and equality are based on the pointer value (the data address). When you embed a record type in a value type, then the pointer to the record type becomes part of the value type, and so is included in equality and hash.

Which caused comparisons between these types feeling wierd:

julia> using WhyNotEqual

julia> a = Atom(:H, [0, 0, 1.0]u"bohr")
Atom(H, atomic_number = 1, atomic_mass = 1.008 u):
    position          : [0,0,1]u"a₀"


julia> b = Atom(:H, [0, 0, 1.0]u"bohr")
Atom(H, atomic_number = 1, atomic_mass = 1.008 u):
    position          : [0,0,1]u"a₀"


julia> a == b
false

julia> whynot(==, a, b)
DifferentButSameChildren: When applying `lens` to both objects, we get `obj1` and `obj2`.
obj1 and obj2 are different, but their children are all the same.
lens: identity
obj1: Atom(H,  [       0,        0,        1]u"a₀")
obj2: Atom(H,  [       0,        0,        1]u"a₀"

julia> box = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]u"Å"  # Note the unit!;

julia> bc = [Periodic(), Periodic(), Periodic()];

julia> hydrogen = FlexibleSystem(
           [Atom(:H, [0, 0, 1.0]u"bohr"), Atom(:H, [0, 0, 3.0]u"bohr")], box, bc
       );

julia> hydrogen2 = FlexibleSystem(
           [Atom(:H, [0, 0, 1.0]u"bohr"), Atom(:H, [0, 0, 3.0]u"bohr")], box, bc
       );

julia> hydrogen == hydrogen2
false

julia> using WhyNotEqual

julia> whynot(==, hydrogen, hydrogen2)
DifferentButSameChildren: When applying `lens` to both objects, we get `obj1` and `obj2`.
obj1 and obj2 are different, but their children are all the same.
lens: (@optic _.particles[1])
obj1: Atom(H,  [       0,        0,        1]u"a₀")
obj2: Atom(H,  [       0,        0,        1]u"a₀")

As you can see, their fields are equal, but they are not equal.

This might be surprising when you want to compare them, and this is what I encountered earlier: I have to write more code to do simple tests.

One solution is to use StructEquality.@struct_hash_equal_isequal:

using StructEquality

@struct_hash_equal_isequal struct Atom{D, L<:Unitful.Length, V<:Unitful.Velocity, M<:Unitful.Mass}
    position::SVector{D, L}
    velocity::SVector{D, V}
    atomic_symbol::Symbol
    atomic_number::Int
    atomic_mass::M
    data::Dict{Symbol, Any}  # Store arbitrary data about the atom.
end

@struct_hash_equal_isequal struct FlexibleSystem{D,S,L<:Unitful.Length} <: AbstractSystem{D}
    particles::AbstractVector{S}
    bounding_box::SVector{D,SVector{D,L}}
    boundary_conditions::SVector{D,BoundaryCondition}
    data::Dict{Symbol,Any}  # Store arbitrary data about the atom.
end

Then the above tests will all be trues.

In fact, I use @struct_hash_equal_isequal in all my packages:

@struct_hash_equal_isequal struct SpglibCell{L,P,T,M} <: AbstractCell
    lattice::Lattice{L}
    positions::Vector{MVector{3,P}}
    atoms::Vector{T}
    magmoms::Vector{M}
end

Similar packages are:

singularitti avatar Oct 18 '23 22:10 singularitti

I think this is a very reasonable suggestion. If other folks agree (e.g. @mfherbst, @cortner, @jgreener64), maybe you could make a PR to this effect?

rkurchin avatar Oct 23 '23 15:10 rkurchin

I don't have a strong view. StructEquality looks useful but I have never used it myself so can't really comment how practical it is. I'm happy for this to go ahead .

cortner avatar Oct 23 '23 17:10 cortner

Hi @rkurchin, sure. I could make a PR to test this change.

singularitti avatar Oct 23 '23 17:10 singularitti

Sounds like a good quick solution to get a solid behaviour.

One edge case I can think of where one should think a second of the desired behaviour is the case where two atoms/systems only differ by the extra attributes, which are stored, e.g. one atom might store some extra information (e.g. a tag or a property relevant only to an implementing library), which from the AtomsBase point of view is not relevant. Should that then compare equal ? The safe choice is probably no, but I can see cases where one might argue yes as well. Anyone has strong feelings about this?

mfherbst avatar Oct 24 '23 07:10 mfherbst

Sounds okay to me.

I would err on the side of safety, I would be surprised if two things were equal when something was different. The user shouldn't have to worry too much about which bits AtomsBase cares about.

jgreener64 avatar Oct 24 '23 11:10 jgreener64

@mfherbst Good point. We will store training data in structures. Not sure how else to do this. Can there be a data dict which is excluded in the == ?

cortner avatar Oct 24 '23 14:10 cortner

Btw for testing equality in tests there is also the AtomsBaseTesting package with some utilities.

mfherbst avatar Oct 25 '23 07:10 mfherbst

Could also ≈ be added in a reasonable way?

cortner avatar May 15 '24 02:05 cortner

StructEquality.jl provides @struct_isapprox, so I think yes?

singularitti avatar May 15 '24 03:05 singularitti

I'd love to have this PR, I'm current hand-writing equality and ≈ tests.

cortner avatar May 15 '24 03:05 cortner

Sorry I was pretty busy before, but recently I think I will have time to do this.

singularitti avatar May 15 '24 04:05 singularitti