`Data.copy` silently adding `name` attribute
Describe the bug
I found that Data.copy() adds a name attribute to the JSON string of the copied object. This affects native COMPAS geometry classes (e.g. Frame) as well as classes that I have derived from Data.
The attribute only shows up in the output of to_jsonstring(), not in __data__. However, it changes the result of sha256(), so the copied object returns a different hash (see the example code below).
I'm not sure whether this is the intended behavior (for version control?). My goal is to use a hash function to compare the data content of objects. I assumed sha256() was meant for this purpose, but maybe it is not? If not, could you clarify the best practice for comparing the data content of two objects, especially when the data contains geometry, strings, and lists?
To Reproduce
The following example shows not only the added name attribute but also floating-point differences introduced when the frame is copied. Both throw off the hash comparison.
import compas
from compas.geometry import Frame

if __name__ == "__main__":
    print(compas.__version__)
    frame = Frame([1, 2, 3], [0.1, 0.2, 0.3])
    # Compare the original and its copy in three different ways.
    print(" to_jsonstring(): ")
    print(frame.to_jsonstring())
    print(frame.copy().to_jsonstring())
    print(" __data__: ")
    print(frame.__data__)
    print(frame.copy().__data__)
    print(" sha256(): ")
    print(frame.sha256())
    print(frame.copy().sha256())
output:
2.4.2
to_jsonstring():
{"dtype": "compas.geometry/Frame", "data": {"point": [1.0, 2.0, 3.0], "xaxis": [0.2672612419124244, 0.5345224838248488, 0.8017837257372731], "yaxis": [-0.16903085094570336, 0.8451542547285167, -0.50709255283711]}, "guid": "fe05d20f-69b4-4bc0-ba25-88e8d1404933"}
{"dtype": "compas.geometry/Frame", "data": {"point": [1.0, 2.0, 3.0], "xaxis": [0.26726124191242445, 0.5345224838248489, 0.8017837257372732], "yaxis": [-0.16903085094570341, 0.8451542547285167, -0.5070925528371101]}, "name": "Frame", "guid": "3a58794d-273c-4285-b875-cbff06e68c34"}
__data__:
{'point': [1.0, 2.0, 3.0], 'xaxis': [0.2672612419124244, 0.5345224838248488, 0.8017837257372731], 'yaxis': [-0.16903085094570336, 0.8451542547285167, -0.50709255283711]}
{'point': [1.0, 2.0, 3.0], 'xaxis': [0.26726124191242445, 0.5345224838248489, 0.8017837257372732], 'yaxis': [-0.16903085094570341, 0.8451542547285167, -0.5070925528371101]}
sha256():
b'\xa7\x1c\xea+U\xd1\xd4\xea%u\xb5\x86+r\x10\xc4\xcb\x13\xc3\xb0\xa3\xb3\xfaK\xc6 Rt\x974\x83\x8e'
b'S\x9f\xe1\xdb9.\xf4\xc6\xb61f\xb8\x87\xee\x16@\x15}\xb1\x1c\xb4\xbff\xdf^5\xe1\xf4\xe3\x1f\xce\xb6'
Expected behavior
I expect the frame.sha256() and frame.copy().sha256() to return the same results.
More generally, I want a copy mechanism that returns an object with the same data (I'm not sure what should happen with the guid; perhaps users could choose whether to copy it too), and I want to be able to verify the result of that copy with some comparison function. I hope these two functions can act as a pair, so that I can check that my class implements __data__ correctly:
my_object.data_hash() == my_object.some_copy_data_function().data_hash()
yes, the hash method is meant for comparisons but is a bit experimental and not very well tested.
the addition of the name is indeed unintentional and can be easily fixed (will do). what is a bit more difficult to solve is the introduction of small numerical differences, due to subsequent operations like unitized being applied to the input data...
>>> from compas.geometry import Vector
>>> vector = Vector(0.1, 0.2, 0.3)
>>> vector.unitized()
Vector(x=0.2672612419124244, y=0.5345224838248488, z=0.8017837257372731)
>>> Vector(*vector.unitized()).unitized()
Vector(x=0.26726124191242445, y=0.5345224838248489, z=0.8017837257372732)
>>> Vector(*Vector(*vector.unitized()).unitized()).unitized()
Vector(x=0.26726124191242445, y=0.5345224838248489, z=0.8017837257372732)
the first time around, unitized is applied to Vector(0.1, 0.2, 0.3).
the second time it is applied to Vector(x=0.2672612419124244, y=0.5345224838248488, z=0.8017837257372731)
and after that always to Vector(x=0.26726124191242445, y=0.5345224838248489, z=0.8017837257372732)
which is when the number stays completely stable...
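The effect described above does not depend on COMPAS. Below is a minimal sketch in plain Python, where `unitized` and `close` are hypothetical helpers written for this illustration and are not the library's implementation. It normalizes the same input twice and compares the results both exactly and with a relative tolerance. Whether the last digits actually differ depends on the platform and on the exact arithmetic, so no output is shown.

```python
import math


def unitized(xyz):
    """Return xyz scaled to unit length (hypothetical helper)."""
    length = math.sqrt(sum(value * value for value in xyz))
    return [value / length for value in xyz]


def close(a, b, rel_tol=1e-12):
    """Component-wise comparison with a relative tolerance."""
    return all(math.isclose(u, v, rel_tol=rel_tol) for u, v in zip(a, b))


once = unitized([0.1, 0.2, 0.3])
twice = unitized(once)

print(once)
print(twice)
print(once == twice)       # exact comparison; can be False because of rounding
print(close(once, twice))  # tolerant comparison
```

Any hash computed from the exact float representation inherits the fragility of the exact comparison.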
perhaps hashing needs to take some kind of tolerance into account...
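One way this could look, as a rough sketch only: round every float in the data to a fixed number of decimals before serializing and hashing. The names `rounded` and `data_hash` and the default `precision=9` are assumptions made for this example and are not part of COMPAS. The two dictionaries are the `__data__` values printed in the report above.

```python
import hashlib
import json


def rounded(value, precision=9):
    """Recursively round every float in nested dicts, lists and tuples."""
    if isinstance(value, float):
        # Adding 0.0 turns -0.0 into 0.0 so both serialize identically.
        return round(value, precision) + 0.0
    if isinstance(value, dict):
        return {key: rounded(item, precision) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [rounded(item, precision) for item in value]
    return value


def data_hash(data, precision=9):
    """Hex SHA-256 digest of the rounded data, serialized with sorted keys."""
    text = json.dumps(rounded(data, precision), sort_keys=True)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = {
    "point": [1.0, 2.0, 3.0],
    "xaxis": [0.2672612419124244, 0.5345224838248488, 0.8017837257372731],
    "yaxis": [-0.16903085094570336, 0.8451542547285167, -0.50709255283711],
}
copied = {
    "point": [1.0, 2.0, 3.0],
    "xaxis": [0.26726124191242445, 0.5345224838248489, 0.8017837257372732],
    "yaxis": [-0.16903085094570341, 0.8451542547285167, -0.5070925528371101],
}

print(data_hash(original) == data_hash(copied))  # True
```

Rounding is not a complete answer: two values that lie very close together but on opposite sides of a rounding boundary still produce different digests, so this only reduces the problem.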
Cool, thanks for the quick reply. I guess I'll hold off on using the hash to compare geometry for now.
I remember that a while ago there was the concept of geometric keys (a truncated string representation of the coordinates) for comparison. And of course comparison is now much more robust with the Tolerance class. I guess comparison between geometry classes should still rely on the __eq__ functions, which can be customised.
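For reference, the geometric key idea mentioned above can be sketched in a few lines of plain Python. The function below is a hypothetical illustration, not the COMPAS implementation, and its name, signature, and default precision are assumptions.

```python
def geometric_key(xyz, precision=3):
    """Format coordinates as a comma-separated string with fixed decimals."""
    parts = []
    for value in xyz:
        text = "{:.{p}f}".format(value, p=precision)
        # A tiny negative value formats as "-0.000"; fold it into "0.000".
        if float(text) == 0.0:
            text = "{:.{p}f}".format(0.0, p=precision)
        parts.append(text)
    return ",".join(parts)


a = [0.2672612419124244, 0.5345224838248488, 0.8017837257372731]
b = [0.26726124191242445, 0.5345224838248489, 0.8017837257372732]

print(geometric_key(a))                      # 0.267,0.535,0.802
print(geometric_key(a) == geometric_key(b))  # True
```

Because the key is a plain string, it can be compared, hashed, or used as a dictionary key, at the cost of the same rounding-boundary caveat as any truncation scheme.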
To me, using a hash for comparison implies that it is fast, but I don't know enough about this. Perhaps hashing floats is just a bad idea in general.
@tomvanmele this is done right?
yes this is done