thriftpy2 icon indicating copy to clipboard operation
thriftpy2 copied to clipboard

Make `TPayload` hashable

Open Fluxx opened this issue 2 months ago • 2 comments

We've been using Thriftpy2 successfully for a little while, however have run into what we consider a bug.

When a TPayload (which I understand to be a struct, union, or exception) is used as the key inside a Thrift map, the binary protocol will attempt to store that TPayload value as the key in a dict. Because TPayload.__hash__ is set to None, this makes TPayload instances unhashable, and so they can't be used for dict keys and instead throw this exception:

TypeError: unhashable type: 'Person'

Now TPayload.__hash__ is set to None was done intentionally in https://github.com/thriftpy/thriftpy2/commit/daa213d719743bf2e0f1db0d972ee622bc69102c apparently to fix https://github.com/Thriftpy/thriftpy2/pull/184. There are skim details on this motivation and that particular issue, but later TException was made hashable in https://github.com/thriftpy/thriftpy2/commit/40219d4ac57c86916d8bbdcf1b1aeb63ad2f4b4b, again with skim details.

Given structs and unions are viable Thrift map keys, I would consider the current behavior a bug. The most direct fix is just to restore the old TPayload.__hash__ logic. That makes TPayload hashable, but I am not sure if this is a good idea as it was purposefully made unhashable. Though that fix was from 2016, nearly 10 years ago, and so I am wondering if the fix is still needed?

The less direct way to fix this is to have the binary protocol (and also the compact protocol) read a map not into a vanilla Python dict, but into some custom type that implements __contains__ and is otherwise duck-typed like a dict, but does not hash(key) if key is a TPayload, instead doing some other thing.

This PR implements the direct fix, but again let me know that is a bad idea. If so I can do the less direct fix, or another fix you think is more appropriate. Happy to fixup this PR to implement whatever.

Fluxx avatar Oct 16 '25 21:10 Fluxx

Hi @Fluxx, thank you for your report and fix, and sorry for the late reply.

The original issue for the https://github.com/Thriftpy/thriftpy2/commit/daa213d719743bf2e0f1db0d972ee622bc69102c commit is in our old repo: https://github.com/Thriftpy/thriftpy/issues/184.

In that issue, we can see the real problem was that we had defined a custom __eq__ method on TPayload that compares all the values in the struct. The default __hash__ (compare by id, in other words, memory address) doesn't follow the custom __eq__ method, so we should either update the __hash__ method to follow the __eq__ method, or simply disable the __hash__ method (so the current implmentation).

So I think just adding the __hash__ method back directly may not be the right solution. I just want to know, according to Thrift's specification or the Apache implementation, what is the right behavior when TPayload is used as a key for a map? Will a struct of the same type with all the same field values overwrite the key? If this is well-defined, I think we can follow that specification for our TPayload.__hash__ implementation.

aisk avatar Oct 30 '25 15:10 aisk

@aisk thanks for the reply. I was off work for a bit, but am now back. I will review this today.

Fluxx avatar Nov 07 '25 16:11 Fluxx