smithy4s icon indicating copy to clipboard operation
smithy4s copied to clipboard

Expensive hashCode on Schema prevents fast cache lookups on deeply nested schemas

Open kubukoz opened this issue 10 months ago • 2 comments

This was discussed briefly on Discord, but currently the usage of things like auto-derived Document.Decoder can be a footgun - in order to use the cache, we need to compute the hashCode of the schema being used. This is a problem, because for deeply nested schemas it can take a bit of time to traverse the entire thing, and it's not memoized in any way.

https://discord.com/channels/1045676621761347615/1045676622491164724/1336044966337843301

Here's one of my flame graphs:

Image

Some of the ideas for improvement:

  • somehow use object identity as the cache key before resorting to hashCode (find a concurrent version of IdentityHashMap?)
  • use lazy val to store the hashcode in schemas
  • additionally, maybe use ConcurrentHashMap instead of TrieMap in the JVM case. The code is already platform-specific. This change would require benchmarking against the current state.

kubukoz avatar Feb 21 '25 17:02 kubukoz

additionally, maybe use ConcurrentHashMap instead of TrieMap in the JVM case. The code is already platform-specific. This change would require benchmarking against the current state.

Huh, I thought we already were. Well let's try that then ^^

Baccata avatar Feb 24 '25 09:02 Baccata

That doesn't really help us if the hash computation itself is heavy :/

JVM - TrieMap https://github.com/disneystreaming/smithy4s/blob/series/0.18/modules/core/src-jvm/smithy4s/internals/maps.scala#L23

JS/Native - mutable.Map https://github.com/disneystreaming/smithy4s/blob/series/0.18/modules/core/src-js-native/smithy4s/internals/maps.scala#L23. I guess we'll have to change the Native one soon because multi-threading is now a thing.

kubukoz avatar Feb 24 '25 13:02 kubukoz