prost icon indicating copy to clipboard operation
prost copied to clipboard

[Featture-Request] Ability to swap HashMap

Open The-Mr-L opened this issue 2 years ago • 5 comments

hi, I think it would be great if there is a way to swap hashmap with lets say rustc_hash::FxHashMap for better performance, but I dont know how difficult it would be to impl for currnet code structure, but rustc_hash::FxHashMap should be a drop and replace for std::collections::HashMap. would this be considered a feature going forward? thanks

The-Mr-L avatar Jun 12 '22 17:06 The-Mr-L

and wonderful #666 ;)

The-Mr-L avatar Jun 12 '22 17:06 The-Mr-L

This seems like an interesting idea similar to how prost supports bytes. That said, I am not sure if it is a good idea to add this if the change is quite intrusive like the bytes one. I want to avoid the precedent of adding a bunch of stuff. That said, you can always derive stuff manually using prost and can likely use your own hashmap.

LucioFranco avatar Jun 15 '22 15:06 LucioFranco

got it :)

The-Mr-L avatar Jun 15 '22 18:06 The-Mr-L

can likely use your own hashmap.

can you please explain this :) how would I go about using my own hashmap. thanks

The-Mr-L avatar Jun 16 '22 12:06 The-Mr-L

I have had similar performance concerns and got this answer on SO: https://stackoverflow.com/a/76414640/3008606

Main points stated in that thread:

  • optimization depends on the individual HashMap field -- some may be empty, some are known in advance to have single entries, others have multiple entries. Some have hardcoded contents, others are determined dynamically.
  • optimization for sending data is likely different than for retrieving data.
  • creating a RandomState is relatively cheap, though for a large number of throwaway HashMaps may still be relevant
  • the same is true for computing a single hash code
  • memory allocation is presumably orders of magnitude worse. A different allocator might be used. Using an allocation-free HashMap alternative is not possible without introducing the same type problems as for the hashing algorithm.

Optimizing each field individually would require to introduce per-field type parameters for either the hasher or the whole HashMap, and these parameters would cascade through enclosing structs. This is not useful. Optimizing all fields with the same time could at best aim at a common-denominator solution.

The "Java way" would be to add a common Map trait for the hash maps, and use Box<dyn Map> for all fields. This introduces other overheads for the dynamic dispatch, and possibly further memory allocation.

However, the last point also raises the question, what methods would be in that trait. What exactly is needed from maps during Protobuf de-/serialization? My point here is that hashing often isn't needed. The feature that comes closest and might be considered a requirement is duplicate key detection. For anything else, a list-of-key-value-pairs would be enough in the generated types, and true hash maps can be created by the calling code when needed. In many cases, the calling code will translate from its own types to/from the protobuf types anyway, since the protobuf types are targeted at compatibility with external programs, not what is needed inside the calling code.

A list-of-key-value-pairs would still not be able to optimize each field individually, but a common-denominator solution would be much cheaper than for HashMaps. Examples would be allocation-free for empty key/value sets, allocate for non-empty. Or even allocation-free for empty or single-entry KV sets.

MartinGeisse avatar Jun 07 '23 04:06 MartinGeisse