scalajson icon indicating copy to clipboard operation
scalajson copied to clipboard

Change Map to a VectorMap implementation

Open mdedetrich opened this issue 8 years ago • 8 comments

Not 100% decided on this, here are the tradeoffs (and finally to remove confusion).

In javascript/ruby (and other mutable by default languages), the HashMap structure preserves ordering on insertion. In Scala, the immutable Map preserves insertion ordering on creation (via the usage of .newBuilder or by direct constructor) however if you use the various immutable copy with new value methods (i.e. +/-) then there is no guarantee of ordering. This means that in typical Scala use, we do maintain ordering because very few people in practise update + copy the immutable map in a way thats noticable in regards to JSON serialization.

Using immutable Map

Pros

  • Heavily tested
  • Good performance characteristics in all cases
  • Less memory usage
  • Key ordering issue not usually a problem due to how typical JSON serializers/deserializers work
  • Immutable

Cons

  • Not the same behaviour as expected in regards to preserving key ordering in all cases

Using VectorMap (see https://github.com/mdedetrich/scalajson/pull/19)

Pros

  • Adheres to the same behaviour in Javascript regarding insertion for all cases
  • Immutable
  • Decent performance characteristics for typical cases (insertion/lookup)

Cons

  • Not battle tested
  • Performance of certain operations (i.e. removal of key)
  • Memory usage (maintains two datastructures at once)

Using mutable LinkedHashMap

Pros

  • Good performance in all cases
  • Maintains key order in insertion cases
  • Battle tested
  • Low memory usage

Cons

  • Mutable

@Ichoran your thoughts?

mdedetrich avatar Jun 30 '17 15:06 mdedetrich

I think it really depends how much slower VectorMap is than a normal immutable map. In general you might have a lot of JSON objects, so anything that is too heavyweight should probably be rejected. What does Circe do?

Ichoran avatar Jun 30 '17 16:06 Ichoran

@Ichoran Circe uses a normal map according to @travisbrown. Argonaut, which is scalaz's version of Circe, uses this implementation (see https://github.com/argonaut-io/argonaut/blob/master/argonaut/shared/src/main/scala/argonaut/JsonObject.scala). I will add some benchmarks and do some more work to see what the tradeoffs are. With specialization of classes the slowdown in typical cases shouldn't be that high

mdedetrich avatar Jun 30 '17 17:06 mdedetrich

For what it's worth I've been working on a branch in circe where I use java.util.LinkedHashMap by default and switch to the Map + IndexedSeq representation when someone uses add or remove, and it seems to speed things up significantly.

travisbrown avatar Jun 30 '17 17:06 travisbrown

@travisbrown Mind linking the branch? Also how do you deal with concurrency/threading issues (or I assume you just hide it internally and its not exposed). This approach also uses the Map + IndexedSeq approach, however I use specializing for classes up to N to speed it up for typical uses cases (i.e. objects smaller than n)

mdedetrich avatar Jun 30 '17 17:06 mdedetrich

It's still a mess and I haven't pushed it yet. But yeah, the LinkedHashMap is never exposed—it's just used in the cases where you're building up a map from a bunch of key-value pairs all at once.

travisbrown avatar Jun 30 '17 17:06 travisbrown

We use LinkedHashMap in play-json but only expose Map. Thus it is possible to use a more efficient Map implementation if you don't care about ordering.

If this library intends to be the API actually used by end-users, it would be nice if it provided that flexibility.

gmethvin avatar Jul 01 '17 07:07 gmethvin

@gmethvin I think the goal is going to be that we will see how VectorMap works out. If its very competitive with Map then this implementation is whats going to be used, else we will do similar to what PlayJson does.

mdedetrich avatar Jul 01 '17 10:07 mdedetrich

@Ichoran I have made a seperate repo for the vector map (now called LinkedMap) implementation at https://github.com/mdedetrich/linked-map. With some very rudimentary benchmarks, it appears that LinkedMap is a fair bit slower than a standard Map (see https://github.com/mdedetrich/linked-map/issues/2 for some workarounds)

mdedetrich avatar Nov 14 '17 13:11 mdedetrich