parser icon indicating copy to clipboard operation
parser copied to clipboard

Use of Set for dictionaries

Open missinglink opened this issue 6 years ago • 6 comments

Initially, I used a js object and hasOwnProperty to do the hashmap lookups and then later used Set and has().

It would be nice to standardize this, I'm just not familiar with the performance of Set vs. Object, I think if Set is faster/the same then we should use it.

I think one benefit of Set is that Object can possibly have issues with numeric keys?

cc/ @Joxit thoughts?

missinglink avatar May 13 '19 09:05 missinglink

I did some quick performance testing and it seems that Set performs better and the performance is more linear as the size of the dictionary increases:

https://jsperf.com/set-vs-object-as-sets/15

missinglink avatar May 13 '19 09:05 missinglink

I also did some quick perf (Node v8.16.0) testing and it found that :

  • Memory print : Set uses less memory than Object (for localities Set = 59.9MB and Object = 61MB)
  • ops/sec : obj[value] faster than set.has(value) faster than obj.hasOwnProperty(value) :thinking:

That's strange, in your test obj[value] seems to be the slowest.

Joxit avatar May 13 '19 12:05 Joxit

I think it's because those benchmarks (which I copied from someone else) also have a value check ( ^ 0 or !!) which means they are doing two operations.

The problem with something like obj[key] is that it returns false for falsy values such as 0 (although in this case they are all strings so it probably doesn't matter).

It looks like they are pretty similar so let's just go all-in on Set? I like that it doesn't coerse the keys to strings and all the other things that Object does which are weird.

missinglink avatar May 13 '19 12:05 missinglink

For the prefix checks, I will write a little FST memory structure which will make those much faster, in the meantime they can just use iterators and it will be slow.

Performance overall is pretty good, although some complex queries are getting near 10ms which is not great.

missinglink avatar May 13 '19 12:05 missinglink

Added FST in https://github.com/pelias/parser/pull/17

missinglink avatar May 13 '19 14:05 missinglink

I tried your branch, it use much more memory (415MB) and seems to be slower than Set (4 times slower than Set) :thinking:

Here is the test : https://gist.github.com/Joxit/32ccd5b5f63b474f30e707d804cbda25

Maybe in the future if we will want to add some metadata e.g if the token is Rue then it may be French. If it's Boulevard it may be English, French, Spanish....

Joxit avatar May 13 '19 14:05 Joxit