
[Latency] DefaultLookupDict too slow

hannw opened this issue 4 years ago · 4 comments

Description

The __getitem__ method of DefaultLookupDict is too slow. Profiling on an AWS p3.16xlarge machine shows that each __getitem__ call costs about 2.4 microseconds, whereas a regular dictionary .get(key, default_value) call on the same machine takes around 120 nanoseconds, so the current implementation is 20 times slower than the regular dict get operation. This operation alone accounts for about 50% of all processing time in our data pipeline. Is there a way to speed it up?

Error Message

N/A

To Reproduce

Use Vocab to numericalize strings as usual.
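For a rough sense of the gap, a minimal microbenchmark sketch; the DefaultLookupDict below mirrors the 0.x implementation rather than importing it, so treat it as an approximation:

import timeit

# Stand-in for gluonnlp 0.x's DefaultLookupDict: a dict subclass whose
# __getitem__ falls back to a default value instead of raising KeyError.
class DefaultLookupDict(dict):
    def __init__(self, default):
        super().__init__()
        self._default = default

    def __getitem__(self, key):
        return self.get(key, self._default)

d = DefaultLookupDict(0)
d.update({str(i): i for i in range(10000)})
plain = dict(d)  # same contents, plain built-in dict

# The Python-level __getitem__ override pays method-dispatch cost per call...
print(timeit.timeit(lambda: d['5000'], number=1_000_000))
# ...while plain dict.get stays entirely in C.
print(timeit.timeit(lambda: plain.get('5000', 0), number=1_000_000))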

hannw avatar Aug 16 '20 00:08 hannw

@hannw Could you share the profiling scripts that arrive at these numbers? In the new version, we use the default dictionary for storing the mapping: https://github.com/dmlc/gluon-nlp/blob/32e87d4d4aa20a6eb658ee90d765ccffbd160571/src/gluonnlp/data/vocab.py#L114. We are also running a series of benchmarks to analyze the speed of GluonNLP: https://github.com/dmlc/gluon-nlp/tree/master/scripts/benchmarks. If you can provide some profiling scripts, it would be super helpful for speeding up GluonNLP.

sxjscience avatar Aug 16 '20 00:08 sxjscience

@sxjscience For profiling, we used Python's built-in cProfile while running the training script, and snakeviz to visualize the breakdown.
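For reference, a minimal sketch of that workflow (run_pipeline is a hypothetical stand-in for our training entry point):

import cProfile
import pstats

# Profile the run and dump the stats to a file; then visualize the
# breakdown with `snakeviz pipeline.prof`.
cProfile.run('run_pipeline()', 'pipeline.prof')  # run_pipeline is hypothetical

# Or inspect the hot spots directly in the console.
pstats.Stats('pipeline.prof').sort_stats('cumulative').print_stats(20)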

We are currently on gluon-nlp 0.8.x, so maybe that's why the get is slow. Let us test the bleeding edge and see if it speeds things up.

hannw avatar Aug 16 '20 01:08 hannw

@hannw Thanks for the interest. Note that the master branch is now used for the numpy-compatible version of GluonNLP (#1298), which relies on MXNet 2.0 nightly builds (available to developers at https://dist.mxnet.io/python).

szha avatar Aug 16 '20 03:08 szha

@hannw I think the above comparison is not apples to apples for our supported use cases. The Vocab class is designed to handle both cases: with and without an unknown token. If the unknown token is not set, looking up an out-of-vocabulary key should throw an error, per the definition of the class.

If you know beforehand that you always have an unknown token, a good option may be to use the built-in dictionary directly instead of Vocab, as in the sketch below.
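A sketch of that workaround (v is a 0.x Vocab and tokens a list of strings; token_to_idx and unknown_token are the 0.x Vocab attributes, to the best of my knowledge):

token_to_idx = dict(v.token_to_idx)      # plain-dict copy of the mapping
unk_idx = token_to_idx[v.unknown_token]  # index reserved for unknown tokens

# dict.get stays in C, avoiding the Python-level __getitem__ override.
ids = [token_to_idx.get(tok, unk_idx) for tok in tokens]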

I did a comparison among the Vocab classes in 0.x, 0.8.3, and master (1.x), and ~I don't see a significant difference, most likely because the new logic adds a condition check on whether the unknown token is set~. I see mixed performance between the two implementations in the w/ and w/o unknown_token cases:

# Tests done on Python 3.7, macOS 10.15.6
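# Imports omitted for brevity; assumes `from gluonnlp import Vocab` on
# 0.x and `from gluonnlp.data import Vocab` on master (my assumption).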
# GluonNLP 0.x
# w/ unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']})
%timeit v['a']
530 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit v['abc']
565 ns ± 24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
3.32 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
2.67 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# w/o unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']}, unknown_token=None)
%timeit v['a']
362 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
1.17 µs ± 28.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
582 µs ± 18.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# GluonNLP 0.8.3
# w/ unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']})
%timeit v['a']
530 ns ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit v['abc']
550 ns ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
3.2 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
2.65 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# w/o unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']}, unknown_token=None)
%timeit v['a']
362 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
1.13 µs ± 14.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
618 µs ± 51.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


# GluonNLP master
# w/ unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']})
%timeit v['a']
598 ns ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit v['abc']
646 ns ± 34.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
2.34 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
1.34 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# w/o unknown token
v = Vocab({k:1 for k in ['a', 'b', 'c']}, unknown_token=None)
%timeit v['a']
641 ns ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b']
%timeit v[keys]
2.37 µs ± 89.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
keys=['a', 'c', 'c', 'b', 'c', 'c', 'c', 'c', 'a', 'b'] * 1000
%timeit v[keys]
1.34 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

szha avatar Sep 17 '20 05:09 szha