timezonefinder becomes significantly slower in Docker

Open rez0n opened this issue 2 years ago • 30 comments

Hi, I just want to report this in case it helps someone: all versions after 5.2.0 may be extremely slow when running in Docker.

In my case, upgrading from 5.2.0 (Numba 0.54.0) to 6.0.2 (Numba 0.56.0) increased the processing time for ~400 timezone lookups from ~1.5 seconds to ~40 seconds, and only inside the Docker container (I tried different base images: 3.9/3.10/slim/bullseye). The issue disappeared right after downgrading to [email protected] (Numba 0.55.0). Meanwhile, every version runs as fast as before locally without Docker.

There is a related Numba issue, but in my case the benchmark returned a good result for Numba 0.56.0 inside Docker, so Numba itself at least seems to work.

rez0n avatar Aug 09 '22 14:08 rez0n

Thanks for reporting this. It might be due to an issue with the library h3, which is a dependency introduced in v6.0.0

Could you double check and report whether leaving Numba out of the Docker container increases the runtime even further?

jannikmi avatar Aug 09 '22 14:08 jannikmi

@jannikmi Just to confirm, you are asking me to check the performance of timezonefinder 6.0.2 without numba installed, correct?

rez0n avatar Aug 09 '22 15:08 rez0n

Yes, exactly. If the time increases considerably further, then Numba is likely not the cause of the delay you are reporting.

jannikmi avatar Aug 09 '22 15:08 jannikmi

Here are my test results. To be clear, I do not see much difference whether numba is installed or not; it looks like numba is not used at all. Is that possible?

Base image python:3.9-slim
  Timezonefinder 6.0.2 (without numba), m:s:ms: 0:56:18, 0:55:77, 0:54:56, 0:49:60
  Timezonefinder 6.0.2 (with numba 0.56.0), m:s:ms: 0:47:08, 0:56:19, 0:55:34, 0:58:33

Base image python:3.9
  Timezonefinder 6.0.2 (without numba), m:s:ms: 52:99, 1:0:90, 54:28, 54:24
  Timezonefinder 6.0.2 (with numba 0.56.0), m:s:ms: 0:53:90, 0:53:80, 0:53:78, 0:50:37

Base image python:3.10
  Timezonefinder 6.0.2 (without numba), m:s:ms: 0:52:65, 0:45:86, 0:46:78, 0:43:90
  Timezonefinder 6.0.2 (with numba 0.56.0), m:s:ms: 0:45:43, 0:43:08, 0:40:75, 0:41:59

(CPU load is 100% during the whole execution.)

For comparison, python:3.9-slim, Timezonefinder 5.2.0 with numba 0.54.0: 1304ms, 959ms, 1129ms, 1059ms

rez0n avatar Aug 09 '22 16:08 rez0n

Yes, that's possible. It seems Numba is not actually working properly, which would explain why the pure Python code is as fast as the Numba equivalents.

I wonder if there is a way of installing the same Numba version 0.54.0 with Timezonefinder 6 to compare?

jannikmi avatar Aug 10 '22 02:08 jannikmi

Hi @jannikmi, do you have any hints for determining whether the problem is in numba or in timezonefinder? I'm asking because I ran the numba benchmark (posted in the numba repo issue) and got a good result, which means numba works.

rez0n avatar Aug 10 '22 16:08 rez0n

From the results you posted I am pretty sure it is a Numba issue (since installing Numba does not seem to change the timing).

It is strange, however, that the general Numba benchmarks work. Have you double checked that the Numba versions you test are the same?

Could you please print the result of the TimezoneFinder method 'f.using_numba()' when Numba is installed?
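
A minimal sketch of such a check, assuming `using_numba()` can be called on a `TimezoneFinder` instance as in 'f.using_numba()' above. The helper name `report_using_numba` is hypothetical, and it returns `None` when timezonefinder itself cannot be imported. Calling it once, outside any timed loop, keeps it from distorting measurements:

```python
def report_using_numba():
    """Return timezonefinder's own view of whether Numba is in use.

    Returns None if timezonefinder is not importable in this environment.
    """
    try:
        from timezonefinder import TimezoneFinder
    except ImportError:
        return None
    finder = TimezoneFinder()
    # Assumption: using_numba() is exposed on the instance, as in the thread.
    return finder.using_numba()


print("using_numba:", report_using_numba())
```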

jannikmi avatar Aug 10 '22 17:08 jannikmi

Hi @jannikmi It returns True but the execution time is 1m4s.

rez0n avatar Aug 11 '22 11:08 rez0n

OK, strange. That is a long time for just a variable evaluation.

Can you time a second evaluation as well please?

jannikmi avatar Aug 11 '22 13:08 jannikmi

@jannikmi Sorry for the confusion: I added f.using_numba() to my dataset generation after each call; 1m4s is the overall execution time of the ~400 calls.

rez0n avatar Aug 11 '22 13:08 rez0n

Ok. That makes more sense. Can you please double check that the general Numba benchmarks work for the exact same Numba version you are using with timezonefinder?

jannikmi avatar Aug 11 '22 13:08 jannikmi

And please also double check you are benchmarking the njit functionality of Numba since that is what timezonefinder is using: https://github.com/jannikmi/timezonefinder/blob/master/timezonefinder/utils.py

jannikmi avatar Aug 11 '22 13:08 jannikmi

@jannikmi It looks like this benchmark uses jit, not njit. Link to the benchmark: https://github.com/numba/numba/issues/8293#issue-1320087411 I don't have enough experience with it to sort that out and modify it within a reasonable time. @esc sorry for disturbing you, but maybe you have some advice on how to verify that njit (used by Timezonefinder) works correctly?

rez0n avatar Aug 11 '22 13:08 rez0n

@numba.jit(nopython=True,...) used in the benchmarks should be equivalent to @numba.njit
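
As a small illustration of that equivalence, the sketch below wraps the same function both ways and compares the results. The function `sum_of_squares` and the no-op fallback decorator are hypothetical, added only so the example also runs where Numba is missing:

```python
try:
    from numba import jit, njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def _passthrough(*args, **kwargs):
        """No-op stand-in supporting both @deco and @deco(...) usage."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

    jit = njit = _passthrough


def sum_of_squares(n):
    """Plain Python loop that Numba can compile in nopython mode."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Both spellings request nopython mode for the same pure-Python function.
via_jit = jit(nopython=True)(sum_of_squares)
via_njit = njit(sum_of_squares)

print(HAVE_NUMBA, via_jit(1000), via_njit(1000))
```

If the two results ever differed, or one of the calls raised, that would point at the Numba installation rather than at timezonefinder.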

I have the gut feeling that this has something to do with the initial time Numba needs to just-in-time compile the functions (which are cached afterwards).

I guess due to the docker setup the cached JIT compiled functions will be forgotten each time, while in your local setup they will be reused (giving you the speed benefit).

Can you make sure you call timezonefinder for a couple of points (without timing it) before running your benchmarks please?
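
A minimal sketch of such a benchmark with untimed warm-up calls. The `benchmark` helper is hypothetical, and `fake_lookup` is only a stand-in with the same keyword arguments as `TimezoneFinder().timezone_at(lng=..., lat=...)`, so the example runs without timezonefinder installed:

```python
import time


def benchmark(func, points, warmup=3, repeats=4):
    """Time func over all points, after a few untimed warm-up calls."""
    for lng, lat in points[:warmup]:
        func(lng=lng, lat=lat)  # untimed: triggers any JIT compilation
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for lng, lat in points:
            func(lng=lng, lat=lat)
        timings.append(time.perf_counter() - start)
    return timings


def fake_lookup(*, lng, lat):
    """Hypothetical stand-in for TimezoneFinder().timezone_at."""
    return "Etc/GMT"


# ~400 points, mirroring the size of the workload reported in this issue.
points = [(13.4 + i * 0.01, 52.5) for i in range(400)]
timings = benchmark(fake_lookup, points)
print([f"{t:.4f}s" for t in timings])
```

To measure the real library, pass `TimezoneFinder().timezone_at` instead of `fake_lookup`. If compilation were the cause, the warm-up would absorb it and all timed repeats should be similarly fast.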

jannikmi avatar Aug 11 '22 13:08 jannikmi

An alternative, perhaps preferable, way of ensuring the same thing is being measured is to delete the cache every time you run the benchmark (also locally).
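
A rough sketch of clearing the cache, assuming Numba's on-disk cache consists of `*.nbi` index files and `*.nbc` data files, which by default live in `__pycache__` directories next to the cached source (or under `NUMBA_CACHE_DIR` if that variable is set). The helper name and the demo file names are hypothetical:

```python
import tempfile
from pathlib import Path


def clear_numba_cache(root):
    """Delete Numba cache files (*.nbi index, *.nbc data) below root."""
    removed = []
    for pattern in ("*.nbi", "*.nbc"):
        # Materialize the matches first so deletion cannot disturb the walk.
        for path in list(Path(root).rglob(pattern)):
            path.unlink()
            removed.append(path)
    return removed


# Demo on a throwaway directory so nothing real is deleted.
demo_root = Path(tempfile.mkdtemp())
cache_dir = demo_root / "__pycache__"
cache_dir.mkdir()
(cache_dir / "utils.some_function-21.py39.nbi").touch()
(cache_dir / "utils.some_function-21.py39.1.nbc").touch()
(cache_dir / "utils.cpython-39.pyc").touch()  # regular bytecode, must survive

removed = clear_numba_cache(demo_root)
print(sorted(p.name for p in removed))
```

For a real run, `root` would be the installed timezonefinder package directory (or the configured cache directory).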

jannikmi avatar Aug 11 '22 14:08 jannikmi

Can you make sure you call timezonefinder for a couple of points (without timing it) before running your benchmarks please?

As I answered in the Numba repo issue, I run 4 test attempts in a row without reloading anything, so any cache should be in place.

Personally, I see no visible difference whether the cache is present or not: after each Docker rebuild/deployment there is no cache, but the first request takes under 2 seconds to generate and subsequent requests take 1.0-1.5 seconds. This is with TF 5.2.0/numba 0.54.0.

With TF 6.0.2/numba 0.56.0 locally it is the same.

rez0n avatar Aug 11 '22 14:08 rez0n

I am not an expert on this, but at least the caching behavior could be different in the Docker image. Perhaps it is so slow overall because caching is not being used at all. It's just strange that it all seems to work with timezonefinder 5.2.0. To be honest, at this point I am a bit clueless.

Just to confirm: are you running the general Numba benchmarks with the exact same Numba version you are using for the timezonefinder benchmark?

jannikmi avatar Aug 11 '22 14:08 jannikmi

I am not an expert on this, but at least the caching behavior could be different in the Docker image.

No, usually it is the same as locally. I have a lot of experience working with Docker, and to be clear, I'm sure it is not a Docker issue in general. Most probably the official Docker image lacks something required by the latest Numba, or Numba has some kind of bug that prevents Timezonefinder from using its features in such an environment.

Just to confirm: are you running the general Numba benchmarks with the exact same Numba version you are using for the timezonefinder benchmark?

Yes, I ran the benchmark inside the same container (python 3.10/TF6.0.2/Numba 0.56.0) right after testing TF functions

rez0n avatar Aug 11 '22 19:08 rez0n

I don't have any ideas then where this is coming from.

FYI: Since Numba also caused other issues in the past and it is a huge dependency (llvmlite compiler 20+MB), I am working on a new release using a pure C point in polygon implementation. I found this nice package to generate C extensions more easily: https://cffi.readthedocs.io/en/latest/installation.html
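
For readers unfamiliar with the term, a point-in-polygon test can be sketched in pure Python with the classic ray-casting rule. This illustrates the general technique only and is not timezonefinder's actual implementation (neither its Numba version nor the planned C one); the function name and the test square are hypothetical:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray going right from (x, y) crosses.

    polygon is a list of (x, y) vertices; an odd crossing count means inside.
    """
    inside = False
    j = len(polygon) - 1  # index of the previous vertex
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through y can cross.
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square), point_in_polygon(5, 2, square))
```

A tight loop like this is the kind of code that benefits from compilation, which is why the choice between Numba and a C extension matters for lookup speed.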

Please hang tight until then and use v5 of timezonefinder if speed matters to you.

jannikmi avatar Aug 11 '22 23:08 jannikmi

I don't have any ideas then where this is coming from.

If you do end up finding a Numba bug while using Numba in docker we would appreciate an issue, of course. One idea I had was to maybe execute numba -s in the docker container. This will yield Numba diagnostic output, perhaps there are some clues in that?

esc avatar Aug 12 '22 07:08 esc

A couple of suggestions:

  1. The environment variable NUMBA_DEBUG_CACHE=1 can be set to help debug Numba's caching behaviour. When set it should show what is being stored and loaded, and from where.
  2. It might also be worthwhile making sure that numba is actually importing correctly, as it appears that if it does not there's a silent fallback: https://github.com/jannikmi/timezonefinder/blob/c0b92d451793448235d9e23978aefc27cd08d3fa/timezonefinder/utils.py#L21-L27 Could this perhaps explain there being no difference in performance whether numba is installed or not, and similarly whether something has apparently been compiled and cached or not? If the import fails, all the execution times would be as if Numba weren't installed.
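
One way to rule out the silent fallback from point 2 is to attempt the import explicitly and surface any error instead of swallowing it. A minimal sketch: the helper name is hypothetical, and the default list of names is an assumption about what the linked utils.py imports from numba.

```python
import importlib
import traceback


def try_numba_import(names=("b1", "f8", "i2", "i4", "njit", "typeof", "u2")):
    """Attempt the numba import explicitly and return (ok, detail)."""
    try:
        numba = importlib.import_module("numba")
    except Exception:
        # Keep the full traceback: a silent fallback would hide exactly this.
        return False, traceback.format_exc()
    missing = [name for name in names if not hasattr(numba, name)]
    if missing:
        return False, f"numba imported but lacks: {missing}"
    return True, f"numba {numba.__version__} imported"


ok, detail = try_numba_import()
print(ok, detail)
```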

stuartarchibald avatar Aug 12 '22 12:08 stuartarchibald

Hi @esc

root@cf4d571e7422:/app# numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time)                   : 2022-08-12 14:19:46.133256
UTC start time                                : 2022-08-12 14:19:46.133260
Running time (s)                              : 0.854724

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake
CPU Count                                     : 6
Number of accessible CPUs                     : 6
List of accessible CPUs cores                 : 0-5
CFS Restrictions (CPUs worth of runtime)      : None

CPU Features                                  : 64bit aes avx avx2 bmi bmi2 cmov
                                                cx16 cx8 f16c fma fsgsbase fxsr
                                                lzcnt mmx movbe pclmul popcnt
                                                prfchw rdrnd sahf sse sse2 sse3
                                                sse4.1 sse4.2 ssse3 xsave xsaveopt

Memory Total (MB)                             : 3933
Memory Available (MB)                         : 1625

__OS Information__
Platform Name                                 : Linux-5.10.104-linuxkit-x86_64-with-glibc2.31
Platform Release                              : 5.10.104-linuxkit
OS Name                                       : Linux
OS Version                                    : #1 SMP Thu Mar 17 17:08:06 UTC 2022
OS Specific Version                           : ?
Libc Version                                  : glibc 2.31

__Python Information__
Python Compiler                               : GCC 10.2.1 20210110
Python Implementation                         : CPython
Python Version                                : 3.9.12
Python Locale                                 : en_US.UTF-8

__Numba Toolchain Versions__
Numba Version                                 : 0.56.0
llvmlite Version                              : 0.39.0

__LLVM Information__
LLVM Version                                  : 11.1.0

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

__NumPy Information__
NumPy Version                                 : 1.22.4
NumPy Supported SIMD features                 : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch                 : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_KNL', 'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL')
NumPy Supported SIMD baseline                 : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected             : False

__SVML Information__
SVML State, config.USING_SVML                 : False
SVML Library Loaded                           : False
llvmlite Using SVML Patched LLVM              : True
SVML Operational                              : False

__Threading Layer Information__
TBB Threading Layer Available                 : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available              : True
+-->Vendor: GNU
Workqueue Threading Layer Available           : True
+-->Workqueue imported successfully.

__Numba Environment Variable Information__
None found.

__Conda Information__
Conda not available.

__Installed Packages__
Package                      Version
---------------------------- ---------
amqp                         5.0.6
asgiref                      3.4.1
billiard                     3.6.4.0
boto3                        1.20.14
botocore                     1.23.14
Brotli                       1.0.9
CacheControl                 0.12.10
cachetools                   4.2.4
cachy                        0.3.0
celery                       5.2.1
certifi                      2021.10.8
cffi                         1.15.0
charset-normalizer           2.0.8
cleo                         0.8.1
click                        8.0.3
click-didyoumean             0.3.0
click-plugins                1.1.1
click-repl                   0.2.0
clikit                       0.6.2
crashtest                    0.3.1
cryptography                 37.0.4
cssselect2                   0.4.1
Deprecated                   1.2.13
distlib                      0.3.5
Django                       4.0.3
django-ajax-datatable        4.4.3
django-appconf               1.0.5
django-bootstrap-modal-forms 2.2.0
django-database-prefix       0.1.0
django-datatable-view        2.1.6
django-downloadview          2.3.0
django-filter                2.4.0
django-formtools             2.3
django-model-utils           4.2.0
django-notifications-hq      1.7.0
django-otp                   1.1.1
django-phonenumber-field     5.2.0
django-recaptcha             3.0.0
django-redis                 5.0.0
django-select2               7.10.0
django-storages              1.12.3
django-two-factor-auth       1.13.2
djangorestframework          3.12.4
djangorestframework-jsonapi  4.3.0
fcm-django                   1.0.5
filelock                     3.7.1
firebase-admin               5.1.0
fonttools                    4.34.4
gevent                       21.8.0
google-api-core              2.2.2
google-api-python-client     2.31.0
google-auth                  2.3.3
google-auth-httplib2         0.1.0
google-cloud-core            2.2.1
google-cloud-firestore       2.3.4
google-cloud-storage         1.43.0
google-crc32c                1.3.0
google-resumable-media       2.1.0
googleapis-common-protos     1.53.0
greenlet                     1.1.2
grpcio                       1.42.0
grpcio-status                1.42.0
gunicorn                     20.1.0
h3                           3.7.4
html5lib                     1.1
httplib2                     0.20.2
idna                         3.3
importlib-metadata           4.12.0
inflection                   0.5.1
install                      1.3.4
jeepney                      0.8.0
jmespath                     0.10.0
jsonfield                    3.1.0
keyring                      23.8.2
kombu                        5.2.2
llvmlite                     0.39.0
lockfile                     0.12.2
msgpack                      1.0.3
mysqlclient                  2.1.0
numba                        0.56.0
numpy                        1.22.4
packaging                    21.3
pastel                       0.2.1
phonenumberslite             8.12.38
Pillow                       8.4.0
pip                          22.0.4
pkginfo                      1.8.3
platformdirs                 2.5.2
poetry                       1.1.14
poetry-core                  1.0.8
prompt-toolkit               3.0.23
proto-plus                   1.19.8
protobuf                     3.19.1
psycopg2                     2.9.3
pyasn1                       0.4.8
pyasn1-modules               0.2.8
pycparser                    2.21
pydyf                        0.1.2
pylev                        1.4.0
pyparsing                    3.0.6
pyphen                       0.11.0
python-dateutil              2.8.2
python-decouple              3.5
pytz                         2021.3
PyYAML                       5.4.1
qrcode                       6.1
redis                        4.0.2
requests                     2.26.0
requests-toolbelt            0.9.1
rsa                          4.8
s3transfer                   0.5.0
SecretStorage                3.3.2
sentry-sdk                   1.5.12
setuptools                   58.1.0
shellingham                  1.5.0
six                          1.16.0
sqlparse                     0.4.2
swapper                      1.2.0
timezonefinder               6.0.2
tinycss2                     1.1.1
tomlkit                      0.11.2
uritemplate                  4.1.1
urllib3                      1.26.7
vine                         5.0.0
virtualenv                   20.16.3
wcwidth                      0.2.5
weasyprint                   53.4
webencodings                 0.5.1
wheel                        0.37.1
wrapt                        1.13.3
zipp                         3.8.1
zope.event                   4.5.0
zope.interface               5.4.0
zopfli                       0.1.9

No errors reported.


__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning: Conda not available.
 Error was [Errno 2] No such file or directory: 'conda'

Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.

=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================

Hi @stuartarchibald Thanks for your message.

  1. Adding NUMBA_DEBUG_CACHE=1 does not change anything; no debug output appeared.
  2. Good point, I made a basic test and it works correctly in my case.
>>> from numba import b1, f8, i2, i4, njit, typeof, u2
>>> exit()

rez0n avatar Aug 12 '22 14:08 rez0n

@esc @jannikmi I have "good" news: I reproduced the issue not only in Docker but locally too. Yesterday I re-created my venv (for another reason) and now I see behavior similar to Docker. Would it help if I shared my poetry.lock with one of you?

rez0n avatar Aug 12 '22 14:08 rez0n

@rez0n Thanks. Can you reproduce the slowdown locally only for timezonefinder, or for Numba in general?

jannikmi avatar Aug 12 '22 15:08 jannikmi

@jannikmi I'm not sure. Timezonefinder 6.0.2 is slow, while the Numba benchmark returns 0.34144562499999953

rez0n avatar Aug 12 '22 15:08 rez0n

1. Adding `NUMBA_DEBUG_CACHE=1` does not change anything; no debug output appeared.

2. Good point, I made a basic test and it works correctly in my case.
>>> from numba import b1, f8, i2, i4, njit, typeof, u2
>>> exit()

Thanks for this @rez0n. NUMBA_DEBUG_CACHE=1 produces output on stdout, assuming you are capturing that somewhere. If it is indeed empty, it implies nothing was cached or read from cache, which seems a little strange given the code involved. There's also the environment variable NUMBA_DEBUG=1, which should produce masses of output when Numba attempts to compile something (even when loading from cache you'll get some output, including a UserWarning: Inspection disabled for cached code. Invalid result is returned.).
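
To be sure the variable is set before Numba is imported and that stdout is really captured, the code under test can be run in a fresh interpreter. A minimal sketch with a hypothetical helper name; the demo script only echoes the variable and would be replaced by the actual timezonefinder call:

```python
import os
import subprocess
import sys


def run_with_numba_debug(script, extra_env=None):
    """Run script in a fresh interpreter with NUMBA_DEBUG_CACHE=1 set.

    Returns (stdout, stderr) of the child process as text.
    """
    env = dict(os.environ, NUMBA_DEBUG_CACHE="1")
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout, result.stderr


# Demo script: confirm the child process really sees the variable.
stdout, stderr = run_with_numba_debug(
    "import os; print(os.environ['NUMBA_DEBUG_CACHE'])"
)
print(repr(stdout))
```

Passing `extra_env={"NUMBA_DEBUG": "1"}` would enable the more verbose output in the same way.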

stuartarchibald avatar Aug 12 '22 15:08 stuartarchibald

@esc @jannikmi I have "good" news: I reproduced the issue not only in Docker but locally too. Yesterday I re-created my venv (for another reason) and now I see behavior similar to Docker. Would it help if I shared my poetry.lock with one of you?

The best would be some instructions for us to reproduce. That would help a lot. Otherwise any more clues you can share are appreciated.

esc avatar Aug 12 '22 17:08 esc

@rez0n maybe you can share a docker file even?

esc avatar Aug 12 '22 17:08 esc

@esc I will prepare a demo project and share it

rez0n avatar Aug 12 '22 17:08 rez0n

@esc Please find the demo project here. Instructions for quickly reproducing the issue are in the readme; the index page shows example output and timing. https://github.com/rez0n/django-timezonefinder-numba-demo

Update: I also added a tf5_numba54 branch showing that Timezonefinder 5.2.0 with Numba 0.54.0 generates the same page in less than a second.

rez0n avatar Aug 12 '22 19:08 rez0n