timezonefinder
timezonefinder became significantly slower in Docker
Hi, I just want to report this in case it helps someone: all versions after 5.2.0 may be extremely slow when running in Docker.
In my case, upgrading from 5.2.0 (Numba 0.54.0) to 6.0.2 (Numba 0.56.0) increased the time to process ~400 timezone lookups from ~1.5 seconds to ~40 seconds, and only inside the Docker container (I tried different base images: 3.9/3.10/slim/bullseye). The issue disappeared right after downgrading to [email protected] (Numba 0.55.0). Meanwhile, every version runs as fast as before locally, without Docker.
There is a related Numba issue, but in my case the benchmark returned a good result for Numba 0.56.0 inside Docker, so Numba itself seems to work.
Thanks for reporting this. It might be due to an issue with the library h3, which is a dependency introduced in v6.0.0
Could you double check and report if not installing Numba into the docker container does increase the runtime even further?
@jannikmi Just to confirm, you are asking me to check the performance of timezonefinder 6.0.2 without Numba installed, correct?
Yes, exactly. If the time increases even further considerably then it is likely that Numba is not the cause of the delay you are reporting.
Here are my test results. To be clear, I do not see much difference whether Numba is installed or not. It looks like Numba is not used at all. Is that possible?
Base image python:3.9-slim
Timezonefinder 6.0.2 (without numba), m:s:ms: 0:56:18, 0:55:77, 0:54:56, 0:49:60
Timezonefinder 6.0.2 (with numba 0.56.0), m:s:ms: 0:47:08, 0:56:19, 0:55:34, 0:58:33
Base image python:3.9
Timezonefinder 6.0.2 (without numba), m:s:ms: 52:99, 1:0:90, 54:28, 54:24
Timezonefinder 6.0.2 (with numba 0.56.0), m:s:ms: 0:53:90, 0:53:80, 0:53:78, 0:50:37
Base image python:3.10
Timezonefinder 6.0.2 (without numba), m:s:ms: 0:52:65, 0:45:86, 0:46:78, 0:43:90
Timezonefinder 6.0.2 (with numba 0.56.0), m:s:ms: 0:45:43, 0:43:08, 0:40:75, 0:41:59
(CPU load is 100% during the whole execution.)
For comparison, python:3.9-slim with Timezonefinder 5.2.0 and numba 0.54.0: 1304ms, 959ms, 1129ms, 1059ms
Yes, that's possible. It seems like Numba is actually not working properly meaning that using pure python code is as fast as the Numba equivalents.
I wonder if there is a way of installing the same Numba version 0.54.0 with Timezonefinder 6 to compare?
Hi @jannikmi, do you have any hints on how to determine whether the problem is in Numba or in timezonefinder? I ask because I ran the Numba benchmark (posted in the Numba repo issue) and got a good result, which means Numba works.
From the results you posted I am pretty sure it is a Numba issue (since installing Numba does not seem to change the timing).
It is however strange when the general Numba benchmarks work. Have you double checked the Numba versions you test are the same?
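One way to rule out a version mismatch between the two benchmarks is to print the installed versions from inside the same interpreter that runs them. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the package list are illustrative assumptions, not part of timezonefinder or Numba.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the current interpreter."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Package list is an assumption: adjust it to what your benchmark imports.
    for name, version in installed_versions(
        ["timezonefinder", "numba", "llvmlite", "numpy", "h3"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this both in the container and locally gives two lists that can be compared line by line.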
Could you please print the Timezonefinder property 'f.using_numba()' when Numba is installed.
Hi @jannikmi, it returns True, but the execution time is 1m4s.
OK, strange. That is a long time for just a variable evaluation.
Can you time a second evaluation as well please?
@jannikmi sorry for the confusion. I added f.using_numba() to my dataset generation after each call; 1m4s is the overall execution time of the ~400 calls.
Ok. That makes more sense. Can you please double check that the general Numba benchmarks work for the exact same Numba version you are using with timezonefinder?
And please also double check you are benchmarking the njit functionality of Numba since that is what timezonefinder is using: https://github.com/jannikmi/timezonefinder/blob/master/timezonefinder/utils.py
@jannikmi It looks like this benchmark uses jit, not njit. Link to the benchmark: https://github.com/numba/numba/issues/8293#issue-1320087411
I do not have enough experience with it to sort this out and modify it within a reasonable time.
@esc sorry for disturbing you, but maybe you have some advice on how to ensure njit (used by Timezonefinder) works correctly?
@numba.jit(nopython=True,...) used in the benchmarks should be equivalent to @numba.njit
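As a quick sanity check of that equivalence, the same function can be compiled with both spellings and the results compared. This is only a sketch: `sum_of_squares` and `decorators_agree` are made-up names, and the check returns None when Numba cannot be imported.

```python
try:
    import numba
except ImportError:  # Numba is optional here, as it is for timezonefinder
    numba = None


def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def decorators_agree(n=1000):
    """Compile the same function both ways and compare the results.

    Returns None when Numba is not importable.
    """
    if numba is None:
        return None
    via_jit = numba.jit(nopython=True)(sum_of_squares)
    via_njit = numba.njit(sum_of_squares)
    return via_jit(n) == via_njit(n) == sum_of_squares(n)


if __name__ == "__main__":
    print("jit(nopython=True) and njit agree:", decorators_agree())
```

Matching results do not prove identical performance, but they confirm both decorators compile and run in nopython mode in the environment under test.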
I have the gut feeling that this has something to do with the initial time Numba needs to just-in-time compile the functions (which are cached afterwards).
I guess that, due to the Docker setup, the cached JIT-compiled functions are lost each time, while in your local setup they are reused (giving you the speed benefit).
Can you make sure you call timezonefinder for a couple of points (without timing it) before running your benchmarks please?
An alternative, perhaps preferable, way of ensuring the same thing is being measured is deleting the cache every time you run the benchmark (also locally).
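The warm-up suggestion can be sketched as a small harness that runs a few untimed lookups first and then times the full batch separately. Everything here is an assumption for illustration: `benchmark` and `fake_lookup` are hypothetical names, and the coordinates are made up.

```python
import time


def benchmark(lookup, points, warmup=5):
    """Time lookup(lng=..., lat=...) over points, excluding warm-up calls.

    Returns (warmup_seconds, timed_seconds).
    """
    start = time.perf_counter()
    for lng, lat in points[:warmup]:
        lookup(lng=lng, lat=lat)  # first calls may include JIT compilation
    warmup_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for lng, lat in points:
        lookup(lng=lng, lat=lat)
    timed_seconds = time.perf_counter() - start
    return warmup_seconds, timed_seconds


def fake_lookup(lng, lat):
    """Stand-in so the sketch runs without timezonefinder installed."""
    return "Etc/GMT"


if __name__ == "__main__":
    # ~400 made-up coordinates, mirroring the workload size reported above
    points = [(-180 + i * 0.9, -80 + i * 0.4) for i in range(400)]
    warm, timed = benchmark(fake_lookup, points)
    print(f"warm-up: {warm:.3f}s, timed: {timed:.3f}s")
```

To measure the real library, pass `TimezoneFinder().timezone_at` instead of `fake_lookup`. If the warm-up time is large and the timed run is small, compilation dominates; if both are large, the slowdown is in the lookups themselves.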
> Can you make sure you call timezonefinder for a couple of points (without timing it) before running your benchmarks please?
As I answered in the Numba repo issue, I run 4 test attempts in a row without reloading anything, so any cache should be in place.
Personally, I see no visible difference whether the cache is present or not. After each Docker rebuild/deployment there is no cache, but the first request takes under 2 seconds to generate and subsequent requests take 1.0-1.5 seconds. That is with TF 5.2.0/Numba 0.54.0.
With TF 6.0.2/Numba 0.56.0 it is the same locally.
I am not an expert on this, but at least the caching behavior could be different in the Docker image. Perhaps it is so slow overall because caching is not being used at all. It's just strange that it all seems to work with timezonefinder 5.2.0. To be honest, at this point I am a bit clueless.
Just to confirm: are you running the general Numba benchmarks with the exact same Numba version you are using for the timezonefinder benchmark?
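To test the caching hypothesis directly, one can look for Numba's on-disk cache files, which use the `.nbi` (index) and `.nbc` (data) suffixes. The sketch below assumes the compiled functions were declared with `cache=True`; the helper name `find_numba_cache_files` is hypothetical, and the search root must be adapted to the environment (for example the site-packages directory of the container).

```python
import os
from pathlib import Path


def find_numba_cache_files(root):
    """Return sorted paths of Numba cache index (.nbi) and data (.nbc) files under root."""
    root = Path(root)
    return sorted(
        path for path in root.rglob("*") if path.suffix in {".nbi", ".nbc"}
    )


if __name__ == "__main__":
    # NUMBA_CACHE_DIR overrides the default location (a __pycache__ directory
    # next to the compiled module); "." is only a placeholder search root.
    search_root = os.environ.get("NUMBA_CACHE_DIR", ".")
    files = find_numba_cache_files(search_root)
    print(f"{len(files)} cache file(s) under {search_root}")
    for path in files:
        print(path)
```

An empty result after a benchmark run would suggest that nothing was written to the cache in that environment.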
> I am not an expert on this, but at least the caching behavior could be different in the Docker image.
No, usually it is the same as locally. I have a lot of experience working with Docker and, to be clear, I am sure it is not a Docker issue in general. Most probably the official Docker image lacks something required by the latest Numba, or Numba has some kind of bug that prevents timezonefinder from using its features in such an environment.
> Just to confirm: are you running the general Numba benchmarks with the exact same Numba version you are using for the timezonefinder benchmark?
Yes, I ran the benchmark inside the same container (python 3.10/TF6.0.2/Numba 0.56.0) right after testing TF functions
I don't have any ideas then where this is coming from.
FYI: Since Numba also caused other issues in the past and it is a huge dependency (llvmlite compiler 20+MB), I am working on a new release using a pure C point in polygon implementation.
I found this nice package to generate C extensions more easily: https://cffi.readthedocs.io/en/latest/installation.html
Please hang tight until then and use v5 of timezonefinder if speed matters to you.
> I don't have any ideas then where this is coming from.
If you do end up finding a Numba bug while using Numba in docker we would appreciate an issue, of course. One idea I had was to maybe execute numba -s in the docker container. This will yield Numba diagnostic output, perhaps there are some clues in that?
A couple of suggestions:
- The environment variable NUMBA_DEBUG_CACHE=1 can be set to help debug Numba's caching behaviour. When set it should show what is being stored and loaded, and from where.
- It might also be worthwhile making sure that numba is actually importing correctly as it appears that if it does not there's a silent fallback: https://github.com/jannikmi/timezonefinder/blob/c0b92d451793448235d9e23978aefc27cd08d3fa/timezonefinder/utils.py#L21-L27 Could this perhaps explain there being no difference in performance whether numba is installed or not, similarly whether something has apparently been compiled and cached or not? If the import fails, all the execution times would be like Numba wasn't installed?
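The general shape of such an optional-import fallback, and a way to make it visible rather than silent, can be sketched as follows. This is not timezonefinder's actual code: the names `USING_NUMBA` and `is_between` and the logging call are assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)

try:
    from numba import njit  # real JIT decorator when Numba imports cleanly

    USING_NUMBA = True
except ImportError as exc:
    USING_NUMBA = False
    # Make the fallback visible instead of silent.
    logger.warning("Numba import failed, using pure Python: %s", exc)

    def njit(*args, **kwargs):
        """No-op stand-in that accepts both @njit and @njit(...) usage."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]

        def decorator(func):
            return func

        return decorator


@njit(cache=False)
def is_between(value, low, high):
    return low <= value <= high


if __name__ == "__main__":
    print("using numba:", USING_NUMBA)
    print("is_between(5, 1, 10):", is_between(5, 1, 10))
```

With a pattern like this, a broken import of any single name inside the `try` block sends the whole module down the pure Python path, which would match timings that do not change when Numba is installed.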
Hi @esc
root@cf4d571e7422:/app# numba -s
System info:
--------------------------------------------------------------------------------
__Time Stamp__
Report started (local time) : 2022-08-12 14:19:46.133256
UTC start time : 2022-08-12 14:19:46.133260
Running time (s) : 0.854724
__Hardware Information__
Machine : x86_64
CPU Name : skylake
CPU Count : 6
Number of accessible CPUs : 6
List of accessible CPUs cores : 0-5
CFS Restrictions (CPUs worth of runtime) : None
CPU Features : 64bit aes avx avx2 bmi bmi2 cmov
cx16 cx8 f16c fma fsgsbase fxsr
lzcnt mmx movbe pclmul popcnt
prfchw rdrnd sahf sse sse2 sse3
sse4.1 sse4.2 ssse3 xsave xsaveopt
Memory Total (MB) : 3933
Memory Available (MB) : 1625
__OS Information__
Platform Name : Linux-5.10.104-linuxkit-x86_64-with-glibc2.31
Platform Release : 5.10.104-linuxkit
OS Name : Linux
OS Version : #1 SMP Thu Mar 17 17:08:06 UTC 2022
OS Specific Version : ?
Libc Version : glibc 2.31
__Python Information__
Python Compiler : GCC 10.2.1 20210110
Python Implementation : CPython
Python Version : 3.9.12
Python Locale : en_US.UTF-8
__Numba Toolchain Versions__
Numba Version : 0.56.0
llvmlite Version : 0.39.0
__LLVM Information__
LLVM Version : 11.1.0
__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Runtime Version : ?
CUDA NVIDIA Bindings Available : ?
CUDA NVIDIA Bindings In Use : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None
__NumPy Information__
NumPy Version : 1.22.4
NumPy Supported SIMD features : ('MMX', 'SSE', 'SSE2', 'SSE3', 'SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2')
NumPy Supported SIMD dispatch : ('SSSE3', 'SSE41', 'POPCNT', 'SSE42', 'AVX', 'F16C', 'FMA3', 'AVX2', 'AVX512F', 'AVX512CD', 'AVX512_KNL', 'AVX512_KNM', 'AVX512_SKX', 'AVX512_CLX', 'AVX512_CNL', 'AVX512_ICL')
NumPy Supported SIMD baseline : ('SSE', 'SSE2', 'SSE3')
NumPy AVX512_SKX support detected : False
__SVML Information__
SVML State, config.USING_SVML : False
SVML Library Loaded : False
llvmlite Using SVML Patched LLVM : True
SVML Operational : False
__Threading Layer Information__
TBB Threading Layer Available : False
+--> Disabled due to Unknown import problem.
OpenMP Threading Layer Available : True
+-->Vendor: GNU
Workqueue Threading Layer Available : True
+-->Workqueue imported successfully.
__Numba Environment Variable Information__
None found.
__Conda Information__
Conda not available.
__Installed Packages__
Package Version
---------------------------- ---------
amqp 5.0.6
asgiref 3.4.1
billiard 3.6.4.0
boto3 1.20.14
botocore 1.23.14
Brotli 1.0.9
CacheControl 0.12.10
cachetools 4.2.4
cachy 0.3.0
celery 5.2.1
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.8
cleo 0.8.1
click 8.0.3
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.2.0
clikit 0.6.2
crashtest 0.3.1
cryptography 37.0.4
cssselect2 0.4.1
Deprecated 1.2.13
distlib 0.3.5
Django 4.0.3
django-ajax-datatable 4.4.3
django-appconf 1.0.5
django-bootstrap-modal-forms 2.2.0
django-database-prefix 0.1.0
django-datatable-view 2.1.6
django-downloadview 2.3.0
django-filter 2.4.0
django-formtools 2.3
django-model-utils 4.2.0
django-notifications-hq 1.7.0
django-otp 1.1.1
django-phonenumber-field 5.2.0
django-recaptcha 3.0.0
django-redis 5.0.0
django-select2 7.10.0
django-storages 1.12.3
django-two-factor-auth 1.13.2
djangorestframework 3.12.4
djangorestframework-jsonapi 4.3.0
fcm-django 1.0.5
filelock 3.7.1
firebase-admin 5.1.0
fonttools 4.34.4
gevent 21.8.0
google-api-core 2.2.2
google-api-python-client 2.31.0
google-auth 2.3.3
google-auth-httplib2 0.1.0
google-cloud-core 2.2.1
google-cloud-firestore 2.3.4
google-cloud-storage 1.43.0
google-crc32c 1.3.0
google-resumable-media 2.1.0
googleapis-common-protos 1.53.0
greenlet 1.1.2
grpcio 1.42.0
grpcio-status 1.42.0
gunicorn 20.1.0
h3 3.7.4
html5lib 1.1
httplib2 0.20.2
idna 3.3
importlib-metadata 4.12.0
inflection 0.5.1
install 1.3.4
jeepney 0.8.0
jmespath 0.10.0
jsonfield 3.1.0
keyring 23.8.2
kombu 5.2.2
llvmlite 0.39.0
lockfile 0.12.2
msgpack 1.0.3
mysqlclient 2.1.0
numba 0.56.0
numpy 1.22.4
packaging 21.3
pastel 0.2.1
phonenumberslite 8.12.38
Pillow 8.4.0
pip 22.0.4
pkginfo 1.8.3
platformdirs 2.5.2
poetry 1.1.14
poetry-core 1.0.8
prompt-toolkit 3.0.23
proto-plus 1.19.8
protobuf 3.19.1
psycopg2 2.9.3
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
pydyf 0.1.2
pylev 1.4.0
pyparsing 3.0.6
pyphen 0.11.0
python-dateutil 2.8.2
python-decouple 3.5
pytz 2021.3
PyYAML 5.4.1
qrcode 6.1
redis 4.0.2
requests 2.26.0
requests-toolbelt 0.9.1
rsa 4.8
s3transfer 0.5.0
SecretStorage 3.3.2
sentry-sdk 1.5.12
setuptools 58.1.0
shellingham 1.5.0
six 1.16.0
sqlparse 0.4.2
swapper 1.2.0
timezonefinder 6.0.2
tinycss2 1.1.1
tomlkit 0.11.2
uritemplate 4.1.1
urllib3 1.26.7
vine 5.0.0
virtualenv 20.16.3
wcwidth 0.2.5
weasyprint 53.4
webencodings 0.5.1
wheel 0.37.1
wrapt 1.13.3
zipp 3.8.1
zope.event 4.5.0
zope.interface 5.4.0
zopfli 0.1.9
No errors reported.
__Warning log__
Warning (cuda): CUDA driver library cannot be found or no CUDA enabled devices are present.
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning: Conda not available.
Error was [Errno 2] No such file or directory: 'conda'
Warning (psutil): psutil cannot be imported. For more accuracy, consider installing it.
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us
--------------------------------------------------------------------------------
If requested, please copy and paste the information between
the dashed (----) lines, or from a given specific section as
appropriate.
=============================================================
IMPORTANT: Please ensure that you are happy with sharing the
contents of the information present, any information that you
wish to keep private you should remove before sharing.
=============================================================
Hi @stuartarchibald, thanks for your message.
- Adding NUMBA_DEBUG_CACHE=1 does not change anything; no debug output appeared.
- Good point. I made a basic test and the import works correctly in my case:
>>> from numba import b1, f8, i2, i4, njit, typeof, u2
>>> exit()
@esc @jannikmi I have "good" news: I reproduced the issue not only in Docker but locally too. Yesterday I re-created my venv (for another reason) and now I see similar behavior as in Docker. Would it help if I shared my poetry.lock with one of you?
@rez0n thanks. can you reproduce the Numba slowdown locally only for timezonefinder or Numba in general?
@jannikmi I am not sure. Timezonefinder 6.0.2 is slow, while the Numba benchmark returns 0.34144562499999953.
> 1. Adding `NUMBA_DEBUG_CACHE=1` does not affect anything, not any debug output appeared
> 2. Good point, I made a basic test and it works correctly in my case.
>
> `>>> from numba import b1, f8, i2, i4, njit, typeof, u2`
> `>>> exit()`
Thanks for this @rez0n. NUMBA_DEBUG_CACHE=1 produces output on stdout, assuming you are capturing that somewhere. If it is indeed empty, it implies nothing was cached or read from cache, which seems a little strange given the code involved. There's also the environment variable NUMBA_DEBUG=1, which should produce masses of output when Numba attempts to compile something (even when loading from cache you'll get some output, including a UserWarning: Inspection disabled for cached code. Invalid result is returned.).
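To be sure the debug output is not lost, the workload can be run in a child interpreter with the variable set and both output streams captured. The sketch below uses only the standard library; `run_with_env` is a hypothetical helper, and the snippet it runs is a placeholder that merely echoes the variable instead of calling timezonefinder.

```python
import os
import subprocess
import sys


def run_with_env(code, extra_env):
    """Run code in a fresh interpreter with extra environment variables.

    Returns (stdout, stderr) as text.
    """
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    # Hypothetical workload: replace with the code that calls timezonefinder.
    snippet = "import os; print(os.environ.get('NUMBA_DEBUG_CACHE'))"
    out, err = run_with_env(snippet, {"NUMBA_DEBUG_CACHE": "1"})
    print("stdout:", out.strip() or "<empty>")
    print("stderr:", err.strip() or "<empty>")
```

Setting the variable in the child's environment guarantees it is present before Numba is imported, which matters because Numba reads its configuration at import time.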
> @esc @jannikmi I have "good" news, I reproduced the issue not only in Docker but locally too. Yesterday I have re-created venv (for another reason) and now I have similar behavior as in Docker. Would it help if I will share my poetry.lock to someone of you?
The best would be some instructions for us to reproduce. That would help a lot. Otherwise any more clues you can share are appreciated.
@rez0n maybe you can share a docker file even?
@esc I will prepare a demo project and share it
@esc Please find the demo project here: https://github.com/rez0n/django-timezonefinder-numba-demo Instructions to quickly reproduce the issue are in the README; the index page shows example output and timing.
Update: I also added a tf5_numba54 branch as proof that Timezonefinder 5.2.0 with Numba 0.54.0 generates the same page in less than a second.