Avoid explicit calls to select.select()
Use the higher-level selectors module instead:
https://docs.python.org/3/library/selectors.html.
The selectors module uses the most efficient implementation available on the current platform. On Linux, DefaultSelector resolves to EpollSelector:
$ python3 -c "import selectors; print(selectors.DefaultSelector())"
<selectors.EpollSelector object at 0x7a6a66f02120>
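For reference, a minimal sketch of the pattern, not the actual pyexiftool change: waiting for a file descriptor to become readable with a timeout, using `selectors.DefaultSelector` where one would otherwise call `select.select([fd], [], [], timeout)`. The helper name `wait_readable` and the pipe standing in for the exiftool subprocess output are assumptions made for illustration.

```python
import os
import selectors


def wait_readable(fd, timeout):
    """Return True if fd becomes readable within timeout seconds.

    Hypothetical helper, not part of pyexiftool.
    """
    # DefaultSelector picks epoll, kqueue, poll or select depending on the platform.
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        # select() returns a list of (key, events) pairs; empty on timeout.
        return bool(sel.select(timeout))


# A pipe stands in for the stdout of a child process here.
read_fd, write_fd = os.pipe()
try:
    ready_before = wait_readable(read_fd, 0.1)  # nothing written yet
    os.write(write_fd, b"{ready}")
    ready_after = wait_readable(read_fd, 0.1)   # data is now pending
    data = os.read(read_fd, 1024)
finally:
    os.close(read_fd)
    os.close(write_fd)

print(ready_before, ready_after, data)
```

The selector is created and closed per call here to keep the sketch short; a long-lived reader would more likely register the descriptor once and reuse the selector.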
Fixes: https://github.com/sylikc/pyexiftool/issues/97
~~TODO: Benchmarking. I don't foresee this being slower than select.select(), but better safe than sorry.~~ Done here: https://github.com/sylikc/pyexiftool/pull/98#issuecomment-2374033078
Benchmarking
vagrant@almalinux:~/test$ cat benchmark.py
import exiftool
images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))
for i in range(512):
    print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))
Old:
vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-1.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288 -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null
real 0m48.851s
user 0m42.612s
sys 0m6.483s
New:
vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-2.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288 -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null
real 0m47.134s
user 0m41.186s
sys 0m6.156s
And with just the batch call, the per-file loop commented out:
vagrant@almalinux:~/test$ cat benchmark.py
import exiftool
images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))
# for i in range(512):
#     print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))
To average over several runs, old:
vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null
Performance counter stats for 'python3 benchmark.py' (10 runs):
3450.78 msec task-clock:u # 1.009 CPUs utilized ( +- 0.26% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
11170 page-faults:u # 3.237 K/sec ( +- 1.23% )
14374934023 cycles:u # 4.166 GHz ( +- 0.26% )
17189602 stalled-cycles-frontend:u # 0.12% frontend cycles idle ( +- 2.14% )
71707628 stalled-cycles-backend:u # 0.50% backend cycles idle ( +- 11.28% )
21913527379 instructions:u # 1.52 insn per cycle
# 0.00 stalled cycles per insn ( +- 0.04% )
4602999890 branches:u # 1.334 G/sec ( +- 0.04% )
0 branch-misses:u
3.41934 +- 0.00858 seconds time elapsed ( +- 0.25% )
New:
vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null
Performance counter stats for 'python3 benchmark.py' (10 runs):
3450.74 msec task-clock:u # 1.010 CPUs utilized ( +- 0.13% )
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
11305 page-faults:u # 3.276 K/sec ( +- 0.81% )
14332278239 cycles:u # 4.153 GHz ( +- 0.11% )
16894272 stalled-cycles-frontend:u # 0.12% frontend cycles idle ( +- 1.35% )
72376828 stalled-cycles-backend:u # 0.50% backend cycles idle ( +- 15.40% )
21919017057 instructions:u # 1.53 insn per cycle
# 0.00 stalled cycles per insn ( +- 0.03% )
4604247475 branches:u # 1.334 G/sec ( +- 0.04% )
0 branch-misses:u
3.41807 +- 0.00510 seconds time elapsed ( +- 0.15% )
I'm seeing next to no difference between the two solutions on my AlmaLinux 9 VM. Testing with larger images could be useful, but all our test images are small so as not to bloat the git repo. These 512 images are all copies of: https://github.com/Digital-Preservation-Finland/file-scraper/blob/master/tests/data/image_jpeg/valid_2.2.1_exif_metadata.jpg.
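To put the measurements above in proportion, here is a quick sketch of the arithmetic. The input numbers are copied from the `time` and `perf stat` output in this description; the helper name `relative_change` is made up for illustration.

```python
def relative_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


# Wall-clock time of the single `time` run (seconds).
full_old, full_new = 48.851, 47.134
full_pct = relative_change(full_old, full_new)

# Mean elapsed time over 10 perf stat runs, with the reported +- spread (seconds).
batch_old, batch_old_spread = 3.41934, 0.00858
batch_new, batch_new_spread = 3.41807, 0.00510
batch_pct = relative_change(batch_old, batch_new)

# The batch difference is smaller than the spread of either measurement.
within_noise = abs(batch_new - batch_old) < min(batch_old_spread, batch_new_spread)

print(f"full benchmark: {full_pct:+.1f}%")
print(f"batch only:     {batch_pct:+.2f}%")
print(f"batch difference within reported spread: {within_noise}")
```

The full benchmark was timed only once per version, so its difference of a few percent could just as well be run-to-run variation.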