pyexiftool: Avoid explicit calls to select.select()

Use the higher-level selectors module instead:

https://docs.python.org/3/library/selectors.html

The selectors module's DefaultSelector uses the most efficient implementation available on the current platform. On Linux, it defaults to epoll:

$ python3 -c "import selectors; print(selectors.DefaultSelector())"
<selectors.EpollSelector object at 0x7a6a66f02120>
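For context, here is a minimal sketch of the selectors pattern, assuming a POSIX system (selectors cannot poll pipes on Windows). It waits until a child process's stdout pipe is readable and then reads until a sentinel appears. The helper name `read_until` is hypothetical and not taken from the pyexiftool source, and the child is a stand-in Python one-liner, not ExifTool. One general limitation of select.select() on Linux is that it only accepts file descriptors below FD_SETSIZE (typically 1024) and raises ValueError otherwise, whereas the epoll-backed DefaultSelector has no such cap.

```python
import os
import selectors
import subprocess
import sys
from typing import Optional


def read_until(stream, sentinel: bytes, timeout: Optional[float] = None) -> bytes:
    """Read raw bytes from a pipe until `sentinel` has been seen.

    Hypothetical helper for illustration; not the pyexiftool implementation.
    """
    fd = stream.fileno()
    buf = b""
    with selectors.DefaultSelector() as sel:
        sel.register(fd, selectors.EVENT_READ)
        while sentinel not in buf:
            # The select.select() equivalent would be:
            #     readable, _, _ = select.select([fd], [], [], timeout)
            # which on Linux raises ValueError for fd >= FD_SETSIZE.
            if not sel.select(timeout):
                raise TimeoutError("no output before timeout")
            chunk = os.read(fd, 4096)
            if not chunk:  # an empty read means the writer closed the pipe
                raise EOFError("pipe closed before sentinel was seen")
            buf += chunk
    return buf


# Stand-in child process (not ExifTool) that prints a result, then a sentinel.
child_code = "print('metadata'); print('{ready}', flush=True)"
with subprocess.Popen([sys.executable, "-c", child_code], stdout=subprocess.PIPE) as proc:
    out = read_until(proc.stdout, b"{ready}", timeout=10)
print(out)
```

Using the selector as a context manager ensures its underlying epoll file descriptor is closed even if the read raises.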

Fixes: https://github.com/sylikc/pyexiftool/issues/97

Sep 24 '24 13:09 jukuisma

~~TODO: Benchmarking. I don't expect this to be slower than select.select(), but better safe than sorry.~~ Done here: https://github.com/sylikc/pyexiftool/pull/98#issuecomment-2374033078
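The end-to-end benchmarks in this thread measure the whole pipeline, ExifTool included. To look only at the waiting call itself, a micro-benchmark sketch like the following could compare the two APIs on a pipe that already has data buffered, so every wait returns immediately and only per-call overhead is timed. It assumes a POSIX system. Because this thread does not say whether the patch registers the fd once or builds a selector per read, both selectors variants are timed; all function names are hypothetical.

```python
import os
import select
import selectors
import timeit

N = 20_000  # calls per variant; small enough to finish in a few seconds

# A pipe with one byte buffered: every readiness check returns at once,
# so the timings reflect only the per-call overhead of each API.
r, w = os.pipe()
os.write(w, b"x")


def wait_select():
    # Old style: pass the fd list to select.select() on every call.
    return select.select([r], [], [], 0)[0]


persistent = selectors.DefaultSelector()
persistent.register(r, selectors.EVENT_READ)


def wait_selector_reused():
    # selectors, fd registered once and polled repeatedly.
    return persistent.select(0)


def wait_selector_fresh():
    # selectors, a new selector (on Linux, a new epoll fd) per call.
    with selectors.DefaultSelector() as sel:
        sel.register(r, selectors.EVENT_READ)
        return sel.select(0)


# Sanity check: all three variants see the pipe as readable.
assert wait_select() == [r]
assert len(wait_selector_reused()) == 1
assert len(wait_selector_fresh()) == 1

results = {}
for name, fn in [("select.select", wait_select),
                 ("selectors, reused", wait_selector_reused),
                 ("selectors, fresh", wait_selector_fresh)]:
    results[name] = timeit.timeit(fn, number=N) / N * 1e6  # microseconds per call
    print(f"{name:18} {results[name]:.2f} us/call")

persistent.close()
os.close(r)
os.close(w)
```

No expected numbers are given, since they depend on the kernel and Python version. The fresh variant also pays for creating and closing an epoll fd on every call, so it is likely the slowest of the three.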

Sep 24 '24 13:09 jukuisma

Benchmarking

vagrant@almalinux:~/test$ cat benchmark.py 
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

for i in range(512):
    print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))

Old:

vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-1.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m48.851s
user    0m42.612s
sys     0m6.483s

New:

vagrant@almalinux:~/test$ rpm -q python3-pyexiftool
python3-pyexiftool-0.5.6-2.el9.noarch
vagrant@almalinux:~/test$ python3 benchmark.py | md5sum
8cb4cc1e001e1abcfe6a4b4bae714288  -
vagrant@almalinux:~/test$ time python3 benchmark.py > /dev/null

real    0m47.134s
user    0m41.186s
sys     0m6.156s

And with just the single batched call (second loop commented out):

vagrant@almalinux:~/test$ cat benchmark.py 
import exiftool

images = [ f"images/{i}" for i in range(512)]
with exiftool.ExifToolHelper() as et:
    print(et.get_metadata(images))

# for i in range(512):
#     print(exiftool.ExifToolHelper().get_metadata(f"images/{i}"))

To average over several runs with perf stat -r 10, old:

vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null                                                                                                                     
                                                                                                                                                                                               
 Performance counter stats for 'python3 benchmark.py' (10 runs):                                                                                                                               
                                                                                                                                                                                               
           3450.78 msec task-clock:u                     #    1.009 CPUs utilized               ( +-  0.26% )                                                                                  
                 0      context-switches:u               #    0.000 /sec                                                                                                                       
                 0      cpu-migrations:u                 #    0.000 /sec                                                                                                                       
             11170      page-faults:u                    #    3.237 K/sec                       ( +-  1.23% )                                                                                  
       14374934023      cycles:u                         #    4.166 GHz                         ( +-  0.26% )                                                                                  
          17189602      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  2.14% )                                                                                  
          71707628      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 11.28% )                                                                                  
       21913527379      instructions:u                   #    1.52  insn per cycle                                                                                                             
                                                  #    0.00  stalled cycles per insn     ( +-  0.04% )                                                                                         
        4602999890      branches:u                       #    1.334 G/sec                       ( +-  0.04% )                                                                                  
                 0      branch-misses:u                                                                                                                                                        
                                                                                                                                                                                               
           3.41934 +- 0.00858 seconds time elapsed  ( +-  0.25% )

New:

vagrant@almalinux:~/test$ perf stat -r 10 python3 benchmark.py > /dev/null 

 Performance counter stats for 'python3 benchmark.py' (10 runs):

           3450.74 msec task-clock:u                     #    1.010 CPUs utilized               ( +-  0.13% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             11305      page-faults:u                    #    3.276 K/sec                       ( +-  0.81% )
       14332278239      cycles:u                         #    4.153 GHz                         ( +-  0.11% )
          16894272      stalled-cycles-frontend:u        #    0.12% frontend cycles idle        ( +-  1.35% )
          72376828      stalled-cycles-backend:u         #    0.50% backend cycles idle         ( +- 15.40% )
       21919017057      instructions:u                   #    1.53  insn per cycle            
                                                  #    0.00  stalled cycles per insn     ( +-  0.03% )
        4604247475      branches:u                       #    1.334 G/sec                       ( +-  0.04% )
                 0      branch-misses:u                                                       

           3.41807 +- 0.00510 seconds time elapsed  ( +-  0.15% )

I'm seeing next to no difference between the two implementations on my AlmaLinux 9 VM. Testing with larger images could be useful, but all our test images are kept small so as not to bloat the git repo. All 512 images are copies of this one file: https://github.com/Digital-Preservation-Finland/file-scraper/blob/master/tests/data/image_jpeg/valid_2.2.1_exif_metadata.jpg
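To put "next to no difference" into numbers, the mean elapsed times and the ± spreads printed by perf stat -r 10 above can be compared directly. The earlier `time` figures are single runs, so they carry no spread estimate and are left out. A small sketch of the arithmetic:

```python
# Mean elapsed time and the +- spread printed by `perf stat -r 10`
# (values copied from the old and new runs above; seconds).
old_mean, old_err = 3.41934, 0.00858
new_mean, new_err = 3.41807, 0.00510

diff = old_mean - new_mean       # positive means the new version was faster
rel_pct = diff / old_mean * 100  # relative change in percent

print(f"difference: {diff * 1000:.2f} ms ({rel_pct:.3f} %)")
print("smaller than either run's spread:", abs(diff) < min(old_err, new_err))
```

This prints a difference of 1.27 ms (0.037 %), which is well inside the run-to-run spread of either version.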

Sep 25 '24 13:09 jukuisma