
A lot of threads and performance hits for executing a few notebooks simultaneously

Open Jacob-Stevens-Haas opened this issue 9 months ago • 2 comments

When I run notebooks using nbclient.NotebookClient().execute(), each one spawns a lot of threads (137), top/bpytop/htop show all cores at 100%, and my ssh session becomes noticeably laggy when typing. This is on an AMD Ryzen Threadripper with 32 cores, running at most eight notebooks simultaneously. None of the notebooks use multiprocessing, threading, or async. They do a lot of work, but it's all numpy, scipy, etc. I AM writing to a file, if that matters (the process adds a logging.FileHandler to the root logger and pickles some output to a file).
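
One possible explanation, which is an assumption here and not confirmed anywhere in this thread, is that numpy and scipy are linked against a multithreaded BLAS (OpenBLAS or MKL) or an OpenMP runtime. These thread pools often default to roughly one thread per core, so eight kernels on a 32-core machine could oversubscribe it. Below is a minimal sketch of capping those pools. It assumes the kernel is launched locally by this same process and inherits its environment. `THREAD_ENV_VARS` and `thread_cap_overrides` are hypothetical names used for illustration.

```python
import os

# Environment variables read at load time by common native thread pools.
# Which one actually applies depends on how numpy/scipy were built.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_cap_overrides(n_threads: int) -> dict[str, str]:
    """Return env-var overrides that cap each native thread pool at n_threads."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return {var: str(n_threads) for var in THREAD_ENV_VARS}


# Example: share 32 cores among 8 concurrent notebooks (4 threads each).
# The pools read these variables when the kernel process loads numpy, so set
# them before the NotebookClient starts its kernel, not inside the notebook
# after numpy has already been imported.
os.environ.update(thread_cap_overrides(32 // 8))
```

If threadpoolctl is installed, calling `threadpoolctl.threadpool_info()` inside the notebook reports which native pools were loaded and how many threads each one uses.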

I don't know much about troubleshooting this kind of problem, so I'll start by sharing my assumptions, and people with more experience can correct them. I thought that:

  • Non-interactive Python will only ever use about 100% of a CPU, because it is single-threaded and can therefore only execute on a single core.
  • If some Python code would normally use about 100% of a CPU, running it as a notebook would add only a small amount of overhead on other cores.
  • The overhead of executing via nbclient should be less than or equal to the overhead of running a notebook interactively, because nbclient is less interactive (not interactive at all).
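
The first assumption holds for pure-Python bytecode, which the GIL serializes. It does not necessarily hold for C code that releases the GIL or starts its own native threads. The stdlib-only sketch below illustrates the difference. `hashlib` releases the GIL while hashing large buffers, so several Python threads can keep several cores busy at once. When that happens, `time.process_time()`, which sums CPU time over all of the process's threads, advances faster than the wall clock. The buffer size and thread count are arbitrary choices for illustration.

```python
import hashlib
import threading
import time


def cpu_vs_wall(n_threads: int = 4, mib_per_thread: int = 32) -> tuple[float, float]:
    """Hash a buffer in n_threads threads; return (cpu_seconds, wall_seconds)."""
    data = b"\0" * (mib_per_thread * 1024 * 1024)

    def work() -> None:
        # hashlib drops the GIL for large inputs, so these calls can run
        # on separate cores at the same time.
        hashlib.sha256(data).digest()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    cpu0, wall0 = time.process_time(), time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.process_time() - cpu0, time.perf_counter() - wall0


cpu, wall = cpu_vs_wall()
# On a multi-core machine the ratio is typically above 1.
print(f"CPU {cpu:.2f} s / wall {wall:.2f} s = {cpu / wall:.1f} cores busy on average")
```

Multithreaded BLAS calls made by numpy can behave similarly, which is one way a single numpy-heavy process can show well over 100% CPU in top.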

I've seen a few Jupyter-themed questions about threads, e.g., but none that I found use nbclient. I'm just trying to capture the text and image output of a block of code in a formatted HTML file, for which I'm using nbconvert.exporters.HTMLExporter and nbconvert.writers.files.FilesWriter. If there's a different way to get that result using the Jupyter ecosystem, please let me know.
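
For reference, here is a minimal sketch of the execute-then-export pipeline described above. It assumes recent nbformat, nbclient, and nbconvert releases. `execute_to_html`, its parameters, and the `html_out` directory are hypothetical. The sketch uses only `NotebookClient.execute()`, `HTMLExporter.from_notebook_node()`, and `FilesWriter.write()` from the tools named above.

```python
from pathlib import Path


def execute_to_html(notebook_path: str, build_dir: str = "html_out",
                    timeout: int = 600, kernel_name: str = "python3") -> Path:
    """Execute a notebook with nbclient and write the result as HTML."""
    # Jupyter imports are kept local so this module can be imported
    # even where the Jupyter stack is not installed.
    import nbformat
    from nbclient import NotebookClient
    from nbconvert.exporters import HTMLExporter
    from nbconvert.writers import FilesWriter

    nb = nbformat.read(notebook_path, as_version=4)

    # execute() starts a fresh kernel, runs every cell, and stores the
    # outputs (text, images) in `nb` in place.
    NotebookClient(nb, timeout=timeout, kernel_name=kernel_name).execute()

    # Render the executed notebook; `resources` carries the ".html" extension.
    body, resources = HTMLExporter().from_notebook_node(nb)

    # FilesWriter writes <build_dir>/<notebook_name><extension> and returns the path.
    writer = FilesWriter(build_directory=build_dir)
    return Path(writer.write(body, resources, notebook_name=Path(notebook_path).stem))
```

Called as `execute_to_html("analysis.ipynb")` (a hypothetical file), this would write `html_out/analysis.html`.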

Jacob-Stevens-Haas • Mar 27 '25 05:03

Each kernel uses several threads because of zmq sockets, but I don't know if that could explain what you are seeing.

davidbrochart • Mar 27 '25 07:03

Here's the output of pstree -pacts 3211769 (the process I originally started). nbclient is launched through mitosis, a Python entry point (console script) that checks that the repository is on a clean commit, then builds a notebook from the arguments, runs it, and converts it to HTML.

It looks like nbclient launches a kernel, and ZMQ accounts for four of its threads. Are the four additional ZMQ threads in the original process perhaps a server?

pstree -pacts 3211769
systemd,1 --system --deserialize 32 splash
  └─mitosis,3211769,jmsh /home/jmsh/github/Kalman-SINDy-paper/env/bin/mitosis
      ├─git,3212420 cat-file --batch-check
      ├─python3,3212437 -m ipykernel_launcher -f /tmp/tmpb8q4zxd3.json--Hist
      │   ├─{ZMQbg/IO/0},3212453
      │   ├─{ZMQbg/IO/0},3212457
      │   ├─{ZMQbg/Reaper},3212452
      │   ├─{ZMQbg/Reaper},3212456
      │   ├─{python3},3212454
      │   ├─{python3},3212455
      │   ├─{python3},3212458
      │   ├─{python3},3212459
      │   ├─{python3},3212460
      │   ├─{python3},3212468
      │   ├─{python3},3212497
      │   ├─{python3},3212498
      │   ├─{python3},3212499
      │   ├─{python3},3212500
      │   ├─{python3},3212501
      │   ├─{python3},3212502
      │   ├─{python3},3212503
      │   ├─{python3},3212504
      │   ├─{python3},3212505
      │   ├─{python3},3212506
      │   ├─{python3},3212507
      │   ├─{python3},3212508
      │   ├─{python3},3212509
      │   ├─{python3},3212510
      │   ├─{python3},3212511
      │   ├─{python3},3212512
      │   ├─{python3},3212513
      │   ├─{python3},3212514
      │   ├─{python3},3212515
      │   ├─{python3},3212516
      │   ├─{python3},3212517
      │   ├─{python3},3212518
      │   ├─{python3},3212519
      │   ├─{python3},3212520
      │   ├─{python3},3212521
      │   ├─{python3},3212522
      │   ├─{python3},3212523
      │   ├─{python3},3212524
      │   ├─{python3},3212525
      │   ├─{python3},3212526
      │   ├─{python3},3212527
      │   ├─{python3},3212528
      │   ├─{python3},3212529
      │   ├─{python3},3212530
      │   ├─{python3},3212531
      │   ├─{python3},3212532
      │   ├─{python3},3212533
      │   ├─{python3},3212534
      │   ├─{python3},3212535
      │   ├─{python3},3212536
      │   ├─{python3},3212537
      │   ├─{python3},3212538
      │   ├─{python3},3212539
      │   ├─{python3},3212540
      │   ├─{python3},3212541
      │   ├─{python3},3212542
      │   ├─{python3},3212543
      │   ├─{python3},3212544
      │   ├─{python3},3212545
      │   ├─{python3},3212546
      │   ├─{python3},3212547
      │   ├─{python3},3212548
      │   ├─{python3},3212549
      │   ├─{python3},3212550
      │   ├─{python3},3212551
      │   ├─{python3},3212552
      │   ├─{python3},3212553
      │   ├─{python3},3212554
      │   ├─{python3},3212555
      │   ├─{python3},3212556
      │   ├─{python3},3212557
      │   ├─{python3},3212558
      │   ├─{python3},3212559
      │   ├─{python3},3212836
      │   ├─{python3},3212837
      │   ├─{python3},3212838
      │   ├─{python3},3212839
      │   ├─{python3},3212840
      │   ├─{python3},3212841
      │   ├─{python3},3212842
      │   ├─{python3},3212843
      │   ├─{python3},3212844
      │   ├─{python3},3212845
      │   ├─{python3},3212846
      │   ├─{python3},3212847
      │   ├─{python3},3212848
      │   ├─{python3},3212849
      │   ├─{python3},3212850
      │   ├─{python3},3212851
      │   ├─{python3},3212852
      │   ├─{python3},3212853
      │   ├─{python3},3212854
      │   ├─{python3},3212855
      │   ├─{python3},3212856
      │   ├─{python3},3212857
      │   ├─{python3},3212858
      │   ├─{python3},3212859
      │   ├─{python3},3212860
      │   ├─{python3},3212861
      │   ├─{python3},3212862
      │   ├─{python3},3212863
      │   ├─{python3},3212864
      │   ├─{python3},3212865
      │   ├─{python3},3212866
      │   ├─{python3},3212867
      │   ├─{python3},3212868
      │   ├─{python3},3212869
      │   ├─{python3},3212870
      │   ├─{python3},3212871
      │   ├─{python3},3212872
      │   ├─{python3},3212873
      │   ├─{python3},3212874
      │   ├─{python3},3212875
      │   ├─{python3},3212876
      │   ├─{python3},3212877
      │   ├─{python3},3212878
      │   ├─{python3},3212879
      │   ├─{python3},3212880
      │   ├─{python3},3212881
      │   ├─{python3},3212882
      │   ├─{python3},3212883
      │   ├─{python3},3212884
      │   ├─{python3},3212885
      │   ├─{python3},3212886
      │   ├─{python3},3212887
      │   ├─{python3},3212888
      │   ├─{python3},3212889
      │   ├─{python3},3212890
      │   ├─{python3},3212891
      │   ├─{python3},3212892
      │   ├─{python3},3212893
      │   ├─{python3},3212894
      │   ├─{python3},3212895
      │   ├─{python3},3212896
      │   ├─{python3},3212897
      │   └─{python3},3212898
      ├─{ZMQbg/IO/0},3212439
      ├─{ZMQbg/IO/0},3212442
      ├─{ZMQbg/Reaper},3212438
      ├─{ZMQbg/Reaper},3212441
      ├─{mitosis},3211774
      ├─{mitosis},3211777
      ├─{mitosis},3211778
      ├─{mitosis},3211781
      ├─{mitosis},3211782
      ├─{mitosis},3211785
      ├─{mitosis},3211787
      ├─{mitosis},3211789
      ├─{mitosis},3211790
      ├─{mitosis},3211793
      ├─{mitosis},3211795
      ├─{mitosis},3211797
      ├─{mitosis},3211799
      ├─{mitosis},3211800
      ├─{mitosis},3211802
      ├─{mitosis},3211804
      ├─{mitosis},3211806
      ├─{mitosis},3211808
      ├─{mitosis},3211810
      ├─{mitosis},3211811
      ├─{mitosis},3211813
      ├─{mitosis},3211815
      ├─{mitosis},3211817
      ├─{mitosis},3211819
      ├─{mitosis},3211822
      ├─{mitosis},3211825
      ├─{mitosis},3211827
      ├─{mitosis},3211830
      ├─{mitosis},3211833
      ├─{mitosis},3211837
      ├─{mitosis},3211840
      ├─{mitosis},3211842
      ├─{mitosis},3211845
      ├─{mitosis},3211851
      ├─{mitosis},3211852
      ├─{mitosis},3211854
      ├─{mitosis},3211860
      ├─{mitosis},3211861
      ├─{mitosis},3211863
      ├─{mitosis},3211865
      ├─{mitosis},3211870
      ├─{mitosis},3211872
      ├─{mitosis},3211875
      ├─{mitosis},3211880
      ├─{mitosis},3211881
      ├─{mitosis},3211883
      ├─{mitosis},3211886
      ├─{mitosis},3211889
      ├─{mitosis},3211892
      ├─{mitosis},3211894
      ├─{mitosis},3211897
      ├─{mitosis},3211898
      ├─{mitosis},3211902
      ├─{mitosis},3211905
      ├─{mitosis},3211907
      ├─{mitosis},3211909
      ├─{mitosis},3211913
      ├─{mitosis},3211915
      ├─{mitosis},3211916
      ├─{mitosis},3211918
      ├─{mitosis},3211919
      ├─{mitosis},3211960
      ├─{mitosis},3211965
      ├─{mitosis},3212101
      ├─{mitosis},3212102
      ├─{mitosis},3212103
      ├─{mitosis},3212104
      ├─{mitosis},3212105
      ├─{mitosis},3212106
      ├─{mitosis},3212107
      ├─{mitosis},3212108
      ├─{mitosis},3212109
      ├─{mitosis},3212110
      ├─{mitosis},3212111
      ├─{mitosis},3212112
      ├─{mitosis},3212113
      ├─{mitosis},3212114
      ├─{mitosis},3212115
      ├─{mitosis},3212116
      ├─{mitosis},3212117
      ├─{mitosis},3212118
      ├─{mitosis},3212119
      ├─{mitosis},3212120
      ├─{mitosis},3212121
      ├─{mitosis},3212122
      ├─{mitosis},3212123
      ├─{mitosis},3212124
      ├─{mitosis},3212125
      ├─{mitosis},3212126
      ├─{mitosis},3212127
      ├─{mitosis},3212128
      ├─{mitosis},3212129
      ├─{mitosis},3212130
      ├─{mitosis},3212131
      ├─{mitosis},3212132
      ├─{mitosis},3212133
      ├─{mitosis},3212134
      ├─{mitosis},3212135
      ├─{mitosis},3212136
      ├─{mitosis},3212137
      ├─{mitosis},3212138
      ├─{mitosis},3212139
      ├─{mitosis},3212140
      ├─{mitosis},3212141
      ├─{mitosis},3212142
      ├─{mitosis},3212143
      ├─{mitosis},3212144
      ├─{mitosis},3212145
      ├─{mitosis},3212146
      ├─{mitosis},3212147
      ├─{mitosis},3212148
      ├─{mitosis},3212149
      ├─{mitosis},3212150
      ├─{mitosis},3212151
      ├─{mitosis},3212152
      ├─{mitosis},3212153
      ├─{mitosis},3212154
      ├─{mitosis},3212155
      ├─{mitosis},3212156
      ├─{mitosis},3212157
      ├─{mitosis},3212158
      ├─{mitosis},3212159
      ├─{mitosis},3212160
      ├─{mitosis},3212161
      ├─{mitosis},3212162
      ├─{mitosis},3212163
      └─{mitosis},3212443
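
To tally threads by name without reading the whole tree, here is a Linux-only sketch that reads the thread names from /proc. Each thread has a `/proc/<pid>/task/<tid>/comm` file holding its name. `thread_names` is a hypothetical helper. Threads that never set a name most likely inherit the name of the thread that created them, which would explain why most entries above appear as `{python3}` or `{mitosis}`. So this separates the named ZMQ threads from the rest but cannot tell the anonymous ones apart.

```python
import os
import sys
from collections import Counter


def thread_names(pid: int) -> Counter:
    """Count the threads of process `pid` by name (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    names: Counter = Counter()
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "comm")) as f:
                names[f.read().strip()] += 1
        except OSError:
            # The thread exited between listdir() and open(); skip it.
            continue
    return names


if __name__ == "__main__":
    # Pass a PID (e.g. the kernel's) on the command line; default is this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for name, count in thread_names(pid).most_common():
        print(f"{count:5d}  {name}")
```

Saved as a script (the filename is arbitrary) and run with the kernel's PID from the tree above (3212437), it prints one line per distinct thread name with its count.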

For what it's worth, the code in the notebook measures time with time.process_time(). When a single process is running, individual steps take O(1e3) seconds. When two processes are running, each step takes O(1e4) seconds. I suppose I could add logging for threads and thread time.
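
`time.process_time()` returns CPU time summed over all threads of the process, not elapsed time. If a multithreaded library is busy-waiting or contending for cores, a step's process time can therefore grow much faster than its wall-clock time. Below is a minimal sketch of the logging suggested above. For each step it records wall time, process CPU time, and the calling thread's own CPU time (`time.thread_time()`). `timed_step` and the dictionary keys are hypothetical names.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator

log = logging.getLogger(__name__)


@contextmanager
def timed_step(label: str) -> Iterator[Dict[str, float]]:
    """Time a block; the yielded dict is filled in when the block exits."""
    stats: Dict[str, float] = {}
    wall0 = time.perf_counter()    # elapsed (wall-clock) time
    proc0 = time.process_time()    # CPU time of all threads in this process
    thread0 = time.thread_time()   # CPU time of the calling thread only
    try:
        yield stats
    finally:
        stats["wall_s"] = time.perf_counter() - wall0
        stats["process_cpu_s"] = time.process_time() - proc0
        stats["caller_cpu_s"] = time.thread_time() - thread0
        log.info(
            "%s: wall %.1f s, process CPU %.1f s, calling-thread CPU %.1f s",
            label, stats["wall_s"], stats["process_cpu_s"], stats["caller_cpu_s"],
        )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with timed_step("sum of squares") as stats:
        sum(i * i for i in range(1_000_000))
```

If a step's process CPU is much larger than both its wall time and the calling thread's CPU, the difference is being spent in other threads of the kernel rather than in the notebook's own Python code.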

Jacob-Stevens-Haas • Mar 27 '25 23:03

Are you able to share the notebook or a minimal example?

dgrahn • Jul 17 '25 15:07