[spike] Improve Kedro CLI startup time
Description
Currently, Kedro CLI startup time can be slow even when very little work is done. For example, running `kedro` to show the available commands or help can take a few seconds. These commands should feel instant, since they just print some help text without doing any heavy work.
Useful tools to profile import time:
- https://github.com/asottile/importtime-waterfall
- `python -X importtime run.py`

The `kedro` command itself takes 0.7s, which isn't fast but is still an acceptable response time. Initial investigation suggests that startup can take a few seconds, mainly due to a slow import during plugin registration.
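As a quick way to see where import time goes, the `-X importtime` report (written to stderr) can be parsed and sorted from Python. A minimal sketch, using `decimal` as a light stand-in for `kedro` so it runs anywhere:

```python
import subprocess
import sys

# Run a child interpreter with -X importtime; the report goes to stderr.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import decimal"],
    capture_output=True, text=True,
)

rows = []
for line in result.stderr.splitlines():
    if not line.startswith("import time:"):
        continue
    parts = line[len("import time:"):].split("|")
    try:
        # columns: self time [us] | cumulative [us] | module name
        rows.append((int(parts[1]), parts[2].strip()))
    except (IndexError, ValueError):
        continue  # skip the header line

# Print the five most expensive imports by cumulative time
for cum_us, name in sorted(rows, reverse=True)[:5]:
    print(f"{cum_us:>8} us  {name}")
```

Replacing `import decimal` with `import kedro` (or running the Kedro entry point itself under `-X importtime`) gives the real picture; importtime-waterfall produces the same data as a visual waterfall.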
Kedro startup time with plugin installed
(0.7s, removed screenshot)
Kedro startup time with no plugin installed (telemetry + viz)
Causes:
1. Slow import from plugins
2. Slow import from `kedro` itself
3. Slow import from `pkg_resources` (something that is run with `console_scripts`)
   - https://stackoverflow.com/questions/53223214/slow-response-of-cli-written-with-python-click
   - https://github.com/pypa/setuptools/issues/510
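One standard mitigation for cause (2) is PEP 562's module-level `__getattr__`, which defers importing heavy submodules until first attribute access. A self-contained sketch (the `lazypkg`/`heavy` names are made up for illustration):

```python
import sys
import tempfile
from pathlib import Path

# A toy package whose __init__ defers importing its heavy submodule
# until the attribute is first accessed (PEP 562, Python 3.7+).
PKG_INIT = '''
import importlib

_LAZY_SUBMODULES = {"heavy"}

def __getattr__(name):  # called only when a normal lookup fails
    if name in _LAZY_SUBMODULES:
        return importlib.import_module(f".{name}", __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

HEAVY = "VALUE = 42  # imagine many slow imports here\n"

tmp = Path(tempfile.mkdtemp())
(tmp / "lazypkg").mkdir()
(tmp / "lazypkg" / "__init__.py").write_text(PKG_INIT)
(tmp / "lazypkg" / "heavy.py").write_text(HEAVY)

sys.path.insert(0, str(tmp))
import lazypkg  # cheap: lazypkg.heavy is NOT imported yet

assert "lazypkg.heavy" not in sys.modules
print(lazypkg.heavy.VALUE)  # first access triggers the real import
```

After the first access, the import system caches the submodule on the parent package, so `__getattr__` is not consulted again.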
Context
Why is this change important to you? How would you use it? How can it benefit other users?
The Kedro CLI can sometimes be slow, leading to a bad user experience, even though it should be the most common way users interact with Kedro.
Possible Implementation
(Optional) Suggest an idea for implementing the addition or change. The first improvement would be lazy loading in plugin registration, to address (1). (2) and (3) will require more time to study.
Possible Alternatives
(Optional) Describe any alternative solutions or features you've considered. Precompile parts of the Kedro CLI so it prints the help message without actually loading any Python code. For example, `kedro` would load some pre-compiled text files and print them; this may be tricky with plugins, though.
Some older issues about this topic: https://github.com/quantumblacklabs/private-kedro/issues/930 & https://github.com/quantumblacklabs/private-kedro/pull/537
I created a PR that should solve this problem in kedro-org/kedro-viz#1196 - any help with making the tests pass would be appreciated.
(From #3033)
Still an issue. For example, `kedro info` on a spaceflights project takes a bit more than 6 seconds on my machine (M1 MacBook Pro from 2022):
```
$ time kedro info
 _            _
| | _____  __| |_ __ ___
| |/ / _ \/ _` | '__/ _ \
| <  __/ (_| | | | (_) |
|_|\_\___|\__,_|_|  \___/
v0.18.13

Kedro is a Python framework for
creating reproducible, maintainable
and modular data science code.

Installed plugins:
kedro_telemetry: 0.2.5 (entry points:cli_hooks,hooks)
kedro_viz: 6.5.0 (entry points:global,hooks,line_magic)

________________________________________________________
Executed in    6.27 secs    fish           external
   usr time    1.15 secs    0.26 millis    1.15 secs
   sys time    0.67 secs    2.65 millis    0.67 secs
```
After warmup (i.e. executing the command again immediately after) this can go down to ~1 second.
Ahead of this I did some profiling using https://github.com/joerick/pyinstrument and https://www.speedscope.app/, and this is what I found:
For `kedro info` with instrumentation on and no warmup, 7.5 seconds were spent importing the various `kedro.framework.cli.*` subpackages, and 1.7 seconds actually executing the function.
Some users are puzzled that `kedro` gets stuck, and sometimes it's difficult to tell whether it's just slow: https://linen-slack.kedro.org/t/16630828/i-m-encountering-issues-when-running-the-load-ext-kedro-ipyt#3758284a-34fb-435c-b054-dd3713cd78c9
Today with @rashidakanchwala, `kedro viz run` took around 30 seconds on my M1 Mac.
I don't know what happened lately but Kedro has got unacceptably slow. I'm raising the priority of this.
```
❯ time kedro registry list
- __default__
- data_processing
- data_science
- reporting

________________________________________________________
Executed in   24.37 secs    fish            external
   usr time    5.08 secs  245.00 micros    5.08 secs
   sys time    4.95 secs  593.00 micros    4.95 secs
```

```
❯ time kedro registry list
- __default__
- data_processing
- data_science
- reporting

________________________________________________________
Executed in    3.72 secs    fish           external
   usr time    2.82 secs    0.24 millis    2.82 secs
   sys time    4.56 secs    1.33 millis    4.56 secs
```
On my computer, the first time it takes 24 seconds to run `kedro registry list`; the second time, it's 3 seconds.
As a first step, we may need a better understanding of how slow Kedro CLI startup time is, and why. Some questions I have:
- Are there any commands that are slower? (Looks like `kedro info` is faster than `kedro registry list`)
- What's the effect of having plugins? Specifically, what's the effect of having/not having `kedro-viz`, `kedro-datasets`?
- All other things equal, are there significant performance differences between operating systems?
  - The "all other things equal" might be difficult to test due to differences in hardware
  - Still, it's important to understand, because different systems will do different smart things for caching (see the 24 vs 3 second difference)
Then as a next step, we'd need an assessment of what we can do about this. Ideally it shouldn't take this long to run seemingly simple commands, even with a cold cache. Can we aim for a combination of slashing the time roughly in half and showing something during those initial seconds so the user knows it didn't get stuck?
Profile of the command `kedro registry list` using `pyinstrument`, visualised with speedscope.app.
Note: this is with `kedro-viz` and `kedro-telemetry` also installed in the environment, and having consented to sending data to `kedro-telemetry`.
JSON: https://github.com/ankatiyar/kedro-profiles/blob/main/registry.json
Command: `pyinstrument -r speedscope -o registry.json -m kedro registry list`
From profiling various commands, there are three main things (taking up the majority of the time) that happen before the code for the specific command is executed:

1. Initialisation of the `kedro/framework/cli` module (`kedro/framework/cli/__init__.py`), ~500ms. Part of this is spent importing the `catalog`/`micropkg` etc. subcommands, plus the libraries used in those commands.
2. Initialisation of `KedroCLI` (`kedro/framework/cli/cli.py: KedroCLI.__init__`), ~1.7s. This part loads up all the plugin commands, including `kedro-viz`, which is quite heavy and itself imports a lot of libraries.
3. Execution of the `before_command_run` hook, which sends data to `kedro-telemetry`, ~0.9-1s.

That is almost 3s* before the actual code of the command is executed. (* on my M1 MacBook)
Proposal
On perusing the `click` documentation, I came across this example of lazily loading subcommands: https://click.palletsprojects.com/en/8.1.x/complex/#lazily-loading-subcommands
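For reference, the pattern from the click docs looks roughly like this. The `fake_viz_cli` module is fabricated here so the sketch is self-contained; in the real PRs the mapping would point at the actual plugin modules:

```python
import importlib
import sys
import types

import click

class LazyGroup(click.Group):
    """A click Group that imports a subcommand's module only when the
    subcommand is actually looked up (pattern from the click docs)."""

    def __init__(self, *args, lazy_subcommands=None, **kwargs):
        super().__init__(*args, **kwargs)
        # map: command name -> "module.path.attribute" of a click.Command
        self.lazy_subcommands = dict(lazy_subcommands or {})

    def list_commands(self, ctx):
        # Names can be listed without importing anything heavy
        return super().list_commands(ctx) + sorted(self.lazy_subcommands)

    def get_command(self, ctx, cmd_name):
        if cmd_name in self.lazy_subcommands:
            modname, attr = self.lazy_subcommands[cmd_name].rsplit(".", 1)
            return getattr(importlib.import_module(modname), attr)
        return super().get_command(ctx, cmd_name)

# --- demo: fabricate a tiny "plugin" module so this runs standalone ---
plugin = types.ModuleType("fake_viz_cli")

@click.command(name="viz")
def viz():
    click.echo("viz running")

plugin.viz = viz
sys.modules["fake_viz_cli"] = plugin

cli = LazyGroup(name="kedro", lazy_subcommands={"viz": "fake_viz_cli.viz"})
print(cli.list_commands(None))  # "viz" is listed without a heavy import
```

The import of the subcommand's module is paid only when `get_command` is called for that name, i.e. when the user actually invokes it.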
1. Lazy loading of subcommands in `kedro-viz`
Quick prototype: https://github.com/kedro-org/kedro-viz/pull/1920
Refactor the `kedro-viz` CLI with delayed imports and lazily loaded commands. This shaves off time when `KedroCLI` is initialised:
https://github.com/kedro-org/kedro/blob/011e3a8f07310a915e6cc90bbe22c8606866bffd/kedro/framework/cli/cli.py#L105-L107
Profile:
2. Lazy loading of subcommands of Kedro
Quick prototype: https://github.com/kedro-org/kedro/pull/3883
This does not seem to save much time; on my system, sometimes it does and sometimes it doesn't. 🤔 For example, `kedro info` is sometimes faster.
Profile: `LazyGroup` is used for both the Kedro CLI and the Kedro-Viz CLI
3. Use `after_command_run` in `kedro-telemetry`
We spend ~1s sending telemetry info before we actually execute the command. This can be moved to the `after_command_run` hook, which will not change the total time of the command but will make its results show up sooner. Additionally, the `after_command_run` hook also receives the exit code of the command, which might be useful telemetry info?
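To illustrate why this changes perceived latency (a toy simulation of the hook ordering, not kedro-telemetry's actual code): with the "after" hook, the user sees the command output before, rather than after, the slow telemetry call.

```python
events = []

def send_telemetry(exit_code=None):
    # Stand-in for kedro-telemetry's ~1s network call
    events.append(("telemetry", exit_code))

def run_cli(command, telemetry_after):
    """Simulate hook ordering around a CLI command."""
    events.clear()
    if not telemetry_after:
        send_telemetry()  # before_command_run: the user waits for this
    exit_code = 0  # the command itself runs here...
    events.append(("output", command))  # ...and the user sees its output
    if telemetry_after:
        # after_command_run: can also record the exit code of the command
        send_telemetry(exit_code)
    return list(events)

print(run_cli("registry list", telemetry_after=False))
print(run_cli("registry list", telemetry_after=True))
```

The total work is the same in both cases; only the moment the output appears changes.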
Profile of `kedro info` and `kedro registry list` with the suggestions implemented
| `kedro info` | current | lazy viz | lazy viz + kedro cli | telemetry `after_command_run` |
|---|---|---|---|---|
| `kedro/framework/cli` init | 3.37s | 1.97s | 2.15s | 2.05s |
| `KedroCLI` init | 579ms | 649ms | 511ms | 473ms |
| telemetry hook | 981ms | 867ms | 1.02s | 1s |
| time before command is executed (approx.) | 3.2s | 1.80s | 2s | 1s |
| `kedro registry list` | current | lazy viz | lazy viz + kedro cli | telemetry `after_command_run` |
|---|---|---|---|---|
| `kedro/framework/cli` init | 5s | 3.6s | 4s | 4s |
| `KedroCLI` init | 705ms | 649ms | 417ms | 486ms |
| telemetry hook | 959ms | 877ms | 1.01s | 1.02s |
| time before command is executed (approx.) | 3.25s | 1.80s | 1.8s | 900ms |
Please note that this is just from one run of the commands; the time varies a little bit between runs.
Conclusion
Of the above suggestions, I think we would benefit greatly from making the Kedro-Viz CLI either lazy or lighter. Moving to the `after_command_run` hook in `kedro-telemetry` would also make the apparent time taken by the commands much shorter.
On the Kedro side, making the subcommands lazy is not super beneficial, but we could do it to make things like `kedro info` slightly faster.
ALSO:
The JSON files, if you want to open them in speedscope.app yourself: https://github.com/ankatiyar/kedro-profiles
Miro board with the above screenshots, plus the same thing for `kedro info`: https://miro.com/app/board/uXjVKFQAxA4=/?share_link_id=239496676079
Thanks a lot @ankatiyar, this is an awesome summary.
There is one thing I think might be preventing us from getting useful results, though. Consider this comparison between how much time it took to run `kedro registry list` right after turning on my laptop, in an existing environment (therefore, in theory, all `.pyc` files were there), versus how much time it took to run right after that ("hot cache").
First thing in the morning: registry.json
Right after: registry_hot_cache.json
This is my environment. Notice that it's a different one (with `kedro-mlflow` and other stuff).
$ uv pip freeze
aiofiles==23.2.1
alembic==1.13.1
aniso8601==9.0.1
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==3.7.1
appdirs==1.4.4
appnope==0.1.4
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
babel==2.15.0
beautifulsoup4==4.12.3
binaryornot==0.4.4
bleach==6.1.0
blinker==1.8.2
build==1.2.1
cachetools==5.3.3
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
click-default-group==1.2.4
cloudpickle==3.0.0
comm==0.2.2
contourpy==1.2.1
cookiecutter==2.6.0
coverage==7.5.1
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
defusedxml==0.7.1
deprecated==1.2.14
dnspython==2.6.1
docker==7.0.0
dynaconf==3.2.5
email-validator==2.1.1
entrypoints==0.4
et-xmlfile==1.1.0
executing==2.0.1
fastapi==0.111.0
fastapi-cli==0.0.4
fastjsonschema==2.19.1
flask==3.0.3
fonttools==4.51.0
fqdn==1.5.1
fsspec==2024.5.0
gitdb==4.0.11
gitpython==3.1.43
graphene==3.3
graphql-core==3.2.3
graphql-relay==3.2.0
gunicorn==22.0.0
h11==0.14.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
idna==3.7
importlib-metadata==7.1.0
importlib-resources==6.4.0
iniconfig==2.0.0
ipykernel==6.29.4
ipython==8.24.0
isoduration==20.11.0
itsdangerous==2.2.0
jedi==0.19.1
jinja2==3.1.4
joblib==1.4.2
json5==0.9.25
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter-client==8.6.1
jupyter-core==5.7.2
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter-server==2.14.0
jupyter-server-terminals==0.5.3
jupyterlab==4.2.0
jupyterlab-pygments==0.3.0
jupyterlab-server==2.27.1
kedro==0.19.5
kedro-datasets==3.0.0
kedro-mlflow==0.12.2
kedro-telemetry==0.4.0
kedro-viz==9.0.0
kiwisolver==1.4.5
lazy-loader==0.4
mako==1.3.5
markdown==3.6
markdown-it-py==3.0.0
markupsafe==2.1.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mistune==3.0.2
mlflow==2.13.0
more-itertools==10.2.0
nbclient==0.10.0
nbconvert==7.16.4
nbformat==5.10.4
nest-asyncio==1.6.0
networkx==3.3
notebook==7.2.0
notebook-shim==0.2.4
numpy==1.26.4
omegaconf==2.3.0
openpyxl==3.1.2
opentelemetry-api==1.16.0
opentelemetry-sdk==1.16.0
opentelemetry-semantic-conventions==0.37b0
orjson==3.10.3
overrides==7.7.0
packaging==23.2
pandas==2.2.2
pandocfilters==1.5.1
parse==1.20.1
parso==0.8.4
pexpect==4.9.0
pillow==10.3.0
platformdirs==4.2.2
plotly==5.22.0
pluggy==1.3.0
pre-commit-hooks==4.6.0
prometheus-client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.2
pycparser==2.22
pydantic==2.7.1
pydantic-core==2.18.2
pygments==2.18.0
pyinstrument==4.6.2
pyparsing==3.1.2
pyproject-hooks==1.1.0
pytest==7.4.4
pytest-cov==3.0.0
pytest-mock==1.13.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-json-logger==2.0.7
python-multipart==0.0.9
python-slugify==8.0.4
pytoolconfig==1.3.1
pytz==2024.1
pyyaml==6.0.1
pyzmq==26.0.3
querystring-parser==1.2.4
referencing==0.35.1
requests==2.32.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.1
rope==1.13.0
rpds-py==0.18.1
ruamel-yaml==0.18.6
ruamel-yaml-clib==0.2.8
ruff==0.1.15
scikit-learn==1.4.2
scipy==1.13.0
seaborn==0.12.2
secure==0.3.0
send2trash==1.8.3
setuptools==70.0.0
shellingham==1.5.4
six==1.16.0
smmap==5.0.1
sniffio==1.3.1
soupsieve==2.5
speedscope==1.0
sqlalchemy==2.0.30
sqlparse==0.5.0
stack-data==0.6.3
starlette==0.37.2
strawberry-graphql==0.229.1
tenacity==8.3.0
terminado==0.18.1
text-unidecode==1.3
threadpoolctl==3.5.0
tinycss2==1.3.0
toml==0.10.2
toposort==1.10
tornado==6.4
traitlets==5.14.3
typer==0.12.3
types-python-dateutil==2.9.0.20240316
typing-extensions==4.11.0
tzdata==2024.1
ujson==5.10.0
uri-template==1.3.0
urllib3==2.2.1
uvicorn==0.29.0
uvloop==0.19.0
watchfiles==0.21.0
watchgod==0.8.2
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.8.0
websockets==12.0
werkzeug==3.0.3
wrapt==1.16.0
zipp==3.18.2
I don't know if it's something specific to my computer (2021 MacBook Pro, Apple M1 Max, 32 GB of RAM), but I anticipate that this will probably make effective performance measurements quite hard.
Some pointers, in case it helps:
- https://github.com/airspeed-velocity/asv/
- https://github.com/ionelmc/pytest-benchmark/
Having said that, if at least the times are ~proportional in the cold & hot cache situations, then this doesn't affect anything and we can proceed with confidence.
Amazing work @ankatiyar - can you also run this when you have a lot of nodes, datasets and parameters for the catalog to resolve? My hunch is that that's the real bottleneck felt by users at scale.
@datajoely Conversation about performance is spread around different issues that tackle different aspects:
- Startup (this one)
- Complex config #3893
- Complex pipelines #3866
Startup affects even simple commands, like `kedro info` and `kedro registry list`.
Excellent - I still feel these can be thought of as sub-issues of the start up parent. Runtime performance once the session and catalog is built is fine.
Looking at @astrojuanlu's profile of the command, it also seems like loading `project_groups` commands, like the ones in `kedro-mlflow`, takes up a bunch of time, so implementing lazy loading only in `kedro-viz` would solve just part of the problem (`kedro-mlflow` ~13s and `kedro-viz` ~5s).
I'm trying out whether all commands can somehow be lazily loaded for all plugins on the Kedro side. [TO BE UPDATED]
As for hot-caching, I think the times seem proportional, even on my machine; I ran the commands a few times, so the profiles shown in the above comment are after warmup. I don't know if we can do anything about the performance differences between when your OS is warmed up or not, but the lazy loading and the telemetry changes are very small efforts for a relatively faster CLI experience on our end.
Great investigation, @ankatiyar! I also share the concern about cold starts versus subsequent runs. Like @astrojuanlu, my first command takes more than 10-20 seconds, while subsequent runs are much faster. I like the idea of moving `kedro-telemetry` to the `after_command_run` hook to speed up the original code execution. But will we lose information about failed runs if we make this change?
> Looking at @astrojuanlu's profile of the command, it also seems like loading of `project_groups` commands like the ones in `kedro-mlflow`

@ankatiyar Is there anything we can do from the framework side to control this, instead of implementing it in the plugin library? Does the lazy loading have no effect (do the libraries still get imported)?
Telemetry is a quick win; echoing @DimedS, we should make sure it still sends telemetry when the pipeline fails.
@noklam @DimedS I believe `after_command_run` is still executed if the run fails, and it also receives the exit code of the command run. I'll confirm, though.
I don't know if the architecture of pluggy or Kedro allows this, but if we don't expect plugins to be able to add pipelines to the registry, maybe we should try to delay the plugin discovery process as much as possible. If one runs `kedro registry list` or `kedro info`, in principle that should already be enough information to not load the CLI extensions, right?
Update re: Lazy Loading of Plugins
TL;DR: it's possible to lazily load plugins and their commands on the Kedro side.
Prototype PR: https://github.com/kedro-org/kedro/pull/3901
Profile of regular `kedro registry list`:
Updated proposal
> [!NOTE]
> This time I have `kedro-mlflow` + `kedro-airflow` + `kedro-telemetry` + `kedro-viz` installed in the system for the profiles below. I have also consented to collection of `kedro-telemetry` data.
Lazy loading of plugins and their commands
In #3901:

- We don't load the entry points of all the plugins eagerly; we just get the entry points and pass them to `CommandCollection`. (We already use a custom version of `click`'s `CommandCollection`: https://github.com/kedro-org/kedro/blob/27f5405cefd6701ffac4c6243030486fb7d3c942/kedro/framework/cli/utils.py#L111)
- If a command is not in the existing list of commands (`CommandCollection.list_commands()`), load plugins until it is.
- Execute the command.
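The idea can be sketched as follows. `FakeEntryPoint` stands in for `importlib.metadata.EntryPoint` objects, and the command objects are plain strings, purely for illustration:

```python
class LazyPluginCommands:
    """Resolve plugin CLI commands by loading entry points one at a
    time, and only when a requested command is not already known."""

    def __init__(self, entry_points):
        self._pending = list(entry_points)  # entry points not loaded yet
        self._commands = {}                 # command name -> command object

    def get_command(self, name):
        # Load more plugins only until the requested command appears
        while name not in self._commands and self._pending:
            entry_point = self._pending.pop(0)
            self._commands.update(entry_point.load())  # the only slow step
        return self._commands.get(name)

# --- demo with fake entry points that record when they get loaded ---
loaded = []

class FakeEntryPoint:
    def __init__(self, name, commands):
        self.name, self._commands = name, commands

    def load(self):
        loaded.append(self.name)  # a real load() imports the plugin module
        return self._commands

resolver = LazyPluginCommands([
    FakeEntryPoint("kedro-viz", {"viz": "<viz command>"}),
    FakeEntryPoint("kedro-mlflow", {"mlflow": "<mlflow command>"}),
])

print(resolver.get_command("viz"))  # only kedro-viz gets loaded
print(loaded)
```

This also makes the drawback noted below visible: an unknown command name forces loading every remaining entry point before the lookup can fail.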
Profile of `kedro registry list` with lazy loading of plugins:
Send telemetry info in `after_command_run` [out of scope]
See https://github.com/kedro-org/kedro-plugins/issues/709. As mentioned in https://github.com/kedro-org/kedro/issues/1476#issuecomment-2135588917, this would not reduce the total time taken by a command, but it would reduce the apparent time taken.
Profile of `kedro registry list` with lazy loading of plugins + telemetry in `after_command_run`:
Profile data
| `kedro registry list` | regular | with lazy plugin loading | lazy + telemetry after |
|---|---|---|---|
| initialisation of CLI module | 858ms | 576ms | 553ms |
| `KedroCLI` init | 4.11s | 270ms | 287ms |
| sending telemetry | 1s | 0.8s | 1s |
| time before the main code runs | ~6s | ~1.7s | ~0.8s |
| total | 6.96s | 3.76s | 3.8s |
| `kedro info` | regular | with lazy plugin loading | lazy + telemetry after |
|---|---|---|---|
| initialisation of CLI module | 860ms | 737ms | 846ms |
| `KedroCLI` init | 3.97s | 292ms | 304ms |
| sending telemetry | 1.05s | 880ms | 858ms |
| time before the main code runs | ~5.7s | ~1.9s | ~0.8s |
| total | 5.99s | 2.03s | 2.12s |
Things to note
- It might take the same or more time to run plugin commands. There might be a better way to load only the plugin whose command we're running, rather than just iterating through the entry points until we load the plugin that has the command.
- There'll also be some delay when the command is wrong or doesn't exist. For example, `kedro infroo` will take a while before the error message is displayed, because we'll spend some time loading plugins. I don't think this is too bad, since we already load these commands eagerly right now.
- We could still consider combining this with the `LazyGroup` strategy in https://github.com/kedro-org/kedro-viz/pull/1920 and https://github.com/kedro-org/kedro/pull/3883, or just look into improving the CLI setup in `kedro-viz` in general. Kedro-Viz could be sped up a little by delaying imports, at least 🤔
I'm amazed `kedro-telemetry` is so chunky; we should absolutely defer it till after the important stuff, but I wonder if async is sensible here too.
> but I wonder if async is sensible here too.
Yeah in this case it would make sense to make it async, but... https://github.com/pytest-dev/pluggy/issues/320
> I'm amazed `kedro-telemetry` is so chunky, we should absolutely defer it till after the important stuff, but I wonder if async is sensible here too.
I was just testing the switch to the `after_command_run` hook in `kedro-telemetry`; it seems like the hook doesn't get executed when there is an error in the pipeline. (cc @noklam and @DimedS, also because they asked before)
I think async would be the way to go
I'll create a separate issue for the `kedro-telemetry` stuff, and we can focus on the plugin loading here.
`after_command_run`: https://github.com/kedro-org/kedro-plugins/blob/064c5d9bc491a3f96f4341a2c2d1f034b7b354cb/kedro-telemetry/kedro_telemetry/plugin.py#L185
We still have the project hook that sends telemetry immediately after. We should combine the two hooks, to avoid checking consent twice as well.
For now, shall we then proceed with
- https://github.com/kedro-org/kedro/pull/3901
- and https://github.com/kedro-org/kedro/pull/3883
? How does https://github.com/kedro-org/kedro-viz/pull/1920 interact or conflict with https://github.com/kedro-org/kedro/pull/3901 ?
> For now, shall we then proceed with
> ? How does kedro-org/kedro-viz#1920 interact or conflict with #3901 ?
These solutions can of course be used together:
- First step: #3901 loads plugins lazily when `KedroCLI` is initialised
- Second step: #3883 & https://github.com/kedro-org/kedro-viz/pull/1920 load the "subcommands" lazily (catalog commands, micropkg commands, etc., and `run` and `deploy` for Viz)
I think it's definitely worth refactoring the CLI code on the `kedro-viz` side, as there are a lot of heavy imports in their `cli.py`.
There's another thing to note: right now, if you do `kedro --help`, you also see all plugin commands.
After lazy plugin loading (i.e. after #3901), we lose the plugin commands from the top-level help text. But you can always do `kedro airflow --help` or `kedro mlflow --help`.
Good point about losing visibility of the full list in `kedro --help`. Wondering if there's any way to retain both.
Are there any contraindications to:
- starting a kedro process in the background,
- making the CLI communicate with it,
- handling CLI commands by forking that process?

This way, once loaded, the process with the libraries would be ready to fork, unless stopped via a CLI command.
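The suggestion could be prototyped along these lines. This is a toy sketch, assuming a POSIX system where `multiprocessing`'s "fork" start method is available, with `json` standing in for Kedro's heavy framework imports:

```python
import multiprocessing as mp

def daemon(conn):
    """Long-lived process: pay the heavy import cost once, at startup."""
    import json  # noqa: F401 -- stand-in for Kedro's framework imports
    while True:
        command = conn.recv()
        if command is None:  # shutdown request
            break
        # A real implementation would fork here and let the child run
        # the command; we answer inline for illustration.
        conn.send(f"handled: {command}")

ctx = mp.get_context("fork")  # assumption: POSIX-only
parent_conn, child_conn = ctx.Pipe()
server = ctx.Process(target=daemon, args=(child_conn,), daemon=True)
server.start()

parent_conn.send("registry list")
reply = parent_conn.recv()  # no per-command import cost
print(reply)

parent_conn.send(None)
server.join()
```

For prior art, mypy's `dmypy` daemon takes this approach: it keeps state warm between invocations, at the cost of managing the daemon's lifecycle.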
@znfgnu IPython essentially?
I think - and correct me if I'm wrong - the suggestion is about creating a long-lived kedro process; a client process then exposes the CLI and can interact with it without the start-up penalty.
Would the user need to do anything differently? If we can make this almost invisible to users, then it's great.
Are there any examples of a Python-based CLI doing this?
The current notion of Kedro is that it's a library and a framework. I don't think we should underestimate the consequences of making it a service, in terms of software architecture, reusability, perception etc. Feel free to open a new discussion https://github.com/kedro-org/kedro/discussions/