prometheus-api-client-python

Ways to import `PrometheusConnect` without also importing **huge** pandas and matplotlib

Open buttonfly1000 opened this issue 3 years ago • 11 comments

Is your feature request related to a problem? Please describe.

I found this simple import

from prometheus_api_client import PrometheusConnect

not only imports PrometheusConnect itself, but also pandas and possibly matplotlib, which take about 50MB of additional, unnecessary memory when I don't want to use DataFrames or plot them.

Is there any way to only import PrometheusConnect without also importing huge pandas and matplotlib?
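One way to check which heavy packages an import actually pulls in is to run it in a fresh interpreter and inspect `sys.modules` afterwards. The following is only a minimal sketch: the helper name `modules_loaded_by` and the default `watched` list are made up for illustration and are not part of this library.

```python
import json
import subprocess
import sys


def modules_loaded_by(import_statement, watched=("pandas", "matplotlib", "numpy")):
    """Run `import_statement` in a fresh interpreter and return the subset of
    `watched` top-level packages that ended up in sys.modules.

    Hypothetical helper for illustration; not part of prometheus_api_client.
    """
    probe = (
        "import sys, json\n"
        f"{import_statement}\n"
        f"watched = {list(watched)!r}\n"
        "print(json.dumps([name for name in watched if name in sys.modules]))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the import itself fails
    )
    # The import may print to stdout, so only parse the last line.
    return json.loads(result.stdout.strip().splitlines()[-1])


# Example call (requires the library to be installed):
# modules_loaded_by("from prometheus_api_client import PrometheusConnect")
```

A fresh subprocess is used so that modules already imported by the calling process do not distort the result.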

buttonfly1000 avatar Nov 04 '21 01:11 buttonfly1000

not only import PrometheusConnect itself, but also pandas and possibly matplotlib, which take about 50MB more unnecessary memory when I don't want to use DataFrames and plot them.

Hi @thetaprimeprime, that's a great observation! I did some memory profiling and can confirm that the additional pandas and matplotlib imports do indeed increase memory usage by about 45MB.

Is there any way to only import PrometheusConnect without also importing huge pandas and matplotlib?

At the moment, I don't think so. But I believe this would be a nice and welcome improvement :smiley: Is this something you'd like to work on, or would you rather someone from our team do it?

One way to accomplish this could be to refactor this python module into submodules, something like this:

prometheus_api_client
├── core
│   ├── __init__.py
│   └── prometheus_connect.py
├── exceptions
│   ├── base_exception.py
│   └── __init__.py
├── __init__.py --> only import core.* here, to avoid importing mpl, pandas
├── parsers
│   ├── __init__.py
│   ├── metric.py
│   ├── metric_range_df.py
│   ├── metrics_list.py
│   └── metric_snapshot_df.py
└── utils
    ├── datetime_utils.py
    ├── __init__.py
    └── print_utils.py

Just a suggestion off the top of my head, we should explore other ideas as well.
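As one of the "other ideas", a package can keep its public names but resolve the heavy ones lazily through a module-level `__getattr__` (PEP 562). This is a different technique from the submodule split suggested above, and the sketch below is not the project's code: the package `lazy_demo` and the classes `Connect` and `MetricFrame` are hypothetical stand-ins, written to a temporary directory only so the example can run.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# __init__.py of a hypothetical package: the cheap class is imported eagerly,
# the expensive one only when somebody actually asks for it.
PKG_INIT = textwrap.dedent('''
    import importlib

    from .core import Connect  # cheap, always imported

    # Lazily resolved names, mapped to the submodule that defines them.
    _LAZY = {"MetricFrame": ".parsers"}

    def __getattr__(name):
        # PEP 562: called only when `name` is not found in the module namespace.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name], __name__)
            value = getattr(module, name)
            globals()[name] = value  # cache so __getattr__ is not hit again
            return value
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')

# Build the throwaway package on disk.
root = Path(tempfile.mkdtemp())
pkg = root / "lazy_demo"
pkg.mkdir()
(pkg / "__init__.py").write_text(PKG_INIT)
(pkg / "core.py").write_text("class Connect:\n    pass\n")
# parsers.py stands in for the modules that would import pandas/matplotlib.
(pkg / "parsers.py").write_text("class MetricFrame:\n    pass\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

lazy_demo = importlib.import_module("lazy_demo")
loaded_before = "lazy_demo.parsers" in sys.modules  # heavy submodule not yet imported
frame_cls = lazy_demo.MetricFrame                   # first access triggers the import
loaded_after = "lazy_demo.parsers" in sys.modules
```

With this approach, `from lazy_demo import MetricFrame` would keep working for existing callers, while code that only uses `Connect` never pays for the heavy submodule.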

/cc @4n4nd

chauhankaranraj avatar Dec 23 '21 16:12 chauhankaranraj

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

sesheta avatar Mar 23 '22 19:03 sesheta

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta avatar Apr 22 '22 20:04 sesheta

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta avatar May 22 '22 23:05 sesheta

@sesheta: Closing this issue.

sesheta avatar May 22 '22 23:05 sesheta

This is also making it hard to use prometheus-api-client-python in alpine-based docker images, as both matplotlib & pandas need to build wheels.

raqbit avatar Mar 28 '23 10:03 raqbit

@Raqbit do you have any suggestions on how we could improve this?

4n4nd avatar Mar 28 '23 17:03 4n4nd

@Raqbit do you have any suggestions on how we could improve this?

Any way that the pandas & matplotlib dependencies can be avoided with a 'non-development' install would help with this. I'm not very familiar with how this is usually done, but I think setuptools optional dependencies will do the job.

raqbit avatar Mar 28 '23 17:03 raqbit

This is also making it hard to use prometheus-api-client-python in alpine-based docker images, as both matplotlib & pandas need to build wheels.

Hi @Raqbit, could you please describe your issue in a bit more detail, i.e. why is it hard to use in Alpine-based container images? Are you unable to install these dependencies in the container? Or do they take a longer time to install? Or do they bloat the image?

Basically, I think OP’s issue here is with slowness of the imports at runtime (which imo can be easily solved by a restructure). Whereas your issue sounds a bit more about installation? (which can't be solved by restructure, and is a bit more involved). So just wanted to get some clarity and prioritize accordingly :)

chauhankaranraj avatar Mar 31 '23 04:03 chauhankaranraj

Yes. Both pandas and matplotlib (plus numpy, which is pulled in by pandas) have native components written in C. Normally this means that the Python wheel (a pre-compiled binary distribution) is downloaded and all is well (except for the longer import times mentioned by OP). For Alpine Linux, however, there is no such wheel available, as Alpine uses the musl C standard library implementation instead of the more common glibc. This, in turn, causes pip to try to compile the C code for these native components during installation, as it is unable to find suitable wheels for the package on PyPI.

This requires a complete C compiler toolchain to be available and, in my case, took 20+ minutes to complete (minutes of compiling C code on all available cores, making the computer unusable for other tasks).

The bloat of the compiler toolchain can be avoided by uninstalling it after running pip install, but the compilation step cannot be avoided without switching to a different container base image such as Debian.

raqbit avatar Mar 31 '23 10:03 raqbit

Got it, thanks for the details! In theory, yes, we could set up installation so that `pip install prometheus-api-client-python` installs just the "core" library components (`PrometheusConnect`). pandas and matplotlib could then be listed as dependencies in `extras_require`, to be installed with something like `pip install prometheus-api-client-python[full]`, which would also install the other components (`Metric`, `MetricSnapshotDataFrame`, etc.). Note that numpy is still required by `PrometheusConnect`, so moving it to `extras_require` would require additional changes.
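For reference, a rough sketch of what such a `setup.py` could look like with setuptools' `extras_require`. The package name and the `full` extra are taken from the comment above; the dependency lists are illustrative assumptions (in particular, `requests` as a core dependency is a guess) and are not the project's real requirements.

```python
# setup.py (sketch). Dependency lists are illustrative assumptions only.

# Installed by a plain `pip install`; numpy stays here because, as noted
# above, PrometheusConnect still needs it.
CORE_REQUIRES = ["requests", "numpy"]

# Installed only with `pip install prometheus-api-client-python[full]`.
FULL_EXTRAS = ["pandas", "matplotlib"]

SETUP_KWARGS = dict(
    name="prometheus-api-client-python",
    install_requires=CORE_REQUIRES,
    extras_require={"full": FULL_EXTRAS},
)


def main():
    # Imported here so the metadata above can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)


# In a real setup.py, main() would be called at module level.
```

Any submodule that needs pandas or matplotlib would then have to avoid importing them at package import time, otherwise a core-only install would fail on `import`.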

However, wouldn't doing so potentially break installations for existing users? I'm in favor of this change if it causes minimal disruption for users while maximizing the benefit, or if we can map out a rollout procedure that ensures this. But I currently don't have any supporting artifacts to form an opinion on this. Maybe @4n4nd can weigh in here as well?
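One way to soften the break for existing users, if the heavy dependencies became optional, would be to fail with an actionable message when one of them is missing. The helper below is a hypothetical sketch: `require_optional` does not exist in this library, and the `full` extra name is the one floated above.

```python
import importlib


def require_optional(module_name, extra="full", package="prometheus-api-client-python"):
    """Import an optional dependency, or raise ImportError with an install hint.

    Hypothetical helper for illustration; not part of prometheus_api_client.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from exc


# Example: a DataFrame-based class could call this in its constructor
# instead of importing pandas at module import time.
# pd = require_optional("pandas")
```

Users who only need `PrometheusConnect` would never hit this path, and users of the DataFrame classes would get a one-line fix instead of a bare `ModuleNotFoundError`.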

chauhankaranraj avatar Apr 06 '23 03:04 chauhankaranraj