Logging callbacks have too much overhead on libmambapy
Troubleshooting docs
- [X] My problem is not solved in the Troubleshooting docs
Anaconda default channels
- [X] I do NOT use the Anaconda default channels (pkgs/* etc.)
How did you install Mamba?
Mambaforge or latest Miniforge
Search tried in issue tracker
logging debug
Latest version of Mamba
- [X] My problem is not solved with the latest version
Tried in Conda?
Not applicable
Describe your issue
2.x API includes logging callbacks so the information reported by libsolv or libmamba can be routed through a Python callable via libmambapy.solver.libsolv.Database.set_logger(). However, this becomes very slow when the libsolv throughput is too large (e.g. verbosity=DEBUG). It causes a slowdown so large that the solver takes minutes to finish.
Are there any ways to have a less resource consuming way? I don't need fancy formatting or full control, just the ability to decide when to send different levels to stdout based on the verbosity chosen by the user.
mamba info / micromamba info
This is with libmambapy 2.0.0rc3
Logs
N/A
environment.yml
N/A
~/.condarc
N/A
Thank you for the report.
We have observed a tangential issue for some environments whose resolution via libsolv was hanging due to the logging system. Both issue have the same root cause: all of the possible logs of libsolv are emitted by default as we set log_level to std::numeric_limits<int>::max().
See: https://github.com/openSUSE/libsolv/blob/27aa6a72c7db73d78aa711ae412231768e77c9e0/src/pool.c#L1623-L1637
https://github.com/mamba-org/mamba/blob/08d7b4e9cb9b168d05f3bdc57c0708ee80d5a1c9/libmamba/src/solver/libsolv/database.cpp#L90-L108
https://github.com/mamba-org/mamba/blob/08d7b4e9cb9b168d05f3bdc57c0708ee80d5a1c9/libmamba/src/solver/libsolv/database.cpp#L110-L127
https://github.com/mamba-org/mamba/blob/154012f256d32bee1ae64f5b11d694cb45d75cef/libmamba/src/core/package_database_loader.cpp#L31-L53
An option can be for (release) builds not to emit debug logs unless if explicitly asked by users (with verbosity=DEBUG as you propose).
#3416 addresses the performance regression. We think that the handling of log levels and of verbosity better be implemented in another PR.
@jaimergp: can you report if you have observed improvements with 2.0.0rc4?
@jaimergp: can you report if you have observed improvements with 2.0.0rc4?
In the sense that there are no performance overheads, yes. But I can't seem to be able to invoke obtain the libsolv logs anymore.
Isn't this sufficient?
libmamba_context = libmambapy.Context(
libmambapy.ContextOptions(
enable_signal_handling=False,
enable_logging=True,
)
)
# Output params
libmamba_context.output_params.json = context.json
if libmamba_context.output_params.json:
libmambapy.cancel_json_output(libmamba_context)
libmamba_context.output_params.quiet = context.quiet
libmamba_context.set_log_level(
{
4: libmambapy.LogLevel.TRACE,
3: libmambapy.LogLevel.DEBUG,
2: libmambapy.LogLevel.INFO,
1: libmambapy.LogLevel.WARNING,
0: libmambapy.LogLevel.ERROR,
}[context.verbosity]
)
This ^ correctly gives the libmamba logs, but not for libsolv. I thought I needed to call set_logger() on the Database instances:
params = ChannelResolveParams(
platforms=set(self.subdirs),
channel_alias=CondaURL.parse(str(context.channel_alias)),
custom_channels=ChannelResolveParams.ChannelMap(custom_channels),
custom_multichannels=ChannelResolveParams.MultiChannelMap(custom_multichannels),
home_dir=str(Path.home()),
current_working_dir=os.getcwd(),
)
db = Database(params)
db.set_logger(logger_callback)
with logger_callback():
def logger_callback(level: libmambapy.solver.libsolv.LogLevel, msg: str, logger=_db_log):
# from libmambapy.solver.libsolv import LogLevel
# levels = {
# LogLevel.Debug: logging.DEBUG, # 0 -> 10
# LogLevel.Warning: logging.WARNING, # 1 -> 30
# LogLevel.Error: logging.ERROR, # 2 -> 40
# LogLevel.Fatal: logging.FATAL, # 3 -> 50
# }
if level.value == 0:
# This incurs a large performance hit!
logger.debug(msg)
else:
logger.log((level.value + 2) * 10, msg)
But no luck?
Yes, we entirely disabled libsolv's logs for now. We need to think about how to properly handle them as well as alternative solvers'.
Feel free to open a PR to propose a solution.
Would it be reasonable to let libsolv print to stderr directly in debug level?
Probably.
Edit: I do not have bandwidth to properly think about it now.
Not being able to get solver logs from python is quite inconvenient. I had a case this morning where it would have saved me time to have at least some kind of logs.
In the short-term, we could can consider changing those lines:
https://github.com/mamba-org/mamba/blob/c9c45806e214841b391f7349e7caf182c5fac078/libmamba/src/solver/libsolv/database.cpp#L112-L123
to add such an hard-coded logic for now:
const int level = 3;
::pool_setdebuglevel(pool().raw(), level);
Alternatively discussions in #2288 need to be resumed.
Now that #4016 is merged, there is a way to provide a completely custom implementation of the logging system to libmamba to use, which in theory should help with this issue.
However, we didnt yet provide a way for python code to provide their own log-handler, it's only possible in C++ for now. This is planned but not started yet.