meson icon indicating copy to clipboard operation
meson copied to clipboard

WIP: Add native pkg-config implementation

Open xclaesse opened this issue 5 years ago • 20 comments

This adds a python parser for pkg-config files and use it in PkgConfigDependency when MESON_PKG_CONFIG is set in env.

Note that this also add a command line meson pkgconfig that mimick pkg-config CLI, I've been using it for testing purpose. I'm not aiming at having any kind of CLI compatibility at all, that would add way too much complexity, I'm targetting only the subset of the features PkgConfigDependency needs.

Reasons:

  • It's faster. For me gst-build configuration goes from 20s to 17s, that's because we don't have to fork+exec and re-parse all pc files for each pkg-config invocation. I profiled that we used to spend 2.96s in total inside PkgConfigDependency._call_pkgbin and we now spend 0.053s in our custom parser.
  • Will allow fixing cases of mixing system/subproject deps. For example if we built glib subproject and do dependency('gstreamer-1.0') we could detect that gstreamer's pc file requires glib and either abort or replace it by the subproject dependency.
  • Allow using pc files on platforms where pkg-config binary is not easily available, like Windows.
  • Will allow mixing multiple sysroots. pkg-config only allow setting a common PKG_CONFIG_SYSROOT_DIR path that will be prepended to all cflag/libs paths, but pc files coming from different locations needs a different SYSROOT_DIR. That's something easier to fix at meson level.

xclaesse avatar Aug 27 '20 03:08 xclaesse

Conceptually, this looks like a good addition. However, I have some minor issues with the implementation:

  • it should be possible to disable this logic with an option more easily (unless we manage to replicate pkg-config close to 100%)
  • typing. IMHO small, self-contained modules should be fully typed
  • using pathlib instead of os.path. We don't have a project policy for this (yet), as far as I know, however, I would say that pathlib should be preferred for new code since it is less clunky and makes the code clearer.

mensinda avatar Aug 27 '20 11:08 mensinda

* it should be possible to disable this logic with an option more easily (unless we manage to replicate pkg-config close to 100%)

It's surprisingly not that far from complete actually. It's only missing support for sys_root and define_variable, at least in my local version. It can be disabled by setting PKG_CONFIG=pkg-config in the env, but I think as a first step to get feedback it could be changed to keep using pkg-config by default, and use the internal implementation only if PKG_CONFIG=meson for example.

* typing. IMHO small, self-contained modules should be fully typed

Never understood why we are sacrificing code readability for this, if I wanted type safety I would write C++. Anyway, I'll do, yes.

* using pathlib instead of os.path. We don't have a project policy for this (yet), as far as I know, however, I would say that pathlib should be preferred for new code since it is less clunky and makes the code clearer.

What's the advantage for simple os.path.join()?

xclaesse avatar Aug 27 '20 13:08 xclaesse

and use the internal implementation only if PKG_CONFIG=meson for example

Then I am afraid that no one will use it and we won't get bug reports for the edge cases... However, if it is really near 100% complete and there is good test coverage, then I would say that setting PKG_CONFIG (or native files) would be enough.

Never understood why we are sacrificing code readability for this, if I wanted type safety I would write C++.

Type safety is exactly the reason we want it because we use mypy to check it. Also, rewriting meson in C++ is really hard to sell :)

What's the advantage for simple os.path.join()?

  • it's safer (especially on Windows):
    • os.path.join('C:\\usr\\bin', 'python/asd') returns 'C:\\usr\\bin/python/asd'
    • str(PureWindowsPath('C:\\usr\\bin') / 'python/asd') returns 'C:\\usr\\bin\\python\\asd' (I used PureWindowsPath because I am testing this on Linux)
  • some_path / other_path / 'bin.sh' is less verbose and more readable than os.path.join(some_path, other_path, 'bin.sh')
  • File IO is simplified with read_text and write_text
  • I know that this won't really help to convince you but it also makes typing clearer (easier to see (and check) what is a string and what is a path)

mensinda avatar Aug 27 '20 13:08 mensinda

+1 for using pathlib. We need to start using it more consistently everywhere in Meson too. The biggest advantage for me is that treating paths as strings has always been silly, and using Path objects is much better because for one it stops people from doing string manipulation that assumes platform-specific things.

nirbheek avatar Aug 31 '20 07:08 nirbheek

use pkgconf : https://git.sr.ht/~kaniini/pkgconf

vtorri avatar Sep 22 '20 04:09 vtorri

As someone who wrote an independent implementation of pkg-config from scratch (pkgconf) I can tell you that it's not as easy as it looks initially.

Meson, of course, is free to do as it wishes, but this is likely to create problems for the entire ecosystem, as Meson will likely handle edge cases differently from system pkg-config implementation. This will lead to confusion among developers and other consumers of Meson.

And before you say there isn't any possibility of edge cases, I can assure you there are many cases where pkgconf and pkg-config have behaved differently in the past. People demand consistency, even when consistency involves emulating bugs and design defects in the original. Meson will be expected to do the same in it's tool.

An alternative would be to use libpkgconf, of which a python binding could be created easily.

This gets you the benefits of not having to run a pkg-config implementation frequently while relieving you of the burden of maintaining a new pkg-config implementation which is likely to have inconsistent behavior with system pkg-config.

If this goes forward, we will have to start scrutinizing bug reports from Meson users because we have no way of knowing if they are actually using pkgconf or the built in one, and users will likely just blame us for your differences in behavior regardless.

I implore you to consider the situation carefully for the ecosystem at large.

kaniini avatar Sep 22 '20 08:09 kaniini

I should also mention that it is likely that distributions will either patch this feature out or set PKG_CONFIG environment variable explicitly to disable it, because they are most certainly going to want consistent behaviour from pkg-config.

kaniini avatar Sep 22 '20 10:09 kaniini

Note that this is an experiment and I'm not pushing it forward at the moment, it is still a long way to be completed, mostly to get unit tests to pass.

Unfortunately we cannot use pkgconf in process because meson sticky depend only on pure python. It is a design decision to not use any native code. We can however use pkgconf CLI.

I personally do not think that there are that many edge cases, because meson only needs a subset of pkg-config. The hard part is in the CLI that I do not attempt to replicate. Meson already rewrite args it gets from the CLI a lot. But I can be wrong of course, it's too early to know because I did not write specific tests yet.

xclaesse avatar Sep 22 '20 11:09 xclaesse

The edge cases are unrelated to the CLI, but how the dependency solver works, or what the maximum length of a fragment is. We have reports where there are pc files with single lines that are 64KB long. That sort of thing is what I mean when I say edge case.

Problem is pkg-config is unspecified. There are areas where pkgconf does not behave the same as freedesktop pkg-config, and there will be areas where the Meson implementation behaves differently as well. No two implementations are completely the same, unfortunately.

kaniini avatar Sep 22 '20 11:09 kaniini

Problem is pkg-config is unspecified.

This sounds like a lot of fun and should ideally be changed at some point.

mensinda avatar Sep 22 '20 11:09 mensinda

@mgorny started working on a formal specification based on pkgconf's behaviour back in 2012, but there wasn't really much interest in it then: the freedesktop implementation maintainer was not interested at all, so he abandoned the effort. that could be a starting point for a new spec though.

but then you run into problems where people did things allowed by one or another implementation that is technically bad practice -- should those bad practices be allowed in a specification?

kaniini avatar Sep 22 '20 12:09 kaniini

Tbh if I had to write a real spec I would scrap that format completely and base or on json or any format that already have solid parsers. pkg-config has so many issues I don't understand how it made it so far

xclaesse avatar Sep 22 '20 12:09 xclaesse

@mgorny started working on a formal specification based on pkgconf's behaviour back in 2012, but there wasn't really much interest in it then: the freedesktop implementation maintainer was not interested at all, so he abandoned the effort. that could be a starting point for a new spec though.

Yep, upstream pretty much said they want to able to change it arbitrarily at any point and not have to abide by any spec. I don't see how the spec would help if the 'primary' implementation can just randomly choose to violate it.

but then you run into problems where people did things allowed by one or another implementation that is technically bad practice -- should those bad practices be allowed in a specification?

Exactly the problem with writing specs for code written before. See how HTML ended up.

mgorny avatar Sep 22 '20 12:09 mgorny

  • It's faster. For me gst-build configuration goes from 20s to 17s, that's because we don't have to fork+exec and re-parse all pc files for each pkg-config invocation. I profiled that we used to spend 2.96s in total inside PkgConfigDependency._call_pkgbin and we now spend 0.053s in our custom parser.

it's not clear, did you test this with pkgconf or real pkgconfig? my understanding is that pkgconf is (usually) much faster.

  • Will allow fixing cases of mixing system/subproject deps. For example if we built glib subproject and do dependency('gstreamer-1.0') we could detect that gstreamer's pc file requires glib and either abort or replace it by the subproject dependency.

  • Will allow mixing multiple sysroots. pkg-config only allow setting a common PKG_CONFIG_SYSROOT_DIR path that will be prepended to all cflag/libs paths, but pc files coming from different locations needs a different SYSROOT_DIR. That's something easier to fix at meson level.

these parts seems to conflict with further discussion. wouldn't this only work with meson version, so projects using them would be incompatible with system pkg-config?

Hello71 avatar Sep 22 '20 12:09 Hello71

At this point the primary implementation is pkgconf outside of Ubuntu and some Debian systems, so a spec may be more relevant now. I see pkgconf as mostly complete these days outside of Windows-specific issues. And so a spec would be good.

Then meson could code against the spec for cases where pkgconf is unavailable or the user wants to use meson's version directly. Meson is also the primary consumer of pkgconf on Windows so replacing it with a native version would reduce our Windows issues (which are mainly different parties expecting different output for their own use case on Windows).

kaniini avatar Sep 22 '20 12:09 kaniini

these parts seems to conflict with further discussion. wouldn't this only work with meson version, so projects using them would be incompatible with system pkg-config?

SYSROOT is mainly for embedded use (cross compilation). pkgconf and freedesktop pkg-config have wildly different opinions on how SYSROOT should be supported. Meson doing whatever they want here doesn't seem like a huge problem as it's already something where you have to choose an implementation that behaves as you wish.

kaniini avatar Sep 22 '20 12:09 kaniini

  • It's faster. For me gst-build configuration goes from 20s to 17s, that's because we don't have to fork+exec and re-parse all pc files for each pkg-config invocation. I profiled that we used to spend 2.96s in total inside PkgConfigDependency._call_pkgbin and we now spend 0.053s in our custom parser.

it's not clear, did you test this with pkgconf or real pkgconfig? my understanding is that pkgconf is (usually) much faster.

It was using pkgconf. That would have to be profiled further of course, but I think it's hitting a hard limit where the bottleneck is spawning new process and read all needed pc files again and again. The only way for improve that is to have it in-process. I did not try on Windows but I'm expecting the difference to be much bigger there.

The important part in those numbers is that the python implementation only spend 0.053s in the parser, so there is no further improvements to be expected. There is no point in optimizing the python code or using a native libpkgconf from a pure performance point of view. It would at best save a few miliseconds.

  • Will allow fixing cases of mixing system/subproject deps. For example if we built glib subproject and do dependency('gstreamer-1.0') we could detect that gstreamer's pc file requires glib and either abort or replace it by the subproject dependency.

To be fair, this could be done by using libpkgconf. I'm sure if we go that way we can always add the needed APIs as needed. I believe this is however too complicated to have CLI for it, unless some has suggestions?

  • Will allow mixing multiple sysroots. pkg-config only allow setting a common PKG_CONFIG_SYSROOT_DIR path that will be prepended to all cflag/libs paths, but pc files coming from different locations needs a different SYSROOT_DIR. That's something easier to fix at meson level.

Support for this can be added in pkgconf CLI of course. After discussion with @kaniini there is will probably be done with new env var PKG_CONFIG_SYSROOT_MAP I suggested. I personally don't mind requiring pkgconf over pkg-config to have those kind of use-case working.

SYSROOT is mainly for embedded use (cross compilation). pkgconf and freedesktop pkg-config have wildly different opinions on how SYSROOT should be supported. Meson doing whatever they want here doesn't seem like a huge problem as it's already something where you have to choose an implementation that behaves as you wish.

When writing this PR I wanted to have PKG_CONFIG_SYSROOT_DIR set ${prefix} because that seemed to be the right thing to do, but then I realized it's not what pkg-config does so I decided to keep compat with pkg-config. Now I learned that pkgconf realized it is an issue too and they did break compat on that by setting ${prefix}, so I'll definitely update to do that too. At this point pkgconf is the reference for compat IMHO.

xclaesse@xclaesse-x395:~$ PKG_CONFIG_SYSROOT_DIR=/foo pkg-config --variable=prefix gstreamer-1.0
/usr
xclaesse@xclaesse-x395:~$ PKG_CONFIG_SYSROOT_DIR=/foo pkgconf --variable=prefix gstreamer-1.0
/foo/usr

xclaesse avatar Sep 22 '20 18:09 xclaesse

As mentioned on IRC...

Unfortunately we cannot use pkgconf in process because meson sticky depend only on pure python. It is a design decision to not use any native code. We can however use pkgconf CLI.

I do not see the problem with meson containing a ctypes interface to libpkgconf in mesonbuild/pkgconf_c.py or whatever, which if unavailable because libpkgconf.so cannot be found, falls back on subprocesses.

Alternatively, it might be nice for pkgconf to ship with its own bindings, either via ctypes or the C-API. It would then be possible to pip install pkgconf and get a prebuilt wheel containing a statically linked libpkgconf in a C-API binding for every popular version of python on each OS (assuming the PyPI pkgconf maintainer uploads a comprehensive matrix of compiled wheels).

Users could then pip install meson[fast] to get a fast version of meson by using extras_require, or just install meson on its own and have it be a bit slower via subprocesses.

eli-schwartz avatar Sep 23 '20 01:09 eli-schwartz

  • Will allow mixing multiple sysroots. pkg-config only allow setting a common PKG_CONFIG_SYSROOT_DIR path that will be prepended to all cflag/libs paths, but pc files coming from different locations needs a different SYSROOT_DIR. That's something easier to fix at meson level.

Support for this can be added in pkgconf CLI of course. After discussion with @kaniini there is will probably be done with new env var PKG_CONFIG_SYSROOT_MAP I suggested. I personally don't mind requiring pkgconf over pkg-config to have those kind of use-case working.

No idea how PKG_CONFIG_SYSROOT_MAP would work, but for the record, I patched pkg-config to support for multiple sysroots: https://github.com/LibreELEC/LibreELEC.tv/commit/a6991f0abaecf97dd8c3ddc70c7fc153524fb643#diff-77e47c2f9a12e23c5253dccc6374d54cd8159c4c9cb5d5ade9580ef91594cfff

dhewg avatar Jan 30 '21 18:01 dhewg

It seems this implementation suffers the same bug as pkgconf 1.9 - 2.0 described here: https://github.com/pkgconf/pkgconf/issues/322 , where abseil lib order is incorrect.

I think the fix is in https://github.com/pkgconf/pkgconf/commit/45073b7460b88eb7bec44a6b53d77eea939e97bd or https://github.com/pkgconf/pkgconf/commit/be1ea7888bc7e882a6c1e17573be35a08b5637e9, but this is unclear to me.

bruchar1 avatar Mar 22 '24 18:03 bruchar1