rules_python Allow glob expression in experimental_requirement

Currently pip_parse supports a feature to "fix" cycles among third-party packages, for example:

pip_parse(
    ...
    experimental_requirement_cycles = {
        "airflow": [
            "apache-airflow",
            "apache-airflow-providers-common-sql",
            "apache-airflow-providers-sqlite",
            "apache-airflow-providers-ftp",
            "apache-airflow-providers-http",
            "apache-airflow-providers-imap",
        ],
    },
)

However it's difficult to keep this list updated, as it needs to include both direct and transitive dependencies. For example apache-airflow-providers-common-io appeared in the locked requirements for one of my clients, and that broke install with a surprising error message.

It would be better to write "airflow": ["apache-airflow-providers-*"] so that this is robust to whatever providers are installed. https://github.com/aspect-build/rules_js/blob/main/docs/npm_translate_lock.md#list_patches is an example of a similar repo rule in JS-land which supports globs. Note that bazel-lib provides the starlark glob implementation used there.

FYI @arrdem

Sep 26 '24 19:09 alexeagle

Sounds reasonable to me. @aignas -- any concerns/objections?

Sep 26 '24 22:09 rickeylev

I could possibly send a PR for it if it's accepted and no one else gets to it.

Sep 27 '24 00:09 alexeagle

Accepting * only as the first or last character in the glob sounds sensible - that way we can have checks of startswith and endswith because Starlark does not support regexes. Feel free to submit a PR. :)

Sep 27 '24 04:09 aignas

Seems reasonable -- this would cover most of the cause we've had to adjust our cycle configs.

Oct 03 '24 17:10 arrdem

@aignas we can use https://github.com/bazel-contrib/bazel-lib/blob/main/docs/glob_match.md if we want to support all globs

Oct 04 '24 15:10 alexeagle

Thanks for the suggestion.

I am not sure that adding an extra dependency to the ruleset for this is worth it. Just supporting prefixes and suffixes is enough for the first iteration. Phillip had a good way to remove the package cycles in his work and in the long run using that would be better.

In the long term, I think we should not have the users specify anything and rules_python should just handre the cycles itself in some way.

On 5 October 2024 00:25:38 GMT+09:00, Alex Eagle @.***> wrote:

@aignas we can use https://github.com/bazel-contrib/bazel-lib/blob/main/docs/glob_match.md if we want to support all globs

-- Reply to this email directly or view it on GitHub: https://github.com/bazelbuild/rules_python/issues/2255#issuecomment-2393956488 You are receiving this because you were mentioned.

Message ID: @.***>

Oct 05 '24 03:10 aignas

Does this not cover what we need? In the longer term I probably agree with Ignas, although how we handle Python dependencies would have to be very different to make that work.

A quick and dirty glob implementation

def glob_matches(pat, val):
  starts_with_glob = False
  ends_with_glob = False
  if pat[0] == "*":
    starts_with_glob = True
    pat = pat[1:]
  if pat[-1] == "*":
    ends_with_glob = True
    pat = pat[:-1]

  pat = pat.split("*")

  cur = 0
  for n, part in enumerate(pat):
    new_cur = val.find(part, cur)

    if n == 0 and not starts_with_glob:
      if new_cur != 0:
        return False

    elif n == len(pat) and not ends_with_glob:
      return pat[cur:] == part

    elif new_cur == -1:
      return False

    cur = new_cur

  return True

assert not glob_matches("*bar", "foo")
assert not glob_matches("bar*", "foo")
assert glob_matches("*bar", "bar")
assert glob_matches("*bar", "foobar")
assert glob_matches("bar", "bar*")
assert glob_matches("bar*", "barfoo")
assert glob_matches("foo*bar", "foobar")
assert glob_matches("foo*bar", "foo baz quux bar")

Oct 05 '24 03:10 arrdem

For now I am happy with any implementation as long as it does not bring extra deps to the ruleset. So supporting other globs is fine as well.

Oct 07 '24 02:10 aignas

Allow glob expression in experimental_requirement_cycles