Allow glob expression in experimental_requirement_cycles
Currently pip_parse supports a feature to "fix" cycles among third-party packages, for example:
pip_parse(
...
experimental_requirement_cycles = {
"airflow": [
"apache-airflow",
"apache-airflow-providers-common-sql",
"apache-airflow-providers-sqlite",
"apache-airflow-providers-ftp",
"apache-airflow-providers-http",
"apache-airflow-providers-imap",
],
},
)
However it's difficult to keep this list updated, as it needs to include both direct and transitive dependencies. For example apache-airflow-providers-common-io appeared in the locked requirements for one of my clients, and that broke install with a surprising error message.
It would be better to write "airflow": ["apache-airflow-providers-*"] so that this is robust to whatever providers are installed. https://github.com/aspect-build/rules_js/blob/main/docs/npm_translate_lock.md#list_patches is an example of a similar repo rule in JS-land which supports globs. Note that bazel-lib provides the starlark glob implementation used there.
FYI @arrdem
Sounds reasonable to me. @aignas -- any concerns/objections?
I could possibly send a PR for it if it's accepted and no one else gets to it.
Accepting * only as the first or last character in the glob sounds sensible - that way we can have checks of startswith and endswith because Starlark does not support regexes. Feel free to submit a PR. :)
Seems reasonable -- this would cover most of the cause we've had to adjust our cycle configs.
@aignas we can use https://github.com/bazel-contrib/bazel-lib/blob/main/docs/glob_match.md if we want to support all globs
Thanks for the suggestion.
I am not sure that adding an extra dependency to the ruleset for this is worth it. Just supporting prefixes and suffixes is enough for the first iteration. Phillip had a good way to remove the package cycles in his work and in the long run using that would be better.
In the long term, I think we should not have the users specify anything and rules_python should just handre the cycles itself in some way.
On 5 October 2024 00:25:38 GMT+09:00, Alex Eagle @.***> wrote:
@aignas we can use https://github.com/bazel-contrib/bazel-lib/blob/main/docs/glob_match.md if we want to support all globs
-- Reply to this email directly or view it on GitHub: https://github.com/bazelbuild/rules_python/issues/2255#issuecomment-2393956488 You are receiving this because you were mentioned.
Message ID: @.***>
Does this not cover what we need? In the longer term I probably agree with Ignas, although how we handle Python dependencies would have to be very different to make that work.
A quick and dirty glob implementation
def glob_matches(pat, val):
starts_with_glob = False
ends_with_glob = False
if pat[0] == "*":
starts_with_glob = True
pat = pat[1:]
if pat[-1] == "*":
ends_with_glob = True
pat = pat[:-1]
pat = pat.split("*")
cur = 0
for n, part in enumerate(pat):
new_cur = val.find(part, cur)
if n == 0 and not starts_with_glob:
if new_cur != 0:
return False
elif n == len(pat) and not ends_with_glob:
return pat[cur:] == part
elif new_cur == -1:
return False
cur = new_cur
return True
assert not glob_matches("*bar", "foo")
assert not glob_matches("bar*", "foo")
assert glob_matches("*bar", "bar")
assert glob_matches("*bar", "foobar")
assert glob_matches("bar", "bar*")
assert glob_matches("bar*", "barfoo")
assert glob_matches("foo*bar", "foobar")
assert glob_matches("foo*bar", "foo baz quux bar")
For now I am happy with any implementation as long as it does not bring extra deps to the ruleset. So supporting other globs is fine as well.