[FR] simplify gazelle_python.yaml
🚀 feature request
Relevant Rules
gazelle_python.yaml
Description
A clear and concise description of the problem or missing capability...gazelle_python.yaml will list all libs on packages
matplotlib.backend_tools: matplotlib
matplotlib.backends: matplotlib
matplotlib.backends.backend_agg: matplotlib
matplotlib.backends.backend_cairo: matplotlib
matplotlib.backends.backend_gtk3: matplotlib
matplotlib.backends.backend_gtk3agg: matplotlib
matplotlib.backends.backend_gtk3cairo: matplotlib
matplotlib.backends.backend_gtk4: matplotlib
matplotlib.backends.backend_gtk4agg: matplotlib
matplotlib.backends.backend_gtk4cairo: matplotlib
matplotlib.backends.backend_macosx: matplotlib
matplotlib.backends.backend_mixed: matplotlib
matplotlib.backends.backend_nbagg: matplotlib
matplotlib.backends.backend_pdf: matplotlib
matplotlib.backends.backend_pgf: matplotlib
...
I have a three lines only requirements.in
numpy
requests
seaborn
20 packages requirements.txt
but received a 110KB gazelle_python.yaml which has 2517 keys at .manifest.modules_mapping
It is too big.
Describe the solution you'd like
If you have a solution in mind, please describe it.Can we simplify it and make gazelle_python.yaml have only top level package name?
In my shallow cognition, such a long imports seems meaningless.
I'd like to support it if I can.
Describe alternatives you've considered
Have you considered any alternative solutions or workarounds?no
I am not sure if the complexity is worth it as the package imports may overlap at the top level. It could be possible to improve it, but I am not sure how realistic it is.
Let's imagine a case where you write code
import foo.bar
and the foo.bar import is only provided by foo-bar-plugin python package, but what you have in the gazelle_python.yaml is data that tells gazelle that foo maps to foo python package, so gazelle will be happy and will add the wrong dependency instead of erroring here.
I think this is related to #1877, but might not be.
If somebody was to implement it, I think it should:
- Do the right thing in the case outlined above and I am not sure what that right thing is yet.
- Handle cases where we have platform specific imports.
- The smarter
gazelleplugin that can work with the smallergazelle_python.yamlshould still work with the previousgazelle_python.yaml.
One possibility is that we only simplify the generated yaml. During the generation, we find the top bifurcation and then cut the branches. And this is well compatible.
for example
foo: foo
foo.bar : for-bar
foo.bar.abc: for-bar
foo.bar.xyz: for-bar
foo.bar.xyz.sub: for-bar
to
foo: foo
foo.bar : for-bar
One extra idea how we could simplify the gazelle_python.yaml file by a lot is to only include the dependencies that the project is referencing directly. For example, if my project has a pyproject.toml file and I only specify airflow as the dependency in the project, gazelle_python.yaml will be enormous because of all of the transitive airflow dependencies that rules_python knows about. However, the only things that we should actually include in the gazelle_python.yaml are the things that the requirements.in or pyproject.toml mentions.
Such thing could be achieved in two ways:
- Modify the
pipextension to return a list of whls that are the direct references of the project. This would mean that thepip.parseshould accept therequirements.inor thepyproject.tomland use that to generate a list ofwhltargets that are the direct references. That way themodules_mappingfile will only creategazelle_python.yamlentries that are the direct references. - Modify the gazelle tooling to accept a
pyproject.tomland/orrequirements.infile in order to filter out the unnecessary wheels.
I personally think that 1. might be the better solution, but a little bit more involved, because we could also set different visibility values in the hub repo for transitive and non-transitive dependencies of the project to enforce that all of the direct dependencies of the project are in the requirements.in and/or pyproject.toml. However, this is something that may be a matter of taste and I am not sure if this is a good thing to enforce on everyone.
The second approach would be more doable as golang could be definitely taught how to parse pyproject.toml and that would be something that the user can opt into.
That way we may be able to cut the file size by a lot without doing any more advanced techniques like what was suggested in the comment above.
One extra idea how we could simplify the
gazelle_python.yamlfile by a lot is to only include the dependencies that the project is referencing directly. For example, if my project has apyproject.tomlfile and I only specifyairflowas the dependency in the project,gazelle_python.yamlwill be enormous because of all of the transitiveairflowdependencies thatrules_pythonknows about. However, the only things that we should actually include in thegazelle_python.yamlare the things that therequirements.inorpyproject.tomlmentions.我们如何大幅简化gazelle_python.yaml文件的一个额外想法是仅包含项目直接引用的依赖项。例如,如果我的项目有一个pyproject.toml文件,并且我只指定airflow作为项目中的依赖项,则gazelle_python.yaml将非常庞大,因为所有传递性airflowrules_python知道的依赖项。但是,我们实际上应该在gazelle_python.yaml中包含的唯一内容是requirements.in或pyproject.toml提到的内容。Such thing could be achieved in two ways:这样的事情可以通过两种方式实现:
- Modify the
pipextension to return a list of whls that are the direct references of the project. This would mean that thepip.parseshould accept therequirements.inor thepyproject.tomland use that to generate a list ofwhltargets that are the direct references. That way themodules_mappingfile will only creategazelle_python.yamlentries that are the direct references.修改pip扩展以返回作为项目直接引用的whls列表。这意味着pip.parse应该接受requirements.in或pyproject.toml并使用它来生成whl目标列表,这些目标是直接参考。这样,modules_mapping文件将仅创建作为直接引用的gazelle_python.yaml条目。- Modify the gazelle tooling to accept a
pyproject.tomland/orrequirements.infile in order to filter out the unnecessary wheels.修改瞪羚工具以接受pyproject.toml和/或requirements.in文件,以过滤掉不必要的轮子。I personally think that 1. might be the better solution, but a little bit more involved, because we could also set different
visibilityvalues in thehubrepo for transitive and non-transitive dependencies of the project to enforce that all of the direct dependencies of the project are in therequirements.inand/orpyproject.toml. However, this is something that may be a matter of taste and I am not sure if this is a good thing to enforce on everyone.我个人认为 1. 可能是更好的解决方案,但涉及更多一点,因为我们还可以在hub存储库中为传递和非传递依赖项设置不同的visibility值项目的所有直接依赖项都位于requirements.in和/或pyproject.toml中。然而,这可能是一个品味问题,我不确定这对每个人来说是否是一件好事。The second approach would be more doable as golang could be definitely taught how to parse
pyproject.tomland that would be something that the user can opt into.第二种方法更可行,因为 golang 绝对可以被教导如何解析pyproject.toml并且用户可以选择这样做。That way we may be able to cut the file size by a lot without doing any more advanced techniques like what was suggested in the comment above.这样我们就可以大幅削减文件大小,而无需使用任何更高级的技术,例如上面评论中建议的技术。
Maybe I missed something.
Whether it is 1 or 2, users need to ensure that all direct dependencies need to live in requirements.in, otherwise we will not be able to resolve this dependency. This requirement is reasonable, but it is difficult to achieve (most projects should not be able to do it because python does not have the same visibility restrictions as bzlmod).
WDYT @aignas
I think my previous suggestion about requirements.in/pyproject.toml may have been indeed not worth the effort.
I personally do not mind large files since they are machine generated and consumed and you can add the gazelle_python.yaml to .gitattributes for GH (or other git UI providers) to consider that file machine-generated and hide it from PRs.
If you really want to try out the option of reducing the size of it, please keep in mind #1971 as having more assumptions/business logic in this file could hinder the solution of that issue.
I have thought a little more about this and it may be the proposed would work well in cases to reduce merge conflicts. The smaller and simpler the file, the better.
@hunshcn, let me know if you have still time and willingness to add a new manifest format that people could choose. And if you decide to create a PR, feel free to add me as a reviewer.
This is good news. I will do this work later.