projects
projects copied to clipboard
Remove dependency installation from project configs
Goals
Remove dependency installation from project config files.
The changes to install requirements per project in the scheduled BuildKitten job can be found in https://github.com/explosion/buildkite-pipelines/pull/9.
Description
- Remove dependency installation steps/commands from project config files.
- Add removal of project dependencies in GH CI after test suite for project has finished.
Types of change
Refactoring.
Checklist
- [x] I confirm that I have the right to submit this contribution under the project's MIT license.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
Sofie probably wants to weigh in on this in general.
I'm not sure why the CI isn't triggered, though? Let me try closing and reopening...
So the Azure CI runs succeed, but the BuildKite runs don't. I think this is due to some version mismatches that are a consequence of not being able to properly uninstall all packages on the BuildKite machine because of missing permissions. So two possibilities here:
- I try to patch this particular issue up and hope it was the last one.
- I set up an alternative version with a separate venv for each project.
I'm in favor of the latter, since it's easier to maintain. If the extra installation time is that horrid (or it breaks on Windows), we can still go back to the progressive uninstall/install way of setting the projects up. What do you think?
The underlying reason is that I made a change in spacy v3.2.0 that broke the compatibility (which I shouldn't have), so if you get spacy v3.1 and spacy-experimental
than you can get this error.
I think the solution is make sure spacy-experimental
is uninstalled? Are you sure the install step is working correctly? You can see traces of it here:
https://buildkite.com/explosion-ai/spacy-projects/builds/77#0182251c-b49d-4f78-af86-afb7a9ee4e3d/664-1017
Are you sure the install step is working correctly?
IMO the install step is working fine, it's the pip uninstall
that's failing. Looking into it.
I meant uninstall
, sorry.
I meant
uninstall
, sorry.
Using a venv on BuildKite and a fix for the spaCy re-installation w.r.t. the project config files seems to have resolved all issues.
This is looking pretty good! I like that fact that it's now just one script to maintain.
How is this going to affect users running existing projects end-to-end? They need to remember to install the requirements first?
I don't think we've decided for sure what to do here. This shows how a good plan for how we could manage the requirements for our CI/tests to make sure that the requirements are also maintained, since having a bunch of them in azure-pipelines.yml
wasn't a particularly good way to do it.
For users I could imagine:
- an automatic step in
spacy project run
that usesimportlib
to check whether the requirements are installed ifrequirements.txt
is present and provides example install commands for users if not - an automatic script that adds/maintains install steps within the projects that we maintain here
- ???
I don't think we've decided for sure what to do here. ...
I like the idea of automatically checking if the requirements are installed + providing install commands or even having users confirm is the requirements should be installed automatically. I think we could do that before running any step though (shouldn't take too long to do so?), not as a separate step.
I think we could do that before running any step though (shouldn't take too long to do so?), not as a separate step.
I'm not sure I fully understand what you mean, like if we do it before running other steps, it becomes a separate step, no? Don't mean to be pedantic, just confused 😛
I definitely agree that this solution feels better for the CI/test suite of things, but I want to make sure things don't unnecessarily break for users. We don't really version the projects or anything, but workflows that checkout a given project and run some kind of all
command will suddenly break if the requirements aren't installed anymore as part of the workflow. So at the very least we need to think about how to communicate this clearly.
Do we want something like assets
so we'd have
spacy project install
spacy project assets
spacy project run ...
or such?
* an automatic step in `spacy project run` that uses `importlib` to check whether the requirements are installed if `requirements.txt` is present and provides example install commands for users if not
I like that idea too, but it might generate some unnecessary overhead if this is run on every single spacy project run ...
command?
I think we could do that before running any step though (shouldn't take too long to do so?), not as a separate step.
I'm not sure I fully understand what you mean, like if we do it before running other steps, it becomes a separate step, no? Don't mean to be pedantic, just confused stuck_out_tongue
Yeah, I misunderstood Adriane - I meant exactly what she's saying in her first suggestion :smile: We could draft something and run some tests to see how much overhead it really is? I feel like this is the safest and lowest-effort (for users) solution.
If this doesn't work out for whatever reason, I'm in favor of spacy project install
. Feels appropriate.
My feeling is that the overhead probably isn't that high, we should profile it a bit for sample projects. Doing all the requirements.txt
parsing is somewhat challenging, but the machinery is all there somewhere.
I might vote for spacy project requirements
instead of spacy project install
.
I guess we want to try adding the requirements checking/automatic install (or, alternatively, spacy project requirements/install
) to spaCy before we merge this PR then?
Alright, so checking whether just the packages in requirements.txt
are installed seems to be not too much work and achievable with only pkg_resources
. Getting the complete dependency tree is a bit more involved.
What we are looking for here, ideally, is something like pip check
to ensure all packages are installed with the right versions. This works with conda
as well. Haven't tried poetry
or other package managers yet. Small hiccup: installing spaCy from scratch in a new conda
environment and then running pip check
yields:
thinc 8.0.15 has requirement pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4, but you have pydantic 1.9.0.
spacy 3.3.1 has requirement pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4, but you have pydantic 1.9.0.
So I guess we have to be somewhat lenient about version mismatches. In general I think the combination of
- checking if all packages in
requirements.txt
are installed usingpkg_utils
and - ensure dependency tree correctness/completeness with
pip check
might be a low-effort and relatively robust way, based only on standard tools, to check if everything is installed properly.
What do you think?
I would only check for the packages in requirements.txt
and ignore the full dependency check.
It doesn't look like the dependencies in the conda packages are incorrect (it should be <1.9.0
for that version), so I'm not sure what's going on there.
Suggestion for dependency checking: https://github.com/explosion/spaCy/pull/11226. This doesn't install dependencies yet, just detects missing or mismatched packages. Only with an automatic install we'd be fully backwards compatible, but I feel like that's kind of intrusive for a default behavior...
This should be restored to its original state before I skillfully erased all of it. I'll trigger the all
tests on BuildKite - once these succeed, I'll update again asking for a quick once-over review.
BuildKite run succeeds. A final review/check appreciated, but there's no hurry since we don't wanna merge this before spaCy 3.5 is released anyway.
It might also make sense to change the spacy requirements in the CI.
To >=3.4
?
Why is this change necessary?
Not doing that results in
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy-experimental 0.5.0 requires spacy<3.4.0,>=3.3.0, but you have spacy 3.4.2 which is incompatible.
Ah, then I would change it to:
spacy-experimental>=0.2.0,<0.7.0
Can you rerun the tests with v3.4.2? It might also make sense to change the spacy requirements in the CI?
All tests succeed on BuildKite. What should we change the spacy
requirement to?
@adrianeboyd @svlandeg This is ready to go IMO (a last sanity check might not hurt). The dependency check was already included in spaCy v3.4.3, so I think there's no need to wait for v3.5 to merge this.
The benchmarks/nel
project seems to have unrelated changes?
The benchmarks/nel project seems to have unrelated changes?
That's true, but these are just formatting. Do you mind leaving them in? It's no issue to revert those either.
@adrianeboyd Do you feel the open issues have been addressed?
In this PR adding a correct spacy_version
in nel_emerson
would be okay, but if you're going to update the project code instead (which is fine!), please move these changes to a separate PR, and also add the correct spacy_version
requirements.
In this PR adding a correct spacy_version in nel_emerson would be okay, but if you're going to update the project code instead (which is fine!), please move these changes to a separate PR, and also add the correct spacy_version requirements.
Was about to write a comment in this regard. In order for the nel_emerson
tests to succeed, we can either
- limit
spacy_version
to < 3.5 or - update
KnowledgeBase
toInMemoryLookupKB
and limitspacy_version
to >= 3.5 or - add qualified imports everywhere - something like:
try:
# In case of >= 3.5
from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ModuleNotFoundError:
# In case of < 3.5
from spacy.kb import KnowledgeBase
Inclined towards the first option. What do you think?
I realize that I don't think we highlighted how breaking the KnowledgeBase
changes were in all the release notes.
For this PR, I think adding spacy_version: ">=3.0.0,<3.5.0"
is fine. Any other options would be better as a separate PR (that would be merged before this one).