projects icon indicating copy to clipboard operation
projects copied to clipboard

Remove dependency installation from project configs

Open rmitsch opened this issue 2 years ago • 17 comments

Goals

Remove dependency installation from project config files.

The changes to install requirements per project in the scheduled BuildKitten job can be found in https://github.com/explosion/buildkite-pipelines/pull/9.

Description

  • Remove dependency installation steps/commands from project config files.
  • Add removal of project dependencies in GH CI after test suite for project has finished.

Types of change

Refactoring.

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [x] I ran the tests, and all new and existing tests passed.
  • [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

rmitsch avatar Jul 20 '22 09:07 rmitsch

Sofie probably wants to weigh in on this in general.

I'm not sure why the CI isn't triggered, though? Let me try closing and reopening...

adrianeboyd avatar Jul 20 '22 09:07 adrianeboyd

So the Azure CI runs succeed, but the BuildKite runs don't. I think this is due to some version mismatches that are a consequence of not being able to properly uninstall all packages on the BuildKite machine because of missing permissions. So two possibilities here:

  • I try to patch this particular issue up and hope it was the last one.
  • I set up an alternative version with a separate venv for each project.

I'm in favor of the latter, since it's easier to maintain. If the extra installation time is that horrid (or it breaks on Windows), we can still go back to the progressive uninstall/install way of setting the projects up. What do you think?

rmitsch avatar Jul 22 '22 09:07 rmitsch

The underlying reason is that I made a change in spacy v3.2.0 that broke the compatibility (which I shouldn't have), so if you get spacy v3.1 and spacy-experimental than you can get this error.

I think the solution is make sure spacy-experimental is uninstalled? Are you sure the install step is working correctly? You can see traces of it here:

https://buildkite.com/explosion-ai/spacy-projects/builds/77#0182251c-b49d-4f78-af86-afb7a9ee4e3d/664-1017

adrianeboyd avatar Jul 22 '22 10:07 adrianeboyd

Are you sure the install step is working correctly?

IMO the install step is working fine, it's the pip uninstall that's failing. Looking into it.

rmitsch avatar Jul 22 '22 11:07 rmitsch

I meant uninstall, sorry.

adrianeboyd avatar Jul 22 '22 11:07 adrianeboyd

I meant uninstall, sorry.

Using a venv on BuildKite and a fix for the spaCy re-installation w.r.t. the project config files seems to have resolved all issues.

rmitsch avatar Jul 22 '22 13:07 rmitsch

This is looking pretty good! I like that fact that it's now just one script to maintain.

adrianeboyd avatar Jul 25 '22 06:07 adrianeboyd

How is this going to affect users running existing projects end-to-end? They need to remember to install the requirements first?

svlandeg avatar Jul 25 '22 15:07 svlandeg

I don't think we've decided for sure what to do here. This shows how a good plan for how we could manage the requirements for our CI/tests to make sure that the requirements are also maintained, since having a bunch of them in azure-pipelines.yml wasn't a particularly good way to do it.

For users I could imagine:

  • an automatic step in spacy project run that uses importlib to check whether the requirements are installed if requirements.txt is present and provides example install commands for users if not
  • an automatic script that adds/maintains install steps within the projects that we maintain here
  • ???

adrianeboyd avatar Jul 25 '22 15:07 adrianeboyd

I don't think we've decided for sure what to do here. ...

I like the idea of automatically checking if the requirements are installed + providing install commands or even having users confirm is the requirements should be installed automatically. I think we could do that before running any step though (shouldn't take too long to do so?), not as a separate step.

rmitsch avatar Jul 26 '22 07:07 rmitsch

I think we could do that before running any step though (shouldn't take too long to do so?), not as a separate step.

I'm not sure I fully understand what you mean, like if we do it before running other steps, it becomes a separate step, no? Don't mean to be pedantic, just confused 😛

I definitely agree that this solution feels better for the CI/test suite of things, but I want to make sure things don't unnecessarily break for users. We don't really version the projects or anything, but workflows that checkout a given project and run some kind of all command will suddenly break if the requirements aren't installed anymore as part of the workflow. So at the very least we need to think about how to communicate this clearly.

Do we want something like assets so we'd have

spacy project install
spacy project assets
spacy project run ...

or such?

* an automatic step in `spacy project run` that uses `importlib` to check whether the requirements are installed if `requirements.txt` is present and provides example install commands for users if not

I like that idea too, but it might generate some unnecessary overhead if this is run on every single spacy project run ... command?

svlandeg avatar Jul 26 '22 13:07 svlandeg

I think we could do that before running any step though (shouldn't take too long to do so?), not as a separate step.

I'm not sure I fully understand what you mean, like if we do it before running other steps, it becomes a separate step, no? Don't mean to be pedantic, just confused stuck_out_tongue

Yeah, I misunderstood Adriane - I meant exactly what she's saying in her first suggestion :smile: We could draft something and run some tests to see how much overhead it really is? I feel like this is the safest and lowest-effort (for users) solution.

If this doesn't work out for whatever reason, I'm in favor of spacy project install. Feels appropriate.

rmitsch avatar Jul 26 '22 13:07 rmitsch

My feeling is that the overhead probably isn't that high, we should profile it a bit for sample projects. Doing all the requirements.txt parsing is somewhat challenging, but the machinery is all there somewhere.

I might vote for spacy project requirements instead of spacy project install.

adrianeboyd avatar Jul 26 '22 14:07 adrianeboyd

I guess we want to try adding the requirements checking/automatic install (or, alternatively, spacy project requirements/install) to spaCy before we merge this PR then?

rmitsch avatar Jul 26 '22 14:07 rmitsch

Alright, so checking whether just the packages in requirements.txt are installed seems to be not too much work and achievable with only pkg_resources. Getting the complete dependency tree is a bit more involved.

What we are looking for here, ideally, is something like pip check to ensure all packages are installed with the right versions. This works with conda as well. Haven't tried poetry or other package managers yet. Small hiccup: installing spaCy from scratch in a new conda environment and then running pip check yields:

thinc 8.0.15 has requirement pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4, but you have pydantic 1.9.0.
spacy 3.3.1 has requirement pydantic!=1.8,!=1.8.1,<1.9.0,>=1.7.4, but you have pydantic 1.9.0.

So I guess we have to be somewhat lenient about version mismatches. In general I think the combination of

  • checking if all packages in requirements.txt are installed using pkg_utils and
  • ensure dependency tree correctness/completeness with pip check

might be a low-effort and relatively robust way, based only on standard tools, to check if everything is installed properly.

What do you think?

rmitsch avatar Jul 27 '22 11:07 rmitsch

I would only check for the packages in requirements.txt and ignore the full dependency check.

It doesn't look like the dependencies in the conda packages are incorrect (it should be <1.9.0 for that version), so I'm not sure what's going on there.

adrianeboyd avatar Jul 27 '22 11:07 adrianeboyd

Suggestion for dependency checking: https://github.com/explosion/spaCy/pull/11226. This doesn't install dependencies yet, just detects missing or mismatched packages. Only with an automatic install we'd be fully backwards compatible, but I feel like that's kind of intrusive for a default behavior...

rmitsch avatar Jul 27 '22 13:07 rmitsch

This should be restored to its original state before I skillfully erased all of it. I'll trigger the all tests on BuildKite - once these succeed, I'll update again asking for a quick once-over review.

rmitsch avatar Oct 07 '22 11:10 rmitsch

BuildKite run succeeds. A final review/check appreciated, but there's no hurry since we don't wanna merge this before spaCy 3.5 is released anyway.

rmitsch avatar Oct 07 '22 14:10 rmitsch

It might also make sense to change the spacy requirements in the CI.

To >=3.4?

rmitsch avatar Oct 24 '22 09:10 rmitsch

Why is this change necessary?

Not doing that results in

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy-experimental 0.5.0 requires spacy<3.4.0,>=3.3.0, but you have spacy 3.4.2 which is incompatible.

rmitsch avatar Oct 24 '22 09:10 rmitsch

Ah, then I would change it to:

spacy-experimental>=0.2.0,<0.7.0

adrianeboyd avatar Oct 24 '22 10:10 adrianeboyd

Can you rerun the tests with v3.4.2? It might also make sense to change the spacy requirements in the CI?

All tests succeed on BuildKite. What should we change the spacy requirement to?

rmitsch avatar Oct 25 '22 14:10 rmitsch

@adrianeboyd @svlandeg This is ready to go IMO (a last sanity check might not hurt). The dependency check was already included in spaCy v3.4.3, so I think there's no need to wait for v3.5 to merge this.

rmitsch avatar Dec 28 '22 11:12 rmitsch

The benchmarks/nel project seems to have unrelated changes?

adrianeboyd avatar Jan 11 '23 13:01 adrianeboyd

The benchmarks/nel project seems to have unrelated changes?

That's true, but these are just formatting. Do you mind leaving them in? It's no issue to revert those either.

rmitsch avatar Jan 11 '23 13:01 rmitsch

@adrianeboyd Do you feel the open issues have been addressed?

rmitsch avatar Jan 19 '23 10:01 rmitsch

In this PR adding a correct spacy_version in nel_emerson would be okay, but if you're going to update the project code instead (which is fine!), please move these changes to a separate PR, and also add the correct spacy_version requirements.

adrianeboyd avatar Jan 26 '23 09:01 adrianeboyd

In this PR adding a correct spacy_version in nel_emerson would be okay, but if you're going to update the project code instead (which is fine!), please move these changes to a separate PR, and also add the correct spacy_version requirements.

Was about to write a comment in this regard. In order for the nel_emerson tests to succeed, we can either

  • limit spacy_version to < 3.5 or
  • update KnowledgeBase to InMemoryLookupKB and limit spacy_version to >= 3.5 or
  • add qualified imports everywhere - something like:
try:
    # In case of >= 3.5
    from spacy.kb import InMemoryLookupKB as KnowledgeBase
except ModuleNotFoundError:
    # In case of < 3.5
    from spacy.kb import KnowledgeBase

Inclined towards the first option. What do you think?

rmitsch avatar Jan 26 '23 09:01 rmitsch

I realize that I don't think we highlighted how breaking the KnowledgeBase changes were in all the release notes.

For this PR, I think adding spacy_version: ">=3.0.0,<3.5.0" is fine. Any other options would be better as a separate PR (that would be merged before this one).

adrianeboyd avatar Jan 26 '23 10:01 adrianeboyd