spaCy icon indicating copy to clipboard operation
spaCy copied to clipboard

Simplify and clarify enable/disable behavior of spacy.load()

Open rmitsch opened this issue 3 years ago • 3 comments

Description

As brought up in https://github.com/explosion/spaCy/issues/11443, the current behavior of spacy.load() w.r.t. enable, disable and the enabled/disabled options in the config is opaque and inconvenient to work around in case of a conflict. This PR

  • extends the error message to be informative,
  • lets enable/disable values passed to spacy.load() take precedence over config values, i.e. config values are overwritten if function arguments are passed, and
  • adds a warning message in case this (discarding config values because function arguments were passed) happens, since this can have unforeseen consequences for other pipeline components.

Types of change

Enhancement (maybe fix?).

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [x] I ran the tests, and all new and existing tests passed.
  • [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

rmitsch avatar Sep 08 '22 11:09 rmitsch

Hm, it's not great, especially...

I agree, but is it really more correct than checking if it's an empty SimpleFrozenList, as all empty instances should be identical? I don't mind doing it this way, I'm just wondering. The only alternative I can think of is changing the default param to None, which is messy.

rmitsch avatar Sep 14 '22 15:09 rmitsch

Hm, it's not great, especially...

I agree, but is it really more correct than checking if it's an empty SimpleFrozenList, as all empty instances should be identical? I don't mind doing it this way, I'm just wondering. The only alternative I can think of is changing the default param to None, which is messy.

Are all empty instances identical? I mean, what if a user specifically calls the function with an empty SimpleFrozenList - couldn't we tell the difference?

Apart from that argument, there's also a worry of maintainability - if we ever do change the default to None (which I don't advocate for right now), then we might not notice that 200 lines further down, an implicit dependency has broken. But this will be more obvious & easier to keep in-sync when we store this default in a var.

svlandeg avatar Sep 14 '22 20:09 svlandeg

Apart from that argument, there's also a worry of maintainability...

Yeah, I agree.

Are all empty instances identical? I mean, what if a user specifically calls the function with an empty SimpleFrozenList - couldn't we tell the difference?

Not via equality, but via identity. The latest commit includes a suggestion for how we could do this, based on your idea of a private global variable and id(). Since this default value DEFAULT_PIPES_STATUS is used in several modules, it's currently not private though, which I'm not fond of. Let me know what you think (name is also up for debate).

rmitsch avatar Sep 15 '22 09:09 rmitsch