spaCy icon indicating copy to clipboard operation
spaCy copied to clipboard

Replace EntityRuler with SpanRuler implementation

Open adrianeboyd opened this issue 3 years ago • 4 comments

Description

Remove EntityRuler and rename the SpanRuler-based future_entity_ruler to entity_ruler.

Main changes:

  • It is no longer possible to load patterns on init as with EntityRuler(patterns=).
  • The older serialization formats (patterns.jsonl) are no longer supported and the related tests are removed.
  • The config settings are only stored in the config, not in the serialized component (in particular the phrase_matcher_attr and overwrite settings).

Types of change

?

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [x] I ran the tests, and all new and existing tests passed.
  • [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

adrianeboyd avatar Aug 16 '22 15:08 adrianeboyd

I'm not really sure how to handle the docs. You could have /api/entityruler forward to /api/spanruler, but it's hard to cover both configs and options in a non-confusing way on the span ruler page.

Instead I removed the class tag and source and added the warning at the top.

adrianeboyd avatar Aug 16 '22 15:08 adrianeboyd

I'm not really sure how to handle the docs. You could have /api/entityruler forward to /api/spanruler, but it's hard to cover both configs and options in a non-confusing way on the span ruler page.

Instead I removed the class tag and source and added the warning at the top.

I was also going to suggest to just keep both pages separate, highlighting the reduced options on entityruler and then refer to spanruler to emphasize that that actually has more options/features.

svlandeg avatar Aug 17 '22 15:08 svlandeg

A remaining question is how to deal with references to EntityRuler (with the class name) in the docs since the class no longer exists.

adrianeboyd avatar Sep 07 '22 07:09 adrianeboyd

A remaining question is how to deal with references to EntityRuler (with the class name) in the docs since the class no longer exists.

I did a quick search and it looks like pretty much all references on other pages in the docs ( _architecture.md, processing-pipelines.md etc) can be updated to refer to the new SpanRuler / api/spanruler instead. Did you have any specific problematic cases in mind?

svlandeg avatar Sep 13 '22 16:09 svlandeg

Adriane, is there anything I can do to help on this PR?

svlandeg avatar Oct 03 '22 14:10 svlandeg

I think it's confusing to switch the docs entirely to SpanRuler in these cases because you do have both entity_ruler and span_ruler and many people will be expecting to find docs about the entity ruler in general. I also don't think switching to "entity ruler" or entity_ruler makes sense because it doesn't line up with the rest of the docs, so in general I'm just still not really sure exactly what to do.

adrianeboyd avatar Oct 04 '22 11:10 adrianeboyd

I think we'll nevertheless want to switch to entity_ruler in some of the docs. Actually, in general, we might consider using the component name more often than the actual class name, as since v3 the classes shouldn't be used as much directly anymore.

I'm going through some of these cases and will commit a version of what I think makes sense - then you can review.

svlandeg avatar Oct 04 '22 15:10 svlandeg