spaCy

Add visualisations for parsed documents

Open richardpaulhudson opened this issue 3 years ago • 1 comments

Description

This will add three visualisations of information from parsed documents:

  • a table with rows for consecutive tokens in the document; columns are feature values and/or dependency trees
  • a text interspersed with specific feature labels for individual tokens
  • a table with rows for the tokens in the document that have specific features or feature values; rows for the tokens before and after each matching token can optionally be rendered as well; columns are feature values

All three visualisations are heavily configurable with respect to colours, spacing, and so on.
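As a rough illustration of the first visualisation, here is a minimal pure-Python sketch of a token table with configurable column colours and spacing. It is not the implementation in this PR: `render_table`, its parameters, and the hard-coded `TOKENS` data are hypothetical names chosen for the example. In spaCy the values would presumably come from token attributes such as `token.text`, `token.pos_` and `token.dep_`.

```python
# Hypothetical token data standing in for a parsed document.
TOKENS = [
    {"text": "I", "pos": "PRON", "dep": "nsubj"},
    {"text": "saw", "pos": "VERB", "dep": "ROOT"},
    {"text": "horses", "pos": "NOUN", "dep": "dobj"},
]

RESET = "\x1b[0m"  # ANSI sequence that resets all text attributes


def render_table(tokens, columns, colours=None, spacing=2):
    """Return a text table: one row per token, one column per feature.

    colours maps a column name to an ANSI SGR colour code (e.g. 32 = green);
    spacing is the number of spaces between columns. Both are assumptions
    made for this sketch, not options defined by the PR.
    """
    colours = colours or {}
    # Each column is as wide as its header or its longest value.
    widths = {
        c: max([len(c)] + [len(t[c]) for t in tokens]) for c in columns
    }
    gap = " " * spacing
    lines = [gap.join(c.ljust(widths[c]) for c in columns)]
    for t in tokens:
        cells = []
        for c in columns:
            cell = t[c].ljust(widths[c])
            if c in colours:
                # Wrap the padded cell in the colour and reset sequences.
                cell = f"\x1b[{colours[c]}m{cell}{RESET}"
            cells.append(cell)
        lines.append(gap.join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table(TOKENS, ["text", "pos", "dep"], colours={"pos": 32}))
```

Padding each cell before wrapping it in colour codes keeps the columns aligned, because the escape sequences take up no width on screen.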

Remaining TODOs are:

  • Enable the specification of parts of a document to render
  • Enable config-file based configuration of the three visualisations
  • Develop a standard default configuration for each visualisation
  • Integrate the functionality into the CLI

Types of change

It is a group of new features.

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [ ] I ran the tests, and all new and existing tests passed.
  • [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.

richardpaulhudson, Dec 23 '21 15:12

One difficulty with this PR is that the proliferation of ANSI control characters and carriage returns means that the tests are mostly not human-readable. They can, however, be easily understood by adding a print() statement to each test to display the output and running pytest with an appropriate option such as -rP to display standard output.
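To make the idea concrete, here is a minimal sketch of such a test. `render_label` is a hypothetical stand-in for the renderers in this PR, not their actual API. The expected string is full of escape sequences, and the print() call is what lets a person see the rendered result.

```python
def render_label(text, label, colour_code=32):
    """Hypothetical renderer: the text followed by an ANSI-coloured label."""
    return f"{text} \x1b[{colour_code}m{label}\x1b[0m"


def test_render_label():
    output = render_label("horses", "NOUN")
    # pytest captures this by default; -rP displays it for passing tests.
    print(output)
    # The raw expected value is hard to read because of the escape codes.
    assert output == "horses \x1b[32mNOUN\x1b[0m"
```

Running `pytest -rP` on the file containing this test lists the captured standard output of passing tests in the summary, so the rendered string can be inspected next to the assertion.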

It would be non-standard, but I wonder whether it might even make sense to retain the print()s in the repository?

richardpaulhudson, Mar 25 '22 06:03

I think text visualizations will be a great addition!

It feels like this should fit under displacy rather than as a method on Doc, but I'm not sure about the API/naming.

adrianeboyd, Jan 31 '23 09:01

Closing for now and adding to our internal backlog.

svlandeg, Jun 15 '23 22:06