spaCy
Add visualisations for parsed documents
Description
This will add three visualisations of information from parsed documents:
- a table with rows for consecutive tokens in the document; columns are feature values and/or dependency trees
- a text interspersed with specific feature labels for individual tokens
- a table with rows for tokens in the document that have specific features or feature values; rows for the tokens before and after each matching token can optionally be rendered as well; columns are feature values
All three visualisations are highly configurable (colours, spacing, etc.).
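To make the first and third layouts concrete, here is a minimal standard-library sketch. It is not the code in this PR: the `TOKENS` data and the helpers `render_token_table` and `select_with_context` are hypothetical names invented for illustration, and plain dictionaries stand in for parsed tokens.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for tokens of a parsed document (not spaCy objects).
TOKENS: List[Dict[str, str]] = [
    {"text": "I", "pos": "PRON", "dep": "nsubj"},
    {"text": "saw", "pos": "VERB", "dep": "ROOT"},
    {"text": "a", "pos": "DET", "dep": "det"},
    {"text": "horse", "pos": "NOUN", "dep": "dobj"},
    {"text": ".", "pos": "PUNCT", "dep": "punct"},
]


def render_token_table(
    rows: Sequence[Dict[str, str]], columns: Sequence[str], spacing: int = 2
) -> str:
    """Render one row per token and one column per feature as aligned text."""
    # Each column is as wide as its header or its longest value.
    widths = [
        max([len(col)] + [len(row.get(col, "")) for row in rows])
        for col in columns
    ]
    sep = " " * spacing
    lines = [sep.join(c.ljust(w) for c, w in zip(columns, widths)).rstrip()]
    for row in rows:
        cells = (row.get(c, "").ljust(w) for c, w in zip(columns, widths))
        lines.append(sep.join(cells).rstrip())
    return "\n".join(lines)


def select_with_context(
    rows: Sequence[Dict[str, str]],
    feature: str,
    value: str,
    before: int = 0,
    after: int = 0,
) -> List[int]:
    """Return indices of rows whose feature equals value, plus context rows."""
    keep = set()
    for i, row in enumerate(rows):
        if row.get(feature) == value:
            start = max(0, i - before)
            stop = min(len(rows), i + after + 1)
            keep.update(range(start, stop))
    return sorted(keep)


if __name__ == "__main__":
    # Table of consecutive tokens.
    print(render_token_table(TOKENS, ["text", "pos", "dep"]))
    # Only nouns, each with one preceding token as context.
    indices = select_with_context(TOKENS, "pos", "NOUN", before=1)
    print(render_token_table([TOKENS[i] for i in indices], ["text", "pos"]))
```

Colour and dependency-tree columns are deliberately left out of this sketch; it only shows the row and column structure described in the list above.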
Remaining TODOs are:
- Enable the specification of parts of a document to render
- Enable config-file based configuration of the three visualisations
- Develop a standard default configuration for each visualisation
- Integrate the functionality into the CLI
Types of change
It is a group of new features.
Checklist
- [x] I confirm that I have the right to submit this contribution under the project's MIT license.
- [ ] I ran the tests, and all new and existing tests passed.
- [ ] My changes don't require a change to the documentation, or if they do, I've added all required information.
One difficulty with this PR is that the proliferation of ANSI control characters and carriage returns means that the tests are mostly not human-readable. They can, however, be easily understood by adding a `print()` statement to each test to display the output and running `pytest` with an appropriate option like `-rP` to display standard output.
It would be non-standard, but I wonder whether it might even make sense to retain the `print()` statements in the repository?
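As a rough illustration of the approach described above, a test with a retained `print()` might look like the sketch below. The `render_label` function is a hypothetical stand-in, not the renderer added in this PR; only the pattern of printing the output before asserting on it is the point.

```python
# test_render_label.py
# Hypothetical renderer used only for this sketch: it appends a feature
# label to a token, coloured with an ANSI 256-colour escape sequence.
def render_label(text: str, label: str) -> str:
    return f"{text} \x1b[38;5;2m{label}\x1b[0m"


def test_render_label():
    output = render_label("horse", "NOUN")
    # pytest captures this; the -rP option shows it for passing tests,
    # so the terminal renders the colours the assertion spells out.
    print(output)
    assert output == "horse \x1b[38;5;2mNOUN\x1b[0m"
```

Running `pytest -rP test_render_label.py` then lists the captured standard output of each passing test in the summary, where the escape sequences are rendered by the terminal rather than shown as raw characters.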
I think text visualizations will be a great addition!
It feels like this should fit under `displacy` rather than as a method on `Doc`, but I'm not sure about the API/naming.
Closing for now & adding to our internal backlog