
Integration with / support for pyspelling

Status: Open · psychemedia opened this issue 4 years ago · 2 comments

I've noticed various recipes for using jupytext with tools like black to improve code quality. Has anyone got any recipes for using jupytext with the pyspelling package, which wraps aspell and hunspell, to improve the quality of markdown text (and maybe also code comments)?

I have a hacky pyspelling example config that parses jupytext-md files on a path, ignores code content and the YAML header (and, judging from the regex, potentially other things too), and runs a spellcheck on the markdown using a custom whitelist wordlist. It's just a quick POC :-( but here it is:

```yaml
matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.context:
      context_visible_first: true
      delimiters:
        # ignore YAML header in jupytext-md
        - open: '(?s)^(?P<open> *-{3,})$'
          close: '^(?P=open)$'
  # ignore URLs
  - pyspelling.filters.url:
  # Route md via HTML
  - pyspelling.filters.markdown:
      markdown_extensions:
        - pymdownx.superfences:
  - pyspelling.filters.html:
      comments: false
      # Ignore code
      ignores:
        - code
        - pre
        - tt
  sources:
    - 'content/*/.md/*.md'
    #- '**/.md/*.md'
  default_encoding: utf-8
```
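To see roughly what the `pyspelling.filters.context` delimiters in that config skip, here is a plain-Python approximation using `re` with the same open/close patterns. It is only a sketch for illustration, not pyspelling's own implementation, and the `visible_text` helper name is made up. It also suggests why the regex may catch "other things": any pair of matching `---` lines, such as two Markdown horizontal rules, gets treated like a YAML header.

```python
import re

# Rough approximation of the context-filter delimiters in the config above
# (not pyspelling's own code): an opening line of three or more dashes, then
# everything up to the next line made of exactly the same dashes.
DELIMITED_BLOCK = re.compile(
    r"^(?P<open> *-{3,})$.*?^(?P=open)$\n?",
    re.MULTILINE | re.DOTALL,
)


def visible_text(md: str) -> str:
    """Return the text left for spellchecking after dropping delimited blocks (hypothetical helper)."""
    return DELIMITED_BLOCK.sub("", md)


header_md = "---\njupyter:\n  jupytext:\n    formats: md\n---\n\n# A titel\n\nSome text.\n"
print(repr(visible_text(header_md)))  # YAML header dropped; the typo "titel" is still checked

rules_md = "Intro\n\n---\n\nMore text\n\n---\n\nEnd\n"
print(repr(visible_text(rules_md)))  # text between two horizontal rules is dropped too
```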

Might this sort of step also be useful in Jupytext workflows intended to produce Jupyter Book content? [@choldgraf ?]

As far as I can tell, pyspelling doesn't currently offer a way to automate fixes (e.g. looking up known or likely typos and replacing them with corrected versions), which would make it more like a "text linter"; it just produces reports. But it is still a handy tool for the quality toolbox if you are creating a lot of text content.
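For illustration only, the kind of automated fix described above (looking up known typos and substituting corrected versions) might look something like the sketch below. This is a hypothetical helper, not something pyspelling provides; the `KNOWN_TYPOS` table and the `autofix` name are assumptions.

```python
import re

# Hypothetical table of known typos -> corrections; pyspelling itself only reports.
KNOWN_TYPOS = {"teh": "the", "recieve": "receive", "seperate": "separate"}

# Maximal runs of letters, so only whole words are looked up.
WORD = re.compile(r"[A-Za-z]+")


def autofix(text, typos=KNOWN_TYPOS):
    """Replace whole words found in `typos`, keeping a leading capital.

    Returns the corrected text plus a list of (original, replacement) pairs,
    so the changes can still be reviewed like a report.
    """
    changes = []

    def replace(match):
        word = match.group(0)
        fixed = typos.get(word.lower())
        if fixed is None:
            return word
        if word[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]
        changes.append((word, fixed))
        return fixed

    return WORD.sub(replace, text), changes


fixed_text, changes = autofix("Teh notebook should recieve fewer typos.")
print(fixed_text)
print(changes)
```

Returning the list of changes alongside the text keeps the report-style output pyspelling already gives, while still applying the fixes.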

psychemedia, Mar 16 '21 20:03

Hi Tony, thanks for reaching out! Yes, I saw your post yesterday and found it very useful!

A few thoughts that come to my mind:

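  • maybe you'd like to pipe a dedicated version of the notebook, with no metadata, to pyspelling. If you are able to call pyspelling from Python, then maybe you could use something like this:
    ```python
    import jupytext

    nb = jupytext.read("notebook.ipynb")  # hypothetical path; any format Jupytext reads
    fmt = {'extension': '.md', 'notebook_metadata_filter': '-all', 'cell_metadata_filter': '-all'}
    text = jupytext.writes(nb, fmt=fmt)  # Markdown with all metadata filtered out
    # and then pass `text` to pyspelling
    ```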
  • what you've done on .ipynb notebooks would also work on .md notebooks (use jupytext.read instead of nbformat.read)
  • of course it would be great to have integration in JupyterLab, but that would certainly require much more work!

Also, I see that you are using manual pre-commit scripts. We recently worked on using Jupytext with the pre-commit framework. The post about that is still a work in progress, but tested examples are provided in the tests; see e.g. this one: https://github.com/mwouts/jupytext/blob/master/tests/test_pre_commit_3_sync_black_nbstripout.py

mwouts, Mar 18 '21 08:03

Hi Marc

Thanks for the tip about the pre-commit jupytext/black demo; being able to think through and implement processing pipelines triggered by a commit could be really interesting.

Re: pyspelling: even though it's just a POC, I think the .ipynb pyspelling filter makes writing pyspelling pipelines much easier to manage, because of the clean separation of markdown and code. At first glance, using Jupytext to put content into .ipynb and then processing that seems a more robust strategy than trying to filter hybrid code and markdown representations.
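As a rough illustration of that clean separation: an .ipynb file is JSON in which every cell carries its own `cell_type`, so pulling out just the markdown is a simple loop. The sketch below assumes nbformat v4 JSON and uses only the standard library (in practice `nbformat.read` or `jupytext.read` would do the parsing); `markdown_cells` is a hypothetical helper, not part of any pyspelling filter.

```python
import json


def markdown_cells(ipynb_text):
    """Return the source of each markdown cell in an nbformat v4 notebook."""
    nb = json.loads(ipynb_text)
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # code and raw cells never reach the spellchecker
        src = cell.get("source", "")
        # the JSON may store source as one string or as a list of lines
        sources.append(src if isinstance(src, str) else "".join(src))
    return sources


example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some prose."]},
        {"cell_type": "code", "metadata": {}, "source": "x = 1  # a comment",
         "outputs": [], "execution_count": None},
    ],
})
print(markdown_cells(example))
```

With a hybrid text format, by contrast, the filter has to infer cell boundaries from fences and delimiters, which is where the regex surprises come from.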

psychemedia, Mar 18 '21 09:03