
Pre and Post Filters on Autofilter functionality

Open ekiim opened this issue 3 years ago • 8 comments

Hi guys, I'm currently running several filters, and I needed to create my own metadata value named panflute-filters-pre. I use that variable to receive a list of filters and prepend it to the panflute-filters variable. Right now this runs as a separate filter, but I was thinking we could build it into panflute's functionality.

That is, supporting both panflute-filters-pre and panflute-filters-post.

If you give me a green light on this, I can write it and submit the PR.

ekiim • May 01 '21 00:05

I might have missed something, but isn't it already doable as is?

  1. filters are run in order, i.e., the 1st filter in the list runs first and is effectively the "pre" filter
  2. a filter can mutate the doc object, which effectively lets you modify the panflute-filters metadata

Did it not work when you tried to solve the problem this way? If so, maybe the autofilter function is "copying" the panflute-filters list and doesn't respect the update? Maybe the solution is to make it do so, rather than creating 2 more keys?

ickc • May 04 '21 23:05

The way I'm handling this right now is with a pandoc command like pandoc -F pre-panflute.py -F panflute, where pre-panflute.py runs the following function:

import panflute as pf

def finalize(doc):
    # Wrap the document's own panflute-filters with the pre/post lists
    filters = doc.get_metadata("panflute-filters", [])
    prefilters = doc.get_metadata("panflute-filters-pre", [])
    postfilters = doc.get_metadata("panflute-filters-post", [])
    doc.metadata["panflute-filters"] = [*prefilters, *filters, *postfilters]

if __name__ == "__main__":
    pf.run_filters([], finalize=finalize)

This lets panflute, which runs as the second filter, find the usual list of filters to autorun, while letting me fix the order of the filters beforehand.
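A minimal sketch of the same merge as a plain function, assuming that a metadata key holding a single filter may come back from `doc.get_metadata` as a string rather than a list (spreading a string with `*` would split it into characters). `as_list`, `merge_filters`, and the filter names in the usage line are hypothetical.

```python
from typing import Any, List


def as_list(value: Any) -> List[str]:
    """Normalize a metadata value (missing, single string, or list) to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        # Assumed case: `panflute-filters: page.py` yields a plain string,
        # and [*"page.py"] would wrongly split it into characters.
        return [value]
    return list(value)


def merge_filters(pre: Any, local: Any, post: Any) -> List[str]:
    """Global "pre" filters first, then the document's own, then global "post"."""
    return [*as_list(pre), *as_list(local), *as_list(post)]


if __name__ == "__main__":
    print(merge_filters(["toc.py"], "page.py", ["links.py"]))
    # ['toc.py', 'page.py', 'links.py']
```

Keeping the merge a pure function makes it easy to test separately from pandoc.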

Why am I doing this?

I'm using pandoc and panflute as my static site builder (with a couple of Makefiles). Some of my filters (which run at the end) apply to all pages, and some apply only to specific pages. So my current approach is to configure the global filters in a metadata.yaml file, under panflute-filters-post or panflute-filters-pre, and to leave the panflute-filters definition in each document for whatever filters that particular document needs.

There are other use cases too. Take a filter that finds and includes local file references (such as lists of links): it wouldn't need to worry about link normalization, which could be left to a post filter. For instance, the first filter inserts local references to Markdown files, and when the output is HTML, a post filter converts those links from .md to .html.

I'm proposing this mainly because I don't see it hurting anything; it's a purely additive change.

ekiim • May 05 '21 00:05

My first question: does it work to put pre-panflute.py as the first (and only) entry in panflute-filters?

My initial reaction is that this shouldn't be in panflute by default, just as pandoc's default metadata/argument options aren't the most general but are designed with single-document generation in mind.

Funny you should mention static site generators: just in the last 2 hours I was wondering whether I should build one centered around pandoc and panflute. I have used a few site generators, including a custom Makefile for simple cases, but I find them lacking and not "native-pandoc" enough for me. If I wrote one, it would be a Python solution with some other dependencies, including a make-like dependency-tracking capability and automatic parallelization (basically the advantages of my make workflow, while addressing some of its limitations and borrowing some concepts from other site generators).

ickc • May 05 '21 00:05

No. Once the autofilter starts running, the queue of filters is already fixed, so changing the value of the panflute-filters variable has no effect on the queue that is currently running.
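The difference can be modeled without panflute. In this toy sketch a "filter" is a hypothetical Python function that takes and returns a metadata dict, looked up by name in a `REGISTRY`; `run_snapshot` copies the list once up front (the behavior described above), while `run_live` re-reads the list after every filter, which is what a mutable `panflute-filters` would require. It illustrates the idea only and is not panflute's actual code.

```python
from typing import Callable, Dict, List

Meta = Dict[str, List[str]]
Filter = Callable[[Meta], Meta]


def add_post_filter(meta: Meta) -> Meta:
    # Hypothetical filter that appends another filter to the queue.
    meta["panflute-filters"] = [*meta["panflute-filters"], "noop"]
    return meta


def noop(meta: Meta) -> Meta:
    return meta


REGISTRY: Dict[str, Filter] = {"add_post_filter": add_post_filter, "noop": noop}


def run_snapshot(meta: Meta) -> List[str]:
    """Read the list once; later changes to the metadata are ignored."""
    ran = []
    for name in list(meta["panflute-filters"]):
        meta = REGISTRY[name](meta)
        ran.append(name)
    return ran


def run_live(meta: Meta) -> List[str]:
    """Re-read the list after each filter, so appended entries also run."""
    ran, i = [], 0
    while i < len(meta["panflute-filters"]):
        name = meta["panflute-filters"][i]
        meta = REGISTRY[name](meta)
        ran.append(name)
        i += 1
    return ran


if __name__ == "__main__":
    print(run_snapshot({"panflute-filters": ["add_post_filter"]}))  # ['add_post_filter']
    print(run_live({"panflute-filters": ["add_post_filter"]}))  # ['add_post_filter', 'noop']
```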

I'm currently polishing my project; I might upload it to GitHub over the weekend.

ekiim • May 05 '21 00:05

Right, that's what I meant at the beginning: shouldn't we change this behavior so that panflute-filters can be mutated, rather than adding 2 more keys?

ickc • May 05 '21 01:05

OK, it sounds doable, but we would need to change how autofilter looks up and executes filters. Also, it would become possible to write a filter that calls itself infinitely, like:

    def finalize(doc):
        # Appends itself every time it runs, so a re-read queue would never end
        filters = doc.get_metadata("panflute-filters", [])
        doc.metadata["panflute-filters"] = [*filters, "this_filter.py"]

I guess this is not desirable. I've looked at how docutils works, and it does something like what you mentioned.
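If the list were made mutable, one hypothetical safeguard against the self-appending filter above is to cap the number of runs. This sketch uses a toy model (filters as functions over a metadata dict, looked up by name); the `max_runs` limit and the `RuntimeError` are assumptions for illustration, not anything panflute provides.

```python
from typing import Callable, Dict, List

Meta = Dict[str, List[str]]


def this_filter(meta: Meta) -> Meta:
    # Mirrors the snippet above: the filter re-appends itself every time it runs.
    meta["panflute-filters"] = [*meta["panflute-filters"], "this_filter.py"]
    return meta


REGISTRY: Dict[str, Callable[[Meta], Meta]] = {
    "this_filter.py": this_filter,
    "noop.py": lambda meta: meta,
}


def run_live_guarded(meta: Meta, max_runs: int = 100) -> List[str]:
    """Run a mutable filter queue, refusing to run more than max_runs filters."""
    ran: List[str] = []
    while len(ran) < len(meta["panflute-filters"]):
        if len(ran) >= max_runs:
            raise RuntimeError(f"filter queue hit the limit of {max_runs} runs; possible loop")
        name = meta["panflute-filters"][len(ran)]
        meta = REGISTRY[name](meta)
        ran.append(name)
    return ran


if __name__ == "__main__":
    try:
        run_live_guarded({"panflute-filters": ["this_filter.py"]}, max_runs=10)
    except RuntimeError as err:
        print(err)  # filter queue hit the limit of 10 runs; possible loop
```

A cap only detects the problem after the fact; it cannot tell a genuine loop from a long but finite queue.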

ekiim • May 05 '21 02:05

I thought more about it, and now I think mutating panflute-filters may not be a good direction. My summary of the situation:

  • the problem is how to specify panflute-filters in different ways and how to resolve their order. In your case you have a "local" panflute-filters list and a "global" one.
  • your current solution (pandoc -F pre-panflute.py -F panflute) requires an additional filter. So any solution that requires you to write another filter doesn't really solve your problem, i.e., even if panflute-filters were resolved dynamically, that wouldn't fully solve it.
    • By the way, if your pre-panflute.py calls autofilter.stdio directly, you can avoid running 2 filters in pandoc and thus avoid converting the AST twice, i.e., you don't need to modify the metadata at all if you're writing a filter for this anyway.

About resolving the local and global panflute-filters lists:

  • What worries me most about adding a solution to panflute directly is its generality. The most general case is merging the 2 lists in an arbitrary interleaved order, of which yours is a special case (i.e., each list only specifies the order in which its own filters must run; there is no information about whether a given local filter must run before, say, all the global ones).
  • But maybe that isn't so important in practice: the global filters should probably run either all before those in the YAML or all after. Then the question is what a reasonable way to pass a global option to a pandoc filter would be (pandoc doesn't allow passing command-line arguments to filters), and the most straightforward solution would be environment variables. Would adding two environment variables, with names similar to panflute-filters-pre and panflute-filters-post, satisfy you?
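A minimal sketch of the environment-variable idea, assuming two hypothetical variables `PANFLUTE_FILTERS_PRE` and `PANFLUTE_FILTERS_POST` that hold filter names separated by `os.pathsep` (":" on POSIX, ";" on Windows). Neither the names nor the separator are existing panflute settings; the global lists are simply wrapped around the document's own `panflute-filters`.

```python
import os
from typing import List, Mapping, Optional


def filters_from_env(name: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split a hypothetical variable such as PANFLUTE_FILTERS_PRE into filter names."""
    env = os.environ if env is None else env
    raw = env.get(name, "")
    # Drop empty entries so an unset or empty variable contributes nothing.
    return [item for item in raw.split(os.pathsep) if item]


def resolve_filters(local: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Global pre filters, then the document's own list, then global post filters."""
    return [
        *filters_from_env("PANFLUTE_FILTERS_PRE", env),
        *local,
        *filters_from_env("PANFLUTE_FILTERS_POST", env),
    ]


if __name__ == "__main__":
    env = {"PANFLUTE_FILTERS_POST": os.pathsep.join(["links.py", "minify.py"])}
    print(resolve_filters(["page.py"], env))  # ['page.py', 'links.py', 'minify.py']
```

Passing `env` explicitly keeps the function testable; in a real filter it would default to `os.environ`, which a Makefile can set per build.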

By the way, just to make sure I'm not misunderstanding your setup: your metadata.yaml is prepended to your markdown file, so that pandoc resolves the 2 metadata blocks (one from the YAML file, one from the block in the markdown), correct? I.e., pandoc metadata.yaml some-file.md ...?

ickc • May 05 '21 03:05

The extra metadata files are added as follows:

pandoc \
   -F pre-panflute.py \
   -F panflute \
   --metadata-file=metadata_1.yaml \
   --metadata-file=metadata_2.yaml \
   --metadata input_file="${FILE}" \
   "${FILE}"

To make my static site work, I inject some values calculated by my Makefile, but in general terms the idea is to have several metadata files, one per directory.

This lets you have one global file with metadata such as your Google Analytics ID; you then only need a placeholder for it in your template file. It also lets you include CSS globally.
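A sketch of how a build helper might collect one metadata file per directory, from the site root down to the page's own directory, and turn them into `--metadata-file` arguments. The file name `metadata.yaml` and the function names are hypothetical. Deeper directories are listed later on the command line because pandoc prefers values from metadata files given later.

```python
from pathlib import Path
from typing import List


def candidate_dirs(root: Path, page: Path) -> List[Path]:
    """Directories from the site root down to the page's own directory."""
    root = root.resolve()
    rel = page.resolve().parent.relative_to(root)  # ValueError if page is outside root
    dirs = [root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)
    return dirs


def pandoc_args(root: Path, page: Path, name: str = "metadata.yaml") -> List[str]:
    """Build a pandoc argv with one --metadata-file per directory that has one."""
    args = ["pandoc", "-F", "panflute"]
    for d in candidate_dirs(root, page):
        if (d / name).is_file():
            # Deeper directories come later, so their values take precedence.
            args.append(f"--metadata-file={d / name}")
    # The input file is a positional argument.
    args += ["--metadata", f"input_file={page}", str(page)]
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)` or printed for use in a Makefile recipe.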

These metadata files override each other from left to right, meaning that if

  • ${FILE} has a value for css
  • metadata_1.yaml has a value for css
  • metadata_2.yaml has a value for css

Then the value that wins is the one in metadata_2.yaml. And if you pass the value via the CLI, the CLI one takes precedence.

In reality, I'm using more than 2 metadata files, plus the metadata block in the document itself.
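A toy model of the override order, using plain dicts for hypothetical metadata sources. Note that pandoc's manual documents a somewhat different order from the one described above: values in later `--metadata-file`s win over earlier ones, but metadata set inside the document itself, or via `-M`/`--metadata`, overrides anything from `--metadata-file`. The sketch follows the manual and only models replacement of top-level keys, not pandoc's actual YAML handling.

```python
from typing import Any, Dict, List

Meta = Dict[str, Any]


def effective_metadata(files: List[Meta], document: Meta, cli: Meta) -> Meta:
    """Merge top-level keys in increasing order of precedence."""
    merged: Meta = {}
    for layer in files:  # later --metadata-file values win over earlier ones
        merged.update(layer)
    merged.update(document)  # the document's own YAML block beats metadata files
    merged.update(cli)  # -M / --metadata beats everything
    return merged


if __name__ == "__main__":
    result = effective_metadata(
        files=[{"css": "global.css"}, {"css": "blog.css"}],
        document={"title": "Post"},
        cli={"input_file": "post.md"},
    )
    print(result["css"])  # blog.css
```

Keys such as `panflute-filters-pre` and `panflute-filters` don't collide in this setup, so the difference only matters when the same key appears in both a metadata file and the document.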

You wrote: "By the way, if your pre-panflute.py calls autofilter.stdio directly, you can avoid running 2 filters in pandoc and thus avoid converting the AST twice, i.e., you don't need to modify the metadata at all if you're writing a filter for this anyway."

Good point; I didn't think about that.

ekiim • May 05 '21 04:05