python-frontmatter

Allow loading and parsing multiple posts

Open gaconnet opened this issue 8 years ago • 6 comments

Hi. It's nice to see a frontmatter library written in python. Thanks for writing it!

How do you feel about supporting a way to load & parse multiple posts in one go, or perhaps even in a streaming fashion via an iterator or coroutine?

As motivation, consider a single markdown file that you would like to transform into a sequence of <section> tags to insert into reveal.js, and you want your transformation pipeline to transform metadata attributes into html data attributes for things like custom slide transitions:

# python-frontmatter
an introduction

---
transition: zoom

---
load and parse files (or just text) with YAML front matter.

---
transition: concave
background: linear-gradient(45deg, #f06, yellow)

---
now with streams of documents!

I built a little standalone parser for this on my own, but I thought it might be nice if this cool library did it.

gaconnet avatar Oct 27 '15 22:10 gaconnet

I think the way I'd do this is with multiple files, or by splitting text and parsing strings with frontmatter.loads. We do this a lot for @frontlinepbs projects, usually with metalsmyth (which needs a better name) and Tarbell.
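A minimal sketch of the split-then-parse approach, assuming the layout from the example above: an optional leading body with no metadata, then alternating metadata and body blocks separated by lines containing only `---`. The names `split_posts` and `DELIMITER` are hypothetical and not part of python-frontmatter, and a body that itself contains a bare `---` line (a Markdown horizontal rule, say) would be split incorrectly.

```python
import re

# A delimiter is a line holding only "---" (trailing spaces/tabs allowed).
DELIMITER = re.compile(r"^---[ \t]*$\n?", re.MULTILINE)


def split_posts(text):
    """Yield one single-post front matter document (a str) per post in `text`."""
    parts = DELIMITER.split(text)
    head, rest = parts[0], parts[1:]

    # Anything before the first delimiter is a post without metadata.
    if head.strip():
        yield head.strip() + "\n"

    # The remaining parts alternate: metadata, body, metadata, body, ...
    for i in range(0, len(rest), 2):
        meta = rest[i].strip()
        body = rest[i + 1].strip() if i + 1 < len(rest) else ""
        if not meta and not body:
            continue  # skip the empty tail left by a trailing delimiter
        yield "---\n{}\n---\n{}\n".format(meta, body)
```

Each yielded string is an ordinary single-post document, so it could be handed to `frontmatter.loads` one at a time.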

eyeseast avatar Oct 27 '15 22:10 eyeseast

I agree that splitting it before it comes into frontmatter would be a fine way to go, but it seems unfortunate that such a splitter would need to duplicate some of the parsing work of frontmatter and would also need to be configured separately if either frontmatter or the splitter were to ever support custom delimiters (such as in gray-matter).

I think that having a simple interface to parse a stream of posts opens many interesting opportunities. For example, a non-programmer uses prose to edit a single file that goes into GitHub or a Gist and then a post-commit hook transforms the single file into a multi-page slideshow. In addition to parsing a single file as a stream, a streaming parser enables diverse command-line invocations such as parse-frontmatter prologue.md - epilogue.md and collect-interesting-files | parse-frontmatter (both examples fictional; assume that the fictional binaries both do something interesting).
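To make the streaming idea concrete, here is a rough sketch of a generator that consumes any iterable of lines and yields posts lazily, without reading the whole input first. `iter_posts` is a hypothetical name, not something python-frontmatter provides, and the sketch assumes every metadata block is opened and closed by a line containing only `---`. It returns raw metadata lines and leaves YAML parsing to the caller.

```python
def iter_posts(lines):
    """Lazily yield (metadata_lines, body_lines) tuples from an iterable of lines."""
    meta, body = [], []
    in_meta = False
    for line in lines:
        if line.strip() == "---":
            if in_meta:
                # Closing delimiter: the body of this post follows.
                in_meta = False
            else:
                # Opening delimiter: emit the previous post, start a new one.
                if meta or body:
                    yield meta, body
                meta, body = [], []
                in_meta = True
        elif in_meta:
            meta.append(line)
        else:
            body.append(line)
    # Emit whatever is left once the input is exhausted.
    if meta or body:
        yield meta, body
```

Because it only needs an iterable of lines, something like `iter_posts(fileinput.input())` would cover the command-line cases: the standard library's `fileinput` reads the files named in `sys.argv[1:]` in order and treats `-` as standard input. Note that this sketch does not treat a file boundary as a post boundary.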

I'm happy to have the splitter be a separate tool though. I just wanted to point out these opportunities here. I'll also give metalsmyth a try.

If you're still not sold on the idea then feel free to close this issue whenever you feel the time is right. :)

gaconnet avatar Oct 28 '15 22:10 gaconnet

Just ran into a situation that matches exactly this approach, so I'm going to reopen and reconsider.

eyeseast avatar May 10 '16 14:05 eyeseast

I'm in a similar situation (not online though) where I want to migrate from jekyll to blogdown and I want to change a couple of metadata attributes for all posts:

#!/usr/bin/env python

import os
from pathlib import Path
import datetime
import frontmatter

posts_root = os.environ['HOME'] / Path('dev/brainblog/content/post')

for post in posts_root.iterdir():
    fname_date = post.name[0:10] # capture the "2018-02-08" "timestamp" from the post filename
    tstamp = datetime.datetime.strptime(fname_date, "%Y-%m-%d").timestamp()
    utc_time = datetime.datetime.utcfromtimestamp(tstamp)
    utc_string = utc_time.strftime("%Y-%m-%dT%H:%M:%S.%f+00:00 (UTC)")
    with post.open() as f:
        post = frontmatter.load(f)
        if post.get('date') is not None:
            post.__setitem__('date', utc_string)
            post.__setitem__('modified', utc_string)
            frontmatter.dump(post, f)
            #print(post.metadata)

But apparently I cannot frontmatter.dump against the same post/filehandle:

Traceback (most recent call last):
  File "/Users/romanvg/bin/markdown_datetime.py", line 20, in <module>
    frontmatter.dump(post, f)
  File "/Users/romanvg/.miniconda/lib/python3.5/site-packages/frontmatter/__init__.py", line 155, in dump
    fd.write(content.encode(encoding))
TypeError: write() argument must be str, not bytes

How would you change such metadata attributes and serialize them "in-place"?

brainstorm avatar Feb 10 '18 22:02 brainstorm

Never mind, I just opened and closed the file with different modes. Thanks for your lib! ;)

    with post.open('r') as f:
        post_fm = frontmatter.load(f)

    if not post_fm.get('date'):
        post_fm['date'] = utc_string
        post_fm['modified'] = utc_string
        post_str = frontmatter.dumps(post_fm)

        # Reopen in write mode only after the read handle has been closed.
        with post.open('w') as f:
            f.write(post_str)
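The same read-then-write pattern can be factored into a small helper. This is only a sketch: `rewrite_in_place` and its `transform` argument are hypothetical names, and it assumes the file is UTF-8 and small enough to hold in memory. Going by the traceback above, `frontmatter.dump` in that version wrote encoded bytes, which is why a handle opened in text mode for reading could not be reused for the write.

```python
from pathlib import Path


def rewrite_in_place(path, transform):
    """Read `path` fully, apply `transform` (str -> str), then write the result back."""
    path = Path(path)
    # read_text opens and closes the file, so no read handle is left open.
    original = path.read_text(encoding="utf-8")
    updated = transform(original)
    # Only touch the file when the content actually changed.
    if updated != original:
        path.write_text(updated, encoding="utf-8")
    return updated
```

Here `transform` could be a function that calls `frontmatter.loads` on the text, updates the metadata keys, and returns `frontmatter.dumps` of the result.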

brainstorm avatar Feb 10 '18 23:02 brainstorm