harp
Support metadata on markdown files and pages
The layout files and partials can use some inline metadata about the current page being viewed. For example, a title attribute from an article. I guess, Jekyll syntax could be used. Something like: https://raw.github.com/egeozcan/egeozcan.github.com/master/_posts/2012-02-07-by-the-way.md
...which would be accessible by current.data.title. Wouldn't it be great?
Hey @egeozcan, thanks very much for opening an issue. We’ve definitely discussed this, and something similar was also brought up in #45, so there is interest.
I think you already know this, but just so we’re on the same page, here’s how I’d try what you’re after. You could bring in JB/setup as a partial in that blog post’s _layout.jade or _layout.ejs. That would be in the same folder as your posts, along with a _data.json file, which could look like this:
{
"by-the-way": {
"category": "about",
"tags": ["intro", "javascript"],
"date": "2012-02-07"
}
}
The _data.json approach is definitely different from Jekyll’s, which can be really useful: it centralises related metadata. Front matter interfering with this benefit would be my biggest concern. I don’t think @sintaxi or I were that enthusiastic about supporting YAML inside Markdown, though I did think @ryanfitzer’s MultiMarkdown suggestion was interesting. But then if Markdown supports front matter, and I write some fancier blog posts in Jade or EJS, do they then have to support front matter, too?
Hopefully I’m not sounding negative—that’s just why it hasn’t been done thus far. It’s really great to hear other perspectives on how people think this should work, so if you have more thoughts or examples of how you’d like to use it, that’d be really helpful. Thanks!
Thanks for the clear response @kennethormandy, and no, you don't sound negative at all. I totally understand how you made this decision and I'd like to point to some problems with this approach and offer a solution.
Problems
- When using source control, merge errors occur when a lot of people edit JSON files, such as the closing brackets of adjacent entries being treated as the same line. This problem could theoretically be mitigated by using a JSON-aware diff tool, but I've yet to find one.
- Scalability. As the number of files grows, it becomes harder and harder to maintain the _data.json.
- When trying to use the data from the layout, such as using the title in the page header, there's no way to easily access the contents of the _data.json. (I guess this can be solved by just merging the parsed data to the current object though)
Solution
Prepending the data to the files, as my initial suggestion, would break the compatibility of the files with their parsers when used outside harp, and is not a good practice overall for separation of concerns. It also could be problematic to parse if the same annotations are used in a theoretical future markup syntax.
It seems to me that the best solution would be to allow json and yaml files, prefixed with an underscore and named the same as the document that they'll attach, to be appended to the current object when rendering.
Example:
//_by-the-way.json
{
"title": "By the way",
"category": "about",
"tags": ["intro", "javascript"],
"date": "2012-02-07"
}
So that I should be able to do this in layout:
<html>
<head>
<title><%- current.data.title %> | My Awesome Harp Based Blog</title>
</head>
<body>
<%- yield %>
</body>
</html>
Would this be too hard?
The _data.json approach is definitely different than Jekyll’s, which can be really useful
One of the big reasons I like the front matter approach (regardless of syntax) is the reduced friction when creating content. Having to create/update 2 separate files isn't ideal.
While I do like the ability to define meta outside of the content file, that would be for cases where the meta doesn't add any valuable context/meaning to the content.
But when that meta does add context/meaning, I want it in the same file.
@ryanfitzer Well put. That’s definitely a strong use case for front matter, it’s come up a couple of times before. So you would like to have both in some format? Which would supersede the other if they both had a title, for example?
@egeozcan Very thorough, thank you. I definitely agree about the separation of concerns. I think this is the idea behind the _data.json file already, actually. It sounds like what you’re asking for is already possible, although it would all take place in one file. I tend to think that editing one small thing in many files is more difficult than editing many small things in one file. Anyway, here’s how I do what you’re suggesting.
The App
app/
  |- _harp.json
  |- _layout.ejs
  |- index.ejs
  |+ posts/
      |- _data.json
      |- there-is-no-spoon.md
      |- by-the-way.md
_harp.json
{
"globals": {
"title": "egeozcan",
"tagline": "My Awesome Harp Based Blog"
}
}
posts/_data.json
{
"there-is-no-spoon": {
"title": "There Is No Spoon",
"tags": ["intro", "personal"],
"date": "2012-02-07"
},
"by-the-way": {
"title": "By The Way",
"tags": ["intro", "javascript"],
"date": "2012-02-07"
}
}
_layout.ejs
<!DOCTYPE html>
<html>
<head>
<title><%= title %> | <%= tagline %></title>
</head>
<body>
<%- yield %>
</body>
</html>
index.ejs
<h1><%= title %></h1>
<ul>
<% for (var slug in public.posts.data) { %>
<% var post = public.posts.data[slug] %>
<li>
<a href="posts/<%= slug %>">
<%= post.title %>
</a>
</li>
<% } %>
</ul>
@kennethormandy The <title> tag in @egeozcan's example is using the post's title. In yours it's the global title.
I ran into this limitation as well.
Actually, if you’re using the latest version of Harp, it will use the current context’s title! (You can update with sudo npm update harp -g.)
So, if I’m at /posts/there-is-no-spoon, there’s a corresponding title in the _data.json, so my title tag will be <title>There Is No Spoon | My Awesome Harp Based Blog</title>. There is no _data.json file in the root directory, so there is no metadata for the index page. This means title falls back to whatever’s in the _harp.json, if there is anything. So, on the index page, the title is <title>egeozcan | My Awesome Harp Based Blog</title>.
You could even take this further. If I wanted to add a different tagline on the By The Way post:
_data.json
{
"there-is-no-spoon": {
"title": "There Is No Spoon",
"tags": ["intro", "personal"],
"date": "2012-02-07"
},
"by-the-way": {
"title": "By The Way",
"tags": ["intro", "javascript"],
"date": "2012-02-07",
"tagline": "This is my Harp post"
}
}
Now, on /posts/by-the-way, the title will be <title>By The Way | This is my Harp post</title>.
@kennethormandy Nice! Glad to see the update. Thanks for pointing that out.
Which would supersede the other if they both had a title, for example?
Not sure. My sense is that the most intuitive scenario would be for the content file to overwrite the JSON. But that's because I see the content as the most local context as far as scope goes. Others may see it differently.
Can you see a use case that would make a good case for the opposite?
Thanks a lot for the great examples! They'll help a lot.
I agree with @ryanfitzer about the local overwriting global. But what if we had local under the "current" object? So we could do something like:
<%= current.title || title %>
This isn't a deal-breaker for me though. Most probably also wouldn't mind if local just overwrites.
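To make the "local overwrites global" idea concrete, here is a minimal sketch. The `resolveMeta` helper and its argument names are hypothetical and are not part of Harp; the sketch only assumes three layers (globals from _harp.json, the entry from _data.json, and some per-document metadata) where the most local layer wins.

```javascript
// Hypothetical helper, not Harp's API: merge metadata layers so that
// later (more local) layers override earlier (more global) ones.
function resolveMeta(globals, dirData, slug, local) {
  return Object.assign({}, globals, dirData[slug] || {}, local || {});
}

// Values borrowed from the examples earlier in this thread.
const globals = { title: 'egeozcan', tagline: 'My Awesome Harp Based Blog' };
const dirData = { 'by-the-way': { title: 'By The Way' } };

// Per-document metadata (wherever it comes from) overrides both other layers.
const meta = resolveMeta(globals, dirData, 'by-the-way', {
  tagline: 'This is my Harp post'
});

console.log(meta.title + ' | ' + meta.tagline);
```

With a merge like this, a template could keep using a plain `title` and would not need the `current.title || title` fallback.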
And I still think that having everything in a _data.json file is not scalable (though I hate using that word). We definitely need individual metadata support, be it an external file or an annotation in the article/page itself.
So there are many pros and cons (both subjective and objective) to front-matter and IMHO once you add them both up the disadvantages far outweigh the advantages. Here it is as I see it.
Pros
Edit one file instead of two
This is many people's first instinct and it has some merit for sure. If one is optimizing for efficiency (as we all should be), it seems unnecessary to have to add hello-world.md and edit the _data.json file when it could just be adding a hello-world.md file. This makes perfect sense until you start to see all the negative side effects of this system.
Cons
Front-matter does not give you order.
It must be pointed out that front-matter alone only gives you local variables. We still need a way to order content. With front-matter we are at the mercy of the filesystem for how things are ordered, and this rarely comes out as desired. There are two common workarounds for this, both of which are terrible in my opinion.
1) Have a naming convention in the filename that allows you to order things.
For example, instead of:
posts/
|- _data.json
|- hello-world.md
|- hello-brazil.md
+- hello-canada.md
I would have:
posts/
|- 1_hello-world.md
|- 2_hello-brazil.md
+- 3_hello-canada.md
Now imagine you have dozens of files and you want to add a new file or reorder something: you would have to rename every file. What a pain in the ass this would be. Not to mention, having the URL and filename not match would be confusing. In the case of _data.json you would just order the JSON object the way you want it.
2) Have a blessed property such as date in the front-matter that gives you order.
This is what Jekyll does, which I think works reasonably well in the case of a blog, and is one of the things that makes Jekyll "blog aware". It is also one of the main reasons Jekyll becomes super awkward to use once you are building something other than a blog.
Many have felt this pain when working with Jekyll...
- http://stackoverflow.com/questions/9053066/sorted-navigation-menu-with-jekyll-and-liquid/9126294#9126294
- https://github.com/plusjade/jekyll-bootstrap/issues/42
- https://github.com/rickmanelius/drupalpcicompliance/issues/13
- http://stackoverflow.com/questions/13266369/how-to-change-the-default-order-pages-in-jekyll
Harp's _data.json approach makes this a cinch and, more importantly, there is only one mechanism that covers ordering, regardless of whether you are displaying blog posts, a navigation, or anything else.
Front-matter is an anti-pattern.
It deserves to be mentioned that having files that are half YAML, half markup is semantically incorrect. This alone probably shouldn't be enough to rule out front-matter. As we all know, building great systems is all about knowing when to break the rules, but it has to be seen as a drawback of this approach.
Breaking this rule causes text editors to freak out and pushes complexity onto syntax highlighters. This is not cool.
Front-matter is punishing on performance
Harp is very fast, and we want to keep it this way. One of the reasons it is so fast is that it does a lot of things in parallel, such as building the file system tree. During this step it walks the file system, opens every _data.json, and builds the state for the public object for iterating over. Having all the metadata in _data.json files is one of the reasons Harp can do this so quickly, even with large projects. Harp can do this so quickly that we rebuild this state between every request when in development mode. This is why you can edit your _harp.json, _data.json, or a template and simply refresh the browser to see the changes.
If we supported front-matter we would have to open every template to fetch its metadata. This would have a significant effect on performance, and we would very likely run out of file descriptors, which means we would have to throttle how many files we open at a time. All this impacts performance and complexity.
It's worth mentioning that Jekyll has a good reason for not making performance a priority: it works strictly as a static site generator, and therefore assets are always served with a static web server. Harp on the other hand IS a static web server. It has to be fast.
YAML is bloated
Not that front-matter has to be YAML, it could be JSON, but I might as well address this point.
The YAML spec is 80 pages long, and implementations are complex and have had known security issues. JSON is sooo simple. Parsers exist everywhere, and it is secure. Not saying JSON is better than YAML, just saying it's a better choice for a high performance web server such as Harp.
TL;DR
- Front-matter alone is not enough, we would also need ways to address ordering items.
- Front-matter is an anti-pattern and pushes complexity to text-editors
- Front-matter is horrible for performance.
Hope this helps with understanding the rationale behind not supporting front-matter. A lot of thought went into evaluating these tradeoffs. I see how people have become used to using front-matter since Jekyll has become such a popular tool. BTW - I don't want to come across as slagging on Jekyll, I think it has been a great tool. Harp has the luxury of hindsight since it is a new tool. So we have the benefit of fixing the mistakes Jekyll made, one of which IMHO is front-matter.
Thanks for the detailed explanation. Could you please also comment on allowing data files per document, while keeping support for _data.json? Like having _my-article.json next to my-article.md or my-article.jade. The data in the individual files could be processed as if they were part of the _data.json file in their directory when compiling (added under a key of their name).
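A rough sketch of what that merge could look like, assuming Node's built-in `fs` and `path` modules. The `loadDirMeta` function is hypothetical and is not how Harp builds its state; the choice that a per-document file overrides the matching _data.json entry is also an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical: build one directory's metadata from _data.json plus any
// per-document files such as _my-article.json (stored under "my-article").
function loadDirMeta(dir) {
  const dataPath = path.join(dir, '_data.json');
  const meta = fs.existsSync(dataPath)
    ? JSON.parse(fs.readFileSync(dataPath, 'utf8'))
    : {};

  for (const file of fs.readdirSync(dir)) {
    // Skip the files that already have a special meaning.
    if (file === '_data.json' || file === '_harp.json') continue;
    const match = /^_(.+)\.json$/.exec(file);
    if (!match) continue;

    const slug = match[1];
    const local = JSON.parse(fs.readFileSync(path.join(dir, file), 'utf8'));
    // Assumption: the per-document file wins over the _data.json entry.
    meta[slug] = Object.assign({}, meta[slug], local);
  }
  return meta;
}
```

Note that this opens one extra file per document that has a metadata file, which is the opt-in cost discussed in the reply that follows.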
Yeah sure.
Having a matching _my-article.json metadata file would still suffer from the performance issue, as we would have to open n files per directory to build our metadata object. But at least in this case the performance hit is opt-in, unlike front-matter, where we would have to open every file regardless to see if there is metadata or not. So I suppose this idea could be entertained.
The main problem I would see with this is that there would now be two ways to do the same thing. That isn't much of a problem, other than that people may get confused when they have a _data.json file and end up overriding the metadata with a _my-article.json file. Are you having a hard time with _data.json? Are you finding it difficult to maintain this file, or does it feel unsavoury to you?
@sintaxi Thanks for taking the time to explain your reasoning so thoroughly.
Are you having a hard time with _data.json? are you finding it difficult to maintain this file or does it feel unsavoury to you?
In my case, migrating a blog started in 2006 with 1500+ posts makes the single json file very tedious.
It deserves to be mentioned, having files that are half YAML, half markup is semantically incorrect.
Agreed. My use case was specific to Markdown, where YAML doesn't have that problem. But I definitely agree with your point for other file types.
Front-matter is punishing on performance
Great point. Hadn't thought about it this way. The tradeoff in my situation is more friction on the user's side.
but at least in this case the performance hit is opt-in
I like thinking of these features as opt-in. With @egeozcan's feature, if the files are present, Harp would use them. For the front matter feature (not that it would need to be YAML), a flag in the _data.json could be used to dictate whether a directory's content files contain the meta.
One way or another, I appreciate the thought behind how Harp is trying to balance performance.
@sintaxi yes, a single _data.json file isn't maintainable, especially when you have many articles and many people working on those articles.
We are looking to migrate our company blog to Harp.io and it's a bummer to see the _data.json requirement. With 3-4 different authors, hundreds of articles, and dozens in the pipeline, a single file to manage them is not ideal.
Human Author's Perspective
If you are blogging with, say, Markdown, you would certainly want to write tags or title and so on in the file, not somewhere else.
Summary information like indexes is supposed to be generated, not hand-crafted, and _data.json is exactly such a thing.
While having _data.json for something global is nice, being forced to do some "register"-like stuff isn't pleasant from the author's perspective.
Front-matter is intuitive for the human who is writing the content. If there are any technical problems, they should be left to machines and programmers. Let the design feel human and work for humans, not the other way around.
Programmer's Perspective
Front-matter of Jekyll syntax is clear and easy to strip before parse phase. Markdown and other parsers should not see the front-matter at all.
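As an illustration of stripping the block before the parse phase, here is a minimal sketch. The `splitFrontMatter` helper is hypothetical; it only separates a leading `---` block from the body and returns the block as raw text, leaving the choice of YAML or JSON parser open.

```javascript
// Hypothetical helper: split a leading "---" delimited block from the body
// so that the Markdown (or Jade, EJS, ...) parser never sees it.
function splitFrontMatter(source) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(source);
  if (!match) {
    // No front-matter: the whole file is body.
    return { frontMatter: null, body: source };
  }
  return {
    frontMatter: match[1],                // raw text between the delimiters
    body: source.slice(match[0].length)   // everything after the closing ---
  };
}

const post = '---\ntitle: By the way\n---\n# Hello\n';
console.log(splitFrontMatter(post));
```

One caveat with this simple approach: a Markdown file that legitimately starts with a `---` horizontal rule would be misread, so a real implementation would probably need a stricter rule.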
And BTW, the GitHub markdown renderer is now front-matter-aware: it will recognize the front-matter and render it as a table. Why? Because many people are writing front-matter, like in Jekyll, like in Middleman. I can't think of any reason that one should say no to front-matter.
Cons Aren't Really Cons
Let's see the Cons listed in @sintaxi 's post:
Front-matter alone is not enough, we would also need ways to address ordering items.
@sintaxi already presented the solutions. And the solutions are better than just fine to me.
Even if order really matters that much, and the solutions suck, order is not something specific to individual files but something global, so it's exactly what _data.json is supposed to do.
_data.json is good by itself and it brought in new features, but it doesn't do what front-matter is good at.
Front-matter is an anti-pattern and pushes complexity to text-editors
Front-matter is an anti-pattern? Where does this statement come from?
Half YAML, half markup is semantically incorrect? What about JavaScript in HTML? What about markdown filters in Jade?
I think text editors can handle that, they have handled formats that are much more screwed up. One of the design goals of highlight.js is to recognize this kind of half-and-half code, and most editors can recover from a piece of incorrect code and keep life going.
Front-matter is horrible for performance.
That's a premature assertion. I can't see why Harp can't maintain a cached JSON file and simply check the timestamps on the next round, or take some other measure.
And again, don't just consider the performance of the machine, consider the effort of the humans maintaining the horrible _data.json. That's a much more crucial performance overhead.
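The timestamp idea could be sketched roughly like this, assuming Node's built-in `fs` module. The `cachedMeta` function and the in-memory `cache` map are hypothetical and are not something Harp provides; `parse` stands for whatever extracts metadata from a file's contents.

```javascript
const fs = require('fs');

// Hypothetical cache: file path -> { mtimeMs, meta }.
const cache = new Map();

// Re-read and re-parse a file only when its modification time has changed.
function cachedMeta(file, parse) {
  const { mtimeMs } = fs.statSync(file);
  const hit = cache.get(file);
  if (hit && hit.mtimeMs === mtimeMs) {
    return hit.meta; // unchanged since last time: no open, no parse
  }
  const meta = parse(fs.readFileSync(file, 'utf8'));
  cache.set(file, { mtimeMs, meta });
  return meta;
}
```

This still costs one `stat` call per file on every pass, so it reduces the work described in the performance argument rather than eliminating it.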
TL;DR
Forgive me for being harsh, but lack of front-matter is really why I'm not moving to harp. And everything else about harp seems so great...
I know it doesn't add to the discussion but I really wanted to say that I'm patiently waiting for any news about this.
Just wanted to chime in and say that while I am really liking Harp, I really miss the easy pairing of meta and content in the same file (front-matter).
If JSON is easier/faster than YAML, then I am fine with that, but I want to write that meta once and then use template logic to make changes to things like post display order (by title, by date, by creation, etc.). The current Harp solution just feels klunky compared with how the rest of Harp works.
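For what it's worth, sorting in template logic is plain JavaScript once the metadata is available as an object. A minimal sketch follows; the `sortedPosts` helper is hypothetical, and the sample data only mimics the shape of the posts/_data.json shown earlier in the thread (the dates are made up so the two posts differ).

```javascript
// Sample metadata shaped like posts/_data.json; the dates are invented.
const posts = {
  'there-is-no-spoon': { title: 'There Is No Spoon', date: '2012-02-07' },
  'by-the-way': { title: 'By The Way', date: '2012-03-01' }
};

// Hypothetical helper: list posts sorted by any metadata key.
function sortedPosts(data, key, descending) {
  const entries = Object.keys(data).map((slug) => ({ slug, meta: data[slug] }));
  entries.sort((a, b) => {
    const x = String(a.meta[key]);
    const y = String(b.meta[key]);
    return x < y ? -1 : x > y ? 1 : 0;
  });
  return descending ? entries.reverse() : entries;
}

// Newest first; ISO dates (YYYY-MM-DD) sort correctly as strings.
console.log(sortedPosts(posts, 'date', true).map((post) => post.slug));
```

In an EJS template the same function could be applied to an object such as `public.posts.data` before looping over it.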
First I want to thank the authors for their work! You guys rock. That said, I have noticed a few things in the last few days. My experience comes as a developer who wanted to convert an old neglected Jekyll blog into a self-hosted and self-served Harp site.
In my opinion the problems that I encountered stem from the fact that Harp is entertaining competing ideas:
- Static content generation
- A fast development platform for developers using modern techniques (Less, Jade, etc.)
- A server for said content
The problem is that separately these work really well, but all together are sort of tangled as it stands.
The story for harp serve is really great. Making UI tweaks, building out pages with Jade, etc. is all very fast and feels fluid.
But then you finish developing the site and something happens: you want to create content. At this point the story gets muddy, because when you create content you have to maintain two files: my-post.jade and the _data.json. This is a pain, especially if someone other than a developer will be creating the content.
If I had a magic lamp that granted wishes I would ask for:
- harp serve --production
When I specify this, dynamic files will be compiled once and only once on initialization, and then served from their cached locations. Without pre-compiling large assets, the current production server can take almost a second to serve pages. That is long by modern patience levels.
- harp post new 'My new post'
This command creates a new file in /posts/my-new-post.(jade|md). In addition, it updates the _data.json with the proper meta information. This would alleviate a lot of the "where is the front-matter" rage.
- Better integration of content with templates. I want to specify the title of a blog post, the most recent blog post, etc. without _data.json hackery in the template. Again, an issue with lacking front-matter.
I think that ultimately Harp is a great contribution and I thank the authors for their efforts. However, it does not appear that Harp is well suited to my workflow: create the site once and then add individual pieces of content over and over. Right now development on Harp is great, but continued creation of additional content, i.e. blog posts, is not as great.
I hope my constructive feedback will be helpful. Thank you!
Thanks everyone, there’s a lot of great feedback for us here. We discuss this issue pretty frequently, and while I don’t have any specific answers, I just wanted to say this issue hasn’t been forgotten by any means. I really appreciate everyone writing about their personal experiences with Harp, it helps us make much more informed design decisions.
@jcswart I also just wanted to address your two other points:
- I believe this can be accomplished now, it’s on the Harp server page, but we could definitely expand on it:
Harp is production ready, by specifying an environment variable we add extra LRU caching to make your site run even faster.
NODE_ENV=production harp server --port 3000
Hopefully that helps, and if not, feel free to open another issue.
- Personally, I can’t see this happening as part of the CLI. You could probably make some sort of script that could do this for you, but it’s a very blog or static site generator-centric approach. Harp is great for those things, but probably won’t have features for that single, specific use case like Jekyll does. That said, some kind of interface built upon Harp for managing content and metadata could be great, I just don’t think it will take the form of that feature, that’s all. It is helpful feedback, though, so thanks!
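A small standalone script along those lines might look like the sketch below. It is not part of the Harp CLI: `slugify` and `newPost` are hypothetical names, and the sketch assumes the posts directory holds Markdown files next to a _data.json keyed by slug, as in the examples earlier in the thread.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical: turn "My new post" into "my-new-post".
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Hypothetical: create <slug>.md and register it in the directory's _data.json.
function newPost(dir, title, date) {
  const slug = slugify(title);
  const dataPath = path.join(dir, '_data.json');
  const data = fs.existsSync(dataPath)
    ? JSON.parse(fs.readFileSync(dataPath, 'utf8'))
    : {};

  if (data[slug]) {
    throw new Error('Post already exists: ' + slug);
  }

  data[slug] = { title: title, date: date };
  fs.writeFileSync(path.join(dir, slug + '.md'), '# ' + title + '\n');
  fs.writeFileSync(dataPath, JSON.stringify(data, null, 2) + '\n');
  return slug;
}
```

Calling `newPost('posts', 'My new post', '2014-01-01')` would write posts/my-new-post.md and append a matching entry to posts/_data.json.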
@jcswart @utensil @colinscroggins @holic @egeozcan @ryanfitzer
Thank you all for expressing your thoughts on this topic. All other arguments aside, at this time adding front-matter would have an extremely large impact on performance, especially for larger apps. Performance is very important for this project. Although at this time we are unwilling to make this compromise, I have drafted a plan to implement front-matter in a way that might have a manageable performance hit, though this would take significant changes in #sintaxi/terraform (something I would like to do anyway). I think we will table this discussion until these changes are in terraform, where we can debate the pros/cons on the merits of the design, with the performance compromise hopefully out of the picture. Sound good?
It's worth mentioning that Jekyll has recently added harp-style data files http://jekyllrb.com/docs/datafiles/
@jcswart regarding your "magic lamp" feature requests, nothing wrong with that idea, though that seems like the responsibility of another tool, not harp itself.
-b
For anyone migrating, I wrote a small script to convert Jekyll post metadata to the Harp format:
https://npmjs.org/package/jekyll2harp
Not to keep this thread going, but I'd like to point out that the ordering of keys in a JSON object is not guaranteed, and in fact many intermediate representations don't preserve order. It is the case that V8 does, but it's a little funny to rely on the ordering of keys to specify post listing order.
Fantastic! nice work.
You are correct that the order of objects in "JSON" is not guaranteed. However, ordering in Harp is. If V8 for any reason changes their API in this regard we will seek alternate JSON parsing methods to ensure the behaviour in Harp does not change.
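The key-order caveat is easy to see in current JavaScript engines: string keys keep insertion order, but integer-like keys are moved to the front in ascending order. The sketch below shows that behaviour and one possible way around it; the explicit `order` array is a hypothetical convention, not something Harp defines.

```javascript
// Keys that look like non-negative integers are listed first, in numeric
// order; the remaining keys keep the order in which they were written.
const parsed = JSON.parse(
  '{"hello-world": {}, "2014": {}, "hello-canada": {}, "2013": {}}'
);
console.log(Object.keys(parsed));
// → [ '2013', '2014', 'hello-world', 'hello-canada' ]

// Hypothetical alternative: state the order explicitly in an array, which
// is guaranteed to be preserved by any JSON parser.
const order = ['hello-world', '2014', 'hello-canada', '2013'];
const ordered = order.map((slug) => ({ slug: slug, meta: parsed[slug] }));
console.log(ordered.map((entry) => entry.slug));
```

So relying on object key order works for slug-like names, but a post named with a bare number (a year, for instance) would jump to the top of the list.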
Thanks for writing this library. Should be a great resource for people coming to harp from jekyll.
I would like to kindly ask if there is any news about this. Is there any way we can contribute?
Just to address the performance concern, and if this topic is still being considered, one approach may be to honor front matter only during a compile down to flat files.
Performance seems like a strange reason to not implement front matter. Caching should solve this problem pretty easily, especially since you would only need to cache when you're done editing the content, i.e. production.
I see front matter as a complement to _data.json. At the moment, _data.json overrides values in _harp.json. Wouldn't it make sense to have front matter override values in _data.json?
I can't speak to performance issues one way or another, but from a usability perspective, I find front matter a much more maintainable and friendly way to manage page metadata.
Another vote here; just ruled out Harp as an option for my company web site because of the lack of front matter.
+1 for frontmatter. For me it's also the single reason why I can't use harp.
Another +1 for front matter - It is the exclusive reason that I'm on Jekyll for my blog, etc. I use harp for some web apps, but it currently doesn't provide nice blog posts, etc with front matter.