Pages from data, take 5
There are existing issues about this, but I prefer to start fresh when I have new ideas on a subject. I thought about this again when having my hair washed at the hairdresser today. Maybe the hair massage helped.
I think I have been too hung up on the technical challenges of this (remote adapters, how to effectively do partial updates etc.), making the whole issue too big to start with. What we have talked about earlier has also been "something different on the side of what we already have".
But what we have is:
- A virtual /content directory that can be composed via Hugo Modules (with overrides on file level)
- Virtual mounts support (you can mount any directory or file, even from remote GitHub repos, into /content)
- A front matter based metadata model with cascade keyword etc.
- Partial server updates based on filesystem events.
- ...
With that in mind, I thought about adding a new reserved filename in /content starting with _content.
Given the example below:
- _content.json, _content.toml (and YAML) would be fairly straightforward, i.e. metadata + content. We should probably support a tree structure somehow, so you can build a complete content structure from one root _content file. We should probably support multiple files per directory so you could do _content_products.json etc.
- _content.go would represent the dynamic content, some kind of content adapter, possibly remote. This is obviously the area with most open questions (_content.js would be a thought), but having the naming in place is a start.
content
├── _content.json
├── _index.md
├── blog
│   ├── _content.toml
│   ├── _index.md
│   ├── image.png
│   └── post1.md
└── docs
    ├── _content.go
    └── _index.md
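To make the "tree structure from one root _content file" idea concrete, here is a minimal sketch. The JSON shape (a "slug" per node and a nested "pages" list) is purely an assumption for illustration; nothing in this thread defines the actual format.

```python
import json

# Hypothetical shape for a root _content.json: each node carries front matter
# fields plus an optional "pages" list of child nodes. This is an assumption,
# not a format defined in this thread.
RAW = """
{
  "title": "Home",
  "pages": [
    {"title": "Blog", "slug": "blog", "pages": [
      {"title": "Post 1", "slug": "post1", "content": "Hello"}
    ]},
    {"title": "Docs", "slug": "docs"}
  ]
}
"""

def flatten(node, parent=""):
    """Yield (path, front_matter) for the node and all of its descendants."""
    slug = node.get("slug", "")
    path = f"{parent}/{slug}" if slug else parent
    # Everything except the child list is treated as the page's own data.
    front_matter = {k: v for k, v in node.items() if k != "pages"}
    yield (path or "/", front_matter)
    for child in node.get("pages", []):
        yield from flatten(child, path)

pages = list(flatten(json.loads(RAW)))
paths = [path for path, _ in pages]
```

With this input, `paths` holds one entry per page (`/`, `/blog`, `/blog/post1`, `/docs`), which is roughly what a single root file would have to expand into.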
/cc @regisphilibert @onedrawingperday @digitalcraftsman @budparr @moorereason @kaushalmodi and gang.
Yes. This is clearer than using the section's _index.md.
We should probably support multiple files per directory so you could do _content_products.json etc.
If we need to create another file, might as well create the directory: /products/_content.json
If we need to create another file, might as well create the directory: /products/_content.json
Yes, probably.
Also thinking, I think I'm going to restrict this to JSON in its first iteration, as that is the only format supporting stream decoding.
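Stream decoding matters here because a single file could hold thousands of pages, and decoding them one at a time avoids building the whole structure up front. A minimal sketch of the idea, using Python's `json.JSONDecoder.raw_decode` to pull one value at a time out of a buffer of concatenated JSON objects. The one-object-per-page layout is an assumption, and this toy version keeps the whole text in memory; a real implementation would decode from a reader, as Go's `encoding/json` Decoder does.

```python
import json

def iter_pages(text):
    """Yield one decoded JSON value at a time from concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values; raw_decode does not accept it.
        if text[pos].isspace():
            pos += 1
            continue
        # raw_decode returns the value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Hypothetical input: one JSON object per page.
SAMPLE = '{"title": "Post 1"}\n{"title": "Post 2"}\n'
titles = [page["title"] for page in iter_pages(SAMPLE)]
```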
Also thinking, I think I'm going to restrict this to JSON in its first iteration
And possibly YAML:
https://github.com/go-yaml/yaml/issues/4
That's great. Now I'm seeing those _content.yaml files with lots of information about the metadata: which key goes where in a "page" object.
But I was also under the impression that Hugo would fetch the data based on some parameters (endpoints, pagination, etc...) which might have been a bit optimistic.
With _content.go? Does this mean we'll be able to write our own data fetcher/parser in Go or JavaScript, and if so, can't the metadata/front matter be addressed from there?
With _content.go?
So, there are two stories to this issue.
The main story being that we need to break this down into smaller pieces to be able to grasp it and possibly also implement it in iterations.
So:
- _content.yaml is raw page data just wrapped in a more "data like" format than a markdown file with front matter. The big benefit being that you can create thousands of pages in one file.
- _content.go would be the "create those thousands of articles by some kind of custom scripting towards a Hugo API" (which would handle all the caching/partial update logic etc).
- _content.wordpress (and now I'm just making stuff up) would use a built-in Hugo adapter to pull in those thousand articles.
Got it! Thanks for clarifying.
Oh and, if all Hugo needs from a _content.go is to produce/return an array of items, we could use Go Template and return.
That would allow us to use some partialCached (for transformers) and other familiar Hugo stuff to prep the data grabbed from getJSON or elsewhere...
I have done some experiments with @natefinch 's https://github.com/starlight-go/starlight today (his library wraps https://github.com/google/starlark-go/ by Google), and I'm very impressed and think it would be a good fit for the above (and also other uses).
It will be yet another thing to learn for Hugo users (it's a Python dialect), but I think well worth it.
It integrates very well with the Go side of the fence. It returns all the variable definitions when evaluating a script, even functions, so it should be possible to define "plugin interfaces" with default implementations, and implement whatever is needed in the _content.py file, e.g.:
type DataGetter interface {
    GetDataStream() io.ReadCloser
}
Would be implemented in _content.py as:
def GetDataStream():
    return http.Get(site.Param("wordPressAPI"))
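A rough sketch of the "default implementations plus script overrides" idea, written in plain Python rather than Starlark. The hook names `PluginType` and `ShouldUpdate` are illustrative assumptions, and `exec` merely stands in for evaluating the script in an embedded, sandboxed interpreter; none of this is a real Hugo API.

```python
# Default implementations of the hypothetical plugin hooks.
DEFAULTS = {
    "PluginType": lambda: "source",
    "ShouldUpdate": lambda after: True,
}

# Stand-in for the contents of a _content.py file.
PLUGIN_SOURCE = '''
def ShouldUpdate(after):
    return after > 100

def GetDataStream():
    return '[{"title": "Post 1"}]'
'''

def load_plugin(source):
    """Evaluate a script and overlay the functions it defines on the defaults."""
    namespace = {}
    exec(source, namespace)  # A real host would use a sandboxed interpreter.
    hooks = dict(DEFAULTS)
    for name, value in namespace.items():
        # Ignore interpreter internals such as __builtins__.
        if callable(value) and not name.startswith("_"):
            hooks[name] = value
    return hooks

hooks = load_plugin(PLUGIN_SOURCE)
# The host can inspect which optional hooks the script chose to implement.
supports_stream = "GetDataStream" in hooks
```

The host keeps its defaults for anything the script leaves out, and can tell from the set of defined functions what the plugin supports.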
See my previous comments, and push the appropriate button below whether you think adding Starlark (a Python dialect) as a scripting language in Go is a good idea or not. First as a way for users to write custom "source adapters" for content, but we will most likely find other use cases, eventually (@natefinch did a PR some time ago with custom template functions in Python).
EDIT: Those votes came in fast ... Note that if you push the "no, that is a bad idea", it would be good if you could elaborate in a comment. What would be a good alternative etc.?
I think the learning curve of Go Template is hard enough for many new users, letting them know that learning Python is a requisite for Data source sounds a bit harsh.
I'd be willing to invest time into learning some Go, but Python, not so excited. Did you drop JS because of speed?
I'd be willing to invest time into learning some Go, but Python, not so excited. Did you drop JS because of speed?
The options that I have evaluated are Python and Lua. To my knowledge there is no solid, embedded JS implementation in Go. But I assume some day it will happen.
And for my own curiosity, why not Go?
And for my own curiosity, why not Go?
Mostly security related.
I think the learning curve of Go Template is hard enough for many new users, letting them know that learning Python is a requisite for Data source sounds a bit harsh.
+1
Mostly security related.
Note that that remark was about "compiled Go" (not Go templates).
I trust you in choosing the fastest most reliable way of letting coders build their own data parsers.
So I'll mention Go Template one last time.
With returning partials, merge, transform.Unmarshal etc., Hugo has gotten much better at handling data.
With getJSON (might need improvements and complementing methods), Scratch and the new features mentioned above, I know I can build that parser with Hugo's Go Template. Is it a bad idea?
The main problem/challenge with using Go templates for this is that it's procedural, it's "one script per file", one method.
I will try to think of a better example, but in the context we're talking about (plugins), it becomes hard/ugly to then create plugin APIs with life cycle methods, e.g.:
def PluginType():
    return "source"

def ShouldUpdate(after):
    return True

def GetDataStream():
    return http.Get(site.Param("wordPressAPI"))
In the above, Hugo could look at the plugin and say "Oh, it supports JSON via a reader (stream), we can optimize for that".
With Go templates, parts of the above may look like:
{{ if .LastUpdate.After ... }}
  {{ .Result.NotModified }}
{{ else }}
  {{ .Result.Set (getJSON "foo") }}
{{ end }}
Note again that I'm not saying that the above represents a "real plugin interface", but I'm fairly sure that most real plugin scenarios would require some level of "branching". And it would be good if we could write those plugins in something that doesn't look like code from the 80s.
Also note that Starlark is a Python dialect, a sub-set of Python built for this particular purpose (and as an embedded scripting option it is, in my eyes, done very well -- supporting both Go's garbage collection and multithreading).
Starlark is a small and simple language with a familiar and highly readable syntax. You can use it as an expressive notation for structured data, defining functions to eliminate repetition, or you can use it to add scripting capabilities to an existing application.
For me, my Python skills are just about on par with my JavaScript skills, but I still think I would prefer Python for the use cases above if I were a Python newbie. I only think JS would make sense if you also bring in all the (in)sanity of NPM (but we really need for it to somehow integrate with the Go side of the fence).
Also, being able to define these plugin interfaces (as proper interfaces), we can also provide a set of implementations in Go which you then can configure from your plugin, e.g. (and again, I'm just quickly making this stuff up):
def SourcePlugin():
    # One of "many" supported adapters with implementation in Go.
    return "wordpress"

def PluginConfig():
    return site.Param("myWordPressConfig")
The above is, I think, valid Python. Most editors will provide syntax highlighting for it (if you suffix the file "*.py").
So, the above could be rewritten to:
{{ .SetSourcePlugin "wordpress" }}
{{ .SetPluginConfig .Site.Params.myWordPressConfig }}
Which isn't bad, but since there is no way for Hugo to look at the file _content.tpl and know what it is/needs, you end up sending in the full plugin API as the "dot context" and you get some level of spaghetti when putting it all together.
I will also add that, if you think the above is hard and you still want/need to use it (people have happily lived without the above for a long time), Hugo Modules allows people to borrow from other people's work. This will be even more true if we extend this to writing plugins that get exposed as template functions (see @natefinch 's PR).
As one last note: We should be able to access all of Hugo's template functions inside these scripts, e.g. resources.Get "foo.jpg".
Most editors will provide syntax highlighting for it (if you suffix the file "*.py") and is, in my head, much easier to explain to people than some Go template conditional spaghetti.
Yes, yes! Let Go Templates do the templating ... and other interfaces for the more ambiguous and likely-to-branch-and-to-error data generation. Since Go is 1) a security risk and 2) rather esoteric, then Python and JS seem like robust and highly used options (with Python being particularly well crafted and good for beginners). That being said, there would be some aesthetic nicety in having an "all-GO-based" solution.
I’m convinced and cast my vote. :). Can’t wait to see some parser example. I guess I wanted an excuse to learn Go.
I realize this is a config language.
This might be an example that Hugo users can more easily relate to:
https://github.com/bazelbuild/starlark/blob/master/README.md#tour
This is pretty exciting actually. It gives so much control over a directory's data, remote or otherwise.
I guess I wanted an excuse to learn Go.
Still plenty of reasons to learn Go ...
For completeness, my previous answer about "Go as plugin" assumed some kind of "compile on the fly" and communication via "os/exec". Go has a "plugin package" built-in that could have worked for us, but it's Linux and macOS only and considered very experimental and buggy (and no one has worked on it for years).
The options that I have evaluated are Python and Lua.
Lua is commonly used for such things, e.g. Pandoc uses it for custom filters, but considering that Starlark is implemented in Go and looks like 'python-lite', I believe it's a better option than full-fledged Lua, being easy and/or simple enough.
To my knowledge there is no solid, embedded JS implementation in Go.
I won't cry because of that. :-)
Is this issue regarding pages from data still relevant? The discussion seems to have stalled.
Anyway, I used https://github.com/avillafiorita/jekyll-datapage_gen for Jekyll site in the past and maybe it could be used for inspiration for a Hugo equivalent?
I definitely think it's still relevant.
I wrote a couple of small scripts in golang that are doing the job for me, currently, but I would love to have this feature baked-in.
Just chiming in to say I have a number of cases where having a way for Hugo to consume data via json from an external provider or a collection of json files stored as data would be hugely beneficial. Currently, having to write solutions that create and commit md to a repo, or script the creation of hundreds of markdown files that reference a json data file, is cumbersome and adds undesirable complexity. If I am following this thread (if), having a way to define external or data-driven content sources within the current content tree and, if needed, a scripting language to define your transform sounds excellent. So I would like to see something like this happen.
Just chiming in to say I have a number of cases where having a way for Hugo to consume data via json from an external provider or a collection of json files stored as data would be hugely beneficial.
100% agreed. We are developing a CMS for JSON files and as a consequence have done lots of research in that area. Most static page generators are severely lacking in that aspect. While some (including Hugo) can load data files into certain predefined pages, creating dynamic pages and/or dynamic navigations is mostly somewhere between cumbersome and impossible. So I think that having good support for arbitrary structured data sources could be beneficial for Hugo.
One other project that was introduced recently is Nuxt Content, which is a plugin for Nuxt.js (a framework for server-side rendered or statically generated applications). It allows loading different types of files and creating content dynamically based on them. For example, you can create dynamic pages/routes and/or create a dynamic navigation. We've done a simple test project with it and it seems to be well thought out and very flexible (edit: we published a blog post about it). Maybe there is some inspiration in that project for Hugo.
https://github.com/gohugoio/hugo/issues/5074#issuecomment-717564911
I can see this thread is fairly old now but I just wanted to ask, given I kinda share the same sentiment as this old comment:
I think the learning curve of Go Template is hard enough for many new users, letting them know that learning Python is a requisite for Data source sounds a bit harsh.
I'd be willing to invest time into learning some Go, but Python, not so excited. Did you drop JS because of speed?
is it not possible to use a scripting language that is at least very similar, syntactically, to Go? I am not a security expert by any means but can the security concern/s not be mitigated somehow? if a project like https://github.com/goplus/gossa exists, what are they doing security-wise which is different? ...just for the record, I am in no way trying to discredit the starlight project, which is obviously great work!
I have a build script in python that converts json entries to files. You can find it here: https://github.com/brunoamaral/gregory/blob/main/build.py#L136-L189
Hope it helps.
As the end of year comes near, I'd like to reflect a bit on this popular ticket.
Having dealt this year with a lot of pre-build (write your own files) workarounds, I feel like this could go into an early stage using Go Template and returning partials to feed Hugo a simple list of strings or maps.
We know Hugo is already building pages without content files for taxonomy terms. All it has to work with is a string (Museum, Food And Entertainment etc.), and it's pretty good at building a URL from that and publishing the page. I personally have been using this to build pages from data with Hugo (or remote data; it eases up the pre-build work: only two files to write instead of hundreds of md).
What would already be a big leap to achieve the goals this ticket is aiming for is to give Hugo a simple list of entries to build pages for a given section. (akin to a simple list of taxonomy terms)
This can be as simple as a returning partial whose returned value could be fed to Hugo. It could come from a data file or a getJSON call.
A simple slice of strings (no data structuring)
It could even be a simple list of strings (exactly like taxonomy terms).
{{/* content/products/_content.html */}}
{{ $products := slice }}
{{ with site.Data.products }}
  {{/* or with getJSON "https://myapi.com/products" */}}
  {{ range . }}
    {{ $products = $products | append .Name }}
  {{ end }}
{{ end }}
{{ return $products }}
And, users could build up their own logic to fetch the data for each page using a data file or else. (Like when using the taxonomy hack)
{{/* layouts/products/single */}}
{{ with index (where site.Data.products "Name" .Title) 0 }}
  <h1>{{ .Name }}</h1>
  price: {{ .TaxPrice }}
{{ end }}
As a slice of maps (data structuring is supported)
We could envision a slice of maps. This would potentially take care of the data structuring. It seems the only required key would be title (to generate the URL). All the rest could be parsed as is (reserved keys preserved, unreserved keys held in the .Params object in templates).
This is what such a returning partial could look like:
{{/* content/products/_content.html */}}
{{ $products := slice }}
{{ with site.Data.products }}
  {{ range . }}
    {{ $products = $products | append (dict
      "title" .Name
      "price" .TaxPrice
    ) }}
  {{ end }}
{{ end }}
{{ return $products }}
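For illustration, this is roughly what the receiving side could do with such a slice of maps: require a title, keep reserved keys at the top level, move everything else under params, and derive a URL from the title. This is a sketch under stated assumptions. The `RESERVED` set, the `slugify` rule and the `build_pages` helper are all hypothetical, and Hugo's real key handling and URL generation are more involved.

```python
import re

# Assumed set of reserved front matter keys; the real list would be Hugo's.
RESERVED = {"title", "date", "weight", "slug"}

def slugify(title):
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def build_pages(section, entries):
    """Turn a list of dicts into page records for the given section."""
    pages = []
    for entry in entries:
        if "title" not in entry:
            raise ValueError("every entry needs a title to derive its URL")
        # Reserved keys stay at the top level of the page record.
        page = {k: v for k, v in entry.items() if k in RESERVED}
        # Everything else ends up under params, like .Params in templates.
        page["params"] = {k: v for k, v in entry.items() if k not in RESERVED}
        page["url"] = f"/{section}/{slugify(entry['title'])}/"
        pages.append(page)
    return pages

pages = build_pages(
    "products",
    [{"title": "Food And Entertainment", "price": 12.5}],
)
```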
If I'm not wrong and this is not such a big leap from the code base, this would make this early stage "building pages from data" easy enough that anybody could jump in and test it. The more feedback, the more robust the next stages will be.
I understand plugins could be used, and yeah, maybe it would be great in the long term to support heavy, dated systems like WordPress or Drupal. But do we need this from the get-go? I'm very excited about this prospect and I, personally, would love to jump into a world where Hugo can build pages from data, even if it is as limited as my suggestion can be.