ENH: Recognize static site generated blogs
Currently, for most pages and posts, Zotero assumes the item type is a webpage. This produces incorrect item types for blog posts built with the many alternative generators. I use Hugo myself, but other SSGs (static site generators) such as Jekyll are also popular. I am currently forcing the generator meta tag to WordPress in my theme, but this is silly. It would make more sense for the Zotero Connector to accept a wider range of generator values.
This adds some of the more popular ones.
Related forum discussion: https://forums.zotero.org/discussion/107147/handle-other-blog-post-sources#latest
Thanks!
I wonder what the detected type is for https://www.docsy.dev/docs/get-started/basic-configuration/ with this patch. This is an example of a Hugo-generated page that's not a blog post. If the current EM translator code is able to figure this out, it won't fall back to blogPost. But if there's too little other metadata, the fallback may take effect and the result won't be accurate.
> Thanks!
> I wonder what the detected type is for https://www.docsy.dev/docs/get-started/basic-configuration/ with this patch. This is an example of a Hugo-generated page that's not a blog post.
Yup, it would be marked as a "blog post" under this patch.
> If the current EM translator code is able to figure this out, it won't fall back to blogPost. But if there's too little other metadata, the fallback may take effect and the result won't be accurate.
That makes sense. Given that SSGs are often used for documentation, perhaps a "Blog" generator value could be supported (and documented) instead? That way, SSG users can simply override their meta tag so that the generator value contains "blog".
If that sounds good, I'll go ahead and update the patch to check whether "blog" appears anywhere in the tag instead.
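For illustration, the proposed check could look roughly like the sketch below. It is written in Python only to keep it short and self-contained; the real translators are JavaScript, and the names `GeneratorMetaParser` and `generator_mentions_blog` are hypothetical, not part of the patch.

```python
from html.parser import HTMLParser


class GeneratorMetaParser(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. a bare attribute), so normalise.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def generator_mentions_blog(html: str) -> bool:
    """Return True if any generator meta tag contains "blog" (case-insensitive)."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    parser.close()
    return any("blog" in value.lower() for value in parser.generators)
```

Under this sketch, a theme author would opt in by setting something like `content="Hugo (blog)"` on the generator tag, while a plain `content="Hugo"` would keep the default type.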
It would be better to look for some telltale signs of blog posts (bylines, dates, etc.) in combination with the generators. This isn't something authors should have to know about.
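A rough sketch of that idea, in Python for brevity rather than as translator code: the generator only counts as a hint, and the page is treated as a blog post when other signals agree. The `PageSignals` type, the `guess_item_type` function, and the rule that both a byline and a date are required are assumptions made for the example; only `blogPost` and `webpage` are actual Zotero item types.

```python
from dataclasses import dataclass

# The two generators named in this thread; a real list would be longer.
SSG_GENERATORS = ("hugo", "jekyll")


@dataclass
class PageSignals:
    """Hypothetical summary of what was found on the page."""
    generator: str = ""
    has_byline: bool = False
    has_publication_date: bool = False


def guess_item_type(signals: PageSignals) -> str:
    """Return "blogPost" only when the generator and the page content agree."""
    generator = signals.generator.lower()
    from_ssg = any(name in generator for name in SSG_GENERATORS)
    # Assumption: a byline plus a date is enough to call it a post.
    looks_like_post = signals.has_byline and signals.has_publication_date
    return "blogPost" if from_ssg and looks_like_post else "webpage"
```

With this rule, a Hugo documentation page that has neither a byline nor a publication date stays a `webpage`, which is the case the Docsy example above is meant to catch.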
Schema.org structured data can be used to identify blog posts and blogs, but I suspect only a small minority of websites provide it.
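For sites that do publish it, the check might look like the following sketch, which assumes the structured data is embedded as JSON-LD in a `<script type="application/ld+json">` element and looks for the Schema.org types `Blog` and `BlogPosting`. It is in Python for illustration only, the helper names are hypothetical, and Microdata or RDFa markup is not covered.

```python
import json
from html.parser import HTMLParser

BLOG_TYPES = {"Blog", "BlogPosting"}


class JsonLdParser(HTMLParser):
    """Collects the raw text of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._buffer = []
            self._in_json_ld = False


def _declared_types(node):
    """Yield every @type string in a decoded JSON-LD value, including @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _declared_types(item)
    elif isinstance(node, dict):
        declared = node.get("@type", [])
        if isinstance(declared, str):
            yield declared
        elif isinstance(declared, list):
            yield from (t for t in declared if isinstance(t, str))
        yield from _declared_types(node.get("@graph", []))


def declares_blog_content(html: str) -> bool:
    """Return True if any JSON-LD block declares a Blog or BlogPosting."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the page.
        if BLOG_TYPES.intersection(_declared_types(data)):
            return True
    return False
```

Because this relies on an explicit declaration by the site, it would avoid the documentation-page false positive, at the cost of missing every blog that does not publish structured data.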