ENH: Recognize static site generated blogs

Open HaoZeke opened this issue 1 year ago • 5 comments

Currently, for most pages and posts, Zotero assumes the content type is a webpage. This gives the wrong item type for blog posts produced by the many alternative generators. I use Hugo myself, but other SSGs (static site generators) such as Jekyll are also popular. For now I am forcing the generator meta tag in my theme to say WordPress, but this is silly. It would make more sense for the Zotero Connector to accept a wider range of generator values.

This adds some of the more popular ones.

HaoZeke avatar Aug 23 '23 00:08 HaoZeke

Related forum discussion: https://forums.zotero.org/discussion/107147/handle-other-blog-post-sources#latest

HaoZeke avatar Aug 23 '23 03:08 HaoZeke

Thanks!

I wonder what the detected type is for https://www.docsy.dev/docs/get-started/basic-configuration/ with this patch. This is an example of a Hugo-generated page that's not a blog post. If the current EM translator code is able to figure this out, it won't fall back to blogPost. But if there's too little metadata otherwise, the fallback may take effect and the result won't be accurate.

zoe-translates avatar Aug 23 '23 04:08 zoe-translates

> Thanks!
>
> I wonder what the detected type is for https://www.docsy.dev/docs/get-started/basic-configuration/ with this patch. This is an example of a Hugo-generated page that's not a blog post.

Yup, it would be marked as a "blog post" under this patch.

> If the current EM translator code is able to figure this out, it won't fall back to blogPost. But if there's too little metadata otherwise, the fallback may take effect and the result won't be accurate.

That makes sense. Given that SSGs are often used for documentation, perhaps a "Blog" generator could be supported (and documented) instead? That way SSG users can simply override their meta tag so that the generator value contains "blog".

If that sounds good I'll go ahead and update to check if "blog" is somewhere in the tag instead.

HaoZeke avatar Aug 23 '23 04:08 HaoZeke
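
For illustration, the check proposed in the comment above (look for "blog" anywhere in the generator meta tag) could be sketched as follows. This is a minimal Python sketch, not the translator's actual code: the class and function names are hypothetical, and the case-insensitive substring match is an assumption about how "somewhere in the tag" would be interpreted.

```python
from html.parser import HTMLParser


class GeneratorMetaParser(HTMLParser):
    """Collects the content of every <meta name="generator"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for valueless attributes; normalise to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def generator_mentions_blog(html):
    """Return True if any generator meta tag contains "blog" (assumed: case-insensitive)."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    return any("blog" in value.lower() for value in parser.generators)
```

Under this sketch, a theme that emits a generator value such as "Hugo (blog)" would match, while a plain "Hugo" value used by a documentation site would not.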

It would be better to look for some telltale signs of blog posts (bylines, dates, etc.) in combination with the generators. This isn't something authors should have to know about.

dstillman avatar Aug 23 '23 04:08 dstillman
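
One way to read the suggestion above is to treat the generator as a hint and require blog-like signals before choosing the blog post type. The Python sketch below assumes, purely for illustration, that a byline shows up as an author meta tag and a date as either an `article:published_time` meta tag or a `<time datetime>` element; the names `PageSignals` and `guess_item_type` are hypothetical, the generator list contains only the two SSGs named in this thread, and none of this reflects the translator's actual logic.

```python
from html.parser import HTMLParser

# Only the two generators named in this thread; a real list would be longer.
SSG_NAMES = ("hugo", "jekyll")


class PageSignals(HTMLParser):
    """Records the generator plus assumed byline/date signals (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.generator = ""
        self.has_byline = False
        self.has_date = False

    def handle_starttag(self, tag, attrs):
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "meta":
            key = (attr_map.get("name") or attr_map.get("property") or "").lower()
            content = attr_map.get("content", "").strip()
            if key == "generator":
                self.generator = content.lower()
            elif key == "author" and content:
                self.has_byline = True
            elif key == "article:published_time" and content:
                self.has_date = True
        elif tag == "time" and attr_map.get("datetime"):
            self.has_date = True


def guess_item_type(html):
    """Return "blogPost" only when an SSG generator coincides with a byline and a date."""
    signals = PageSignals()
    signals.feed(html)
    from_ssg = any(name in signals.generator for name in SSG_NAMES)
    if from_ssg and signals.has_byline and signals.has_date:
        return "blogPost"
    return "webpage"
```

With this combination, a Hugo-generated documentation page that carries no byline or date stays a webpage, which is the case raised earlier in the thread, and authors do not need to change their generator tag.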

Schema.org structured data can be used to identify blogs and blog posts, but I suspect that only a small minority of websites provide it.

zoe-translates avatar Aug 23 '23 06:08 zoe-translates
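
As a rough sketch of the structured-data route mentioned above, the Python below looks for JSON-LD blocks that declare the Schema.org types `Blog` or `BlogPosting`. It assumes the page embeds its structured data as `application/ld+json` scripts (Microdata and RDFa are not handled), and the helper names are hypothetical rather than taken from any translator.

```python
import json
from html.parser import HTMLParser

BLOG_TYPES = {"Blog", "BlogPosting"}  # Schema.org types for blogs and blog posts


class JsonLdCollector(HTMLParser):
    """Collects the raw text of each <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def declared_types(node):
    """Gather @type values from a JSON-LD value, descending into lists and @graph."""
    found = set()
    if isinstance(node, list):
        for item in node:
            found |= declared_types(item)
    elif isinstance(node, dict):
        type_value = node.get("@type")
        if isinstance(type_value, str):
            found.add(type_value)
        elif isinstance(type_value, list):
            found.update(t for t in type_value if isinstance(t, str))
        found |= declared_types(node.get("@graph", []))
    return found


def is_schema_org_blog(html):
    """Return True if any parseable JSON-LD block declares a blog-related type."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        if declared_types(data) & BLOG_TYPES:
            return True
    return False
```

Because many sites publish no structured data at all, a check like this would only ever be one signal among several, not a replacement for the other heuristics discussed in the thread.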