hugo icon indicating copy to clipboard operation
hugo copied to clipboard

GitInfo unset when file path contains bytes > 0x80

Open slnc opened this issue 2 years ago • 21 comments

GitInfo doesn't seem to be set when a page has exactly 1 commit. I have a partial with this (simplified) code:

{{- $recent := where .Site.RegularPages.ByLastmod.Reverse "Type" "eq" "notes" }} {{- $recent_num := (.Site.Params.widgets.recent_num | default 10) }}

{{- if $recent }}

    {{- range first $recent_num $recent }} {{- if ne .Title "" }} {{ .GitInfo }} {{ $lastmod := .Lastmod.Format "Jan 2, 2006" }}
  • {{ .Title }}
    {{ $lastmod }}
  • {{- end }} {{- end }}
{{- end }}

Which prints this:

Screen Shot 2022-04-21 at 19 23 18

The file has exactly 1 commit:

Screen Shot 2022-04-21 at 19 24 04

All the other files have more than 1 commit. From Hugo's Lastmod and GitInfo doc pages I expect that pages with exactly 1 commit have GitInfo populated like pages with more than 1 commit.

My config.toml [frontmatter] config looks like this:

[frontmatter] date = ["date", "publishDate", "lastmod"] lastmod = ["lastmod", ":git", "date", "publishDate"] publishDate = ["publishDate", "date"]

What version of Hugo are you using (hugo version)?

hugo v0.97.3-078053a43d746a26aa3d48cf1ec7122ae78a9bb4 darwin/amd64 BuildDate=2022-04-18T17:22:19Z VendorInfo=gohugoio

Does this issue reproduce with the latest release?

Yes

slnc avatar Apr 21 '22 17:04 slnc

Unable to reproduce. Please provide a minimal reproducible example, or close the issue. Thanks.

jmooring avatar Apr 21 '22 18:04 jmooring

The issue title is incorrect, sorry, I have updated it. It seems like the issue is with emoji file names. How to reproduce:

git clone https://github.com/slnc/hugo9810 hugo server

Then open http://localhost:1313 and you should see this:

Screen Shot 2022-04-22 at 08 27 57

The code that prints the GitInfo object.

Perhaps a bug in github.com/bep/gitmap?

slnc avatar Apr 22 '22 06:04 slnc

I've looked a bit more into this. gitmap uses git log to build its GitRepo.Files which is what hugo/hugolib/gitinfo.go uses for lookups. However git log renders emojis in an encoding that neither Go nor my terminal render as emojis, so GitRepo.Files is being populated with testfiles/emoji\360\237\223\232.txt instead of emoji📚.txt.

In case it helps, I've forked gitmap and added a new testcase (currently broken) in https://github.com/slnc/gitmap-emojis, which you can run with:

$ git clone https://github.com/slnc/gitmap-emojis
$ cd gitmap-emojis
$ go test -v gitmap

Or you can just call git log directly with:

$ git log -1 --name-only 0930165bc16ed84ee1ebbed59b621e71ba3189a2                                                                                                                                                                                                                              master
commit 0930165bc16ed84ee1ebbed59b621e71ba3189a2
Author: slnc <[email protected]>
Date:   Fri Apr 22 09:57:15 2022 +0200

    Added emoji test case (currently broken)

gitmap.go
gitmap_test.go
go.mod
"testfiles/emoji\360\237\223\232.txt"

I couldn't find open bug reports about git log and emojis so I assume that git log is working as intended. If that's correct, it seems like the best fix would be to escape the filename about to be looked up in Files here: hugo/hugolib/gitinfo.go, however I can't figure out what encoding git log is using. According to https://unicode-table.com/en/1F4DA/, the "books" emoji has unicode number U+1F4DA, but I can't get to that unicode hex string from what git log is outputting (\360\237\223\232). Perhaps @bep is more familiar with this?

slnc avatar Apr 22 '22 08:04 slnc

On your system, do this:

hugo new hugö.md  # note the umlaut over the o
git status

You probably see this:

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        "content/hug\303\266.md"

But on my system I see this:

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        content/hugö.md

Because I do this whenever I install Git:

git config --global core.quotepath false

So that will fix the problem when you build the site on your system, but I haven't tested what happens when you build the site with Netlify, GitHub Actions, etc.

Ref: https://git-scm.com/docs/git-config#Documentation/git-config.txt-corequotePath

jmooring avatar Apr 22 '22 12:04 jmooring

According to StackOverflow, Git uses quoted octal strings by default, so a strconv.Unquote(a) should fix it; not sure if there are any negative side effects to doing that.

bep avatar Apr 22 '22 13:04 bep

There's also the -z option for various commands. For example:

This might be simpler.

jmooring avatar Apr 22 '22 13:04 jmooring

There's also the -z option for various commands. For example:

  • https://git-scm.com/docs/git-log#Documentation/git-log.txt--z

This might be simpler.

I came to say exactly that. I've tried adding that to gitmap.go and the test I added for the emoji-in-filename seems to work but TestEncodeJSON fails.

Should I send a PR with the change? Joe: it might be faster if you could send the PR because you're probably more familiar, but I'm happy to try to get a PR working (I might need some help because I couldn't get "go test" to run without creating a go.mod file).

I've updated https://github.com/slnc/gitmap-emojis to include -z (but TestEncodeJSON is still failing, I don't see an obvious reason as to why).

slnc avatar Apr 22 '22 13:04 slnc

From the git docs...

Separate the commits with NULs instead of with new newlines.

You need to look at how the output is trimmed and split.

jmooring avatar Apr 22 '22 13:04 jmooring

@slnc

Sorry for the noise. This will work, and doesn't change the separators:

git -c "core.quotepath=0" status

jmooring avatar Apr 22 '22 13:04 jmooring

@slnc

Sorry for the noise. This will work, and doesn't change the separators:

git -c "core.quotepath=0" status

No problem Joe, thanks, I applied that locally and that seems to work! I'll prepare a pull request unless you tell me otherwise.

slnc avatar Apr 22 '22 13:04 slnc

Pull request sent: https://github.com/bep/gitmap/pull/10. If I'm following the code correctly, after the pull request is merged, @bep needs to make a new gitmap release and then we can update Hugo, right?

slnc avatar Apr 22 '22 14:04 slnc

@slnc That is correct.

jmooring avatar Apr 22 '22 14:04 jmooring

I'm running go mod tidy on https://github.com/gohugoio/hugo/compare/master...slnc:hugo9810?expand=1 to update go.sum but it's giving me an error:

verifying go.mod: github.com/bep/[email protected]/go.mod: reading https://sum.golang.org/lookup/github.com/bep/[email protected]: 410 Gone server response: not found: create zip: testfiles/emoji📚.txt: malformed file path "testfiles/emoji📚.txt": invalid char '📚'

I've filed a bug against Go but if there is no workaround it might be necessary to undo my pull request to not block Hugo releases.

I will keep this issue updated.

slnc avatar Apr 22 '22 17:04 slnc

https://go.dev/ref/mod#zip-path-size-constraints

Looks like you should place an empty go.mod file in the test files directory

Also, I think the path in the root go.mod should be github.com/bep/gitmap not gitmap

https://github.com/bep/gitmap/blob/master/go.mod#L1

jmooring avatar Apr 22 '22 17:04 jmooring

Fixed.

bep avatar Apr 22 '22 17:04 bep

... but, since Hugo Modules is backed by Go Modules, I'm not sure having emojis in filenames is such a great idea :-)

bep avatar Apr 22 '22 17:04 bep

... but, since Hugo Modules is backed by Go Modules, I'm not sure having emojis in filenames is such a great idea :-)

This request isn't for Hugo modules, it's to support a "last modified notes" partial. I use http://obsidian.md/ to manage my digital garden of notes which I publish with Hugo. Using emojis in note filenames is a popular choice because note names (filenames) tend to be quite large and emojis reduce the character count, see (each entry is a file):

Screen Shot 2022-04-22 at 19 34 04

Now, in order to show a "last modified notes" partial I need Hugo's GitInfo because it's the only automated way that I know of that will work in a Hugo + Netlify setup.

If this fix hadn't been possible I had thought about writing a git pre-commit hook to automatically set or modify the lastmod frontmatter attribute on every .md file part a commit, but that would have only fixed the issue for me.

slnc avatar Apr 22 '22 19:04 slnc

This request isn't for Hugo modules, i

The request is for Hugo, in which is built on top of Hugo Modules, which this currently will fail with. I understand that this works for you, but I need to consider complains coming in from others in a week. I will have a look at this later.

bep avatar Apr 22 '22 19:04 bep

This request isn't for Hugo modules, i

The request is for Hugo, in which is built on top of Hugo Modules, which this currently will fail with. I understand that this works for you, but I need to consider complains coming in from others in a week. I will have a look at this later.

I didn't know that this patch affected Hugo Modules. I agree that if this breaks other parts of Hugo and the Go team doesn't fix the underlying issue (which actually was reported a year ago), the fix might need to be rolled back or rewritten. What I suggest is to make gitmap fail loudly when encountering such files like Go does instead of failing silently because imo that's the least expected behavior (GitInfo works for some files but not for others).

I don't know how this breaks Hugo modules, but perhaps adding a config option to enable this behavior might get us the best of both worlds (I know, not a good idea in general).

slnc avatar Apr 23 '22 09:04 slnc

@slnc Until this is resolved, configure your local and CI/CD[^1] Git instances using

git config --global core.quotepath false

I tested this today with:

[^1]: Netlify, GitHub Pages, GitLab pages, etc.

jmooring avatar Apr 23 '22 15:04 jmooring

Done, thanks Joe!

slnc avatar Apr 23 '22 15:04 slnc