git icon indicating copy to clipboard operation
git copied to clipboard

log: describe supported encoding

Open yecril71pl opened this issue 4 years ago • 17 comments

Thanks for taking the time to contribute to Git! Please be advised that the Git community does not use github.com for their contributions. Instead, we use a mailing list ([email protected]) for code submissions, code reviews, and bug reports. Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/) to conveniently send your Pull Requests commits to our mailing list.

Please read the "guidelines for contributing" linked above!

cc: Bagas Sanjaya [email protected]

cc: Jeff King [email protected]

yecril71pl avatar Aug 26 '21 18:08 yecril71pl

Welcome to GitGitGadget

Hi @yecril71pl, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

gitgitgadget-git[bot] avatar Aug 26 '21 18:08 gitgitgadget-git[bot]

There is an issue in commit 1d09bff58d5e2849446f03f21b86b4c810b82200: Commit checks stopped - the message is too short

gitgitgadget-git[bot] avatar Aug 26 '21 18:08 gitgitgadget-git[bot]

/allow

dscho avatar Aug 26 '21 20:08 dscho

User yecril71pl is now allowed to use GitGitGadget.

gitgitgadget-git[bot] avatar Aug 26 '21 20:08 gitgitgadget-git[bot]

There is an issue in commit 3fa2f76781d300ea4497cc74c855e2cc541f72fa: Prefixed commit message must be in lower case: pretty-options.txt: Describe supported encoding

gitgitgadget-git[bot] avatar Aug 26 '21 20:08 gitgitgadget-git[bot]

/submit

yecril71pl avatar Aug 26 '21 21:08 yecril71pl

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-git-1079/yecril71pl/patch-1-v1

To fetch this version to local tag pr-git-1079/yecril71pl/patch-1-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-git-1079/yecril71pl/patch-1-v1

gitgitgadget-git[bot] avatar Aug 26 '21 21:08 gitgitgadget-git[bot]

On the Git mailing list, Bagas Sanjaya wrote (reply to this):

On 27/08/21 04.34, Christopher Yeleighton via GitGitGadget wrote:
> From: Christopher Yeleighton <[email protected]>
> 
> Please fix the manual for git log.  It should say what encoding is recognised
> (namely if supported by iconv(1), except that POSIX character maps of
> iconv(1p) are not supported), and that an unrecognised encoding is ignored.
> 
> Signed-off-by:  <[email protected]>
> ---

The commit message should be:
"git log recognizes only system encodings supported by iconv(1), but not 
POSIX character maps used by iconv(1p). Document it.".

>   	The commit objects record the encoding used for the log message
>   	in their encoding header; this option can be used to tell the
>   	command to re-code the commit log message in the encoding
> -	preferred by the user.  For non plumbing commands this
> -	defaults to UTF-8. Note that if an object claims to be encoded
> -	in `X` and we are outputting in `X`, we will output the object
> +	preferred by the user.
> +	The encoding must be a system encoding supported by iconv(1),
> +	otherwise this option will be ignored.
> +	POSIX character maps used by iconv(1p) are not supported.
> +	For non-plumbing commands this defaults to UTF-8.
> +	Note that if an object claims to be encoded in `X`
> +	and we are outputting in `X`, we shall output the object
>   	verbatim; this means that invalid sequences in the original
>   	commit may be copied to the output.
>   

I think POSIX character maps and encoding are the same, what are their 
differences? Reading iconv(1p) [1] doesn't give definition of the former.

[1]: https://man7.org/linux/man-pages/man1/iconv.1p.html

-- 
An old man doll... just what I always wanted! - Clara

gitgitgadget-git[bot] avatar Aug 27 '21 10:08 gitgitgadget-git[bot]

User Bagas Sanjaya <[email protected]> has been added to the cc: list.

gitgitgadget-git[bot] avatar Aug 27 '21 10:08 gitgitgadget-git[bot]

Dnia piątek, 27 sierpnia 2021 12:50:43 CEST gitgitgadget-git[bot] pisze:

I think POSIX character maps and encoding are the same, what are their differences? Reading iconv(1p) 1 doesn't give definition of the former.

System encoding providers are code, POSIX character maps are data.

Chris

yecril71pl avatar Aug 27 '21 11:08 yecril71pl

Dnia piątek, 27 sierpnia 2021 12:50:43 CEST gitgitgadget-git[bot] pisze: I think POSIX character maps and encoding are the same, what are their differences? Reading iconv(1p) [1] doesn't give definition of the former. [1]: https://man7.org/linux/man-pages/man1/iconv.1p.html System encoding providers are code, POSIX character maps are data. Chris

@yecril71pl please note that this PR comment will not be forwarded to the mailing list automatically. You might want to follow the "reply to this" instructions to reply to the mail on the Git mailing list.

dscho avatar Aug 27 '21 11:08 dscho

On the Git mailing list, Krzysztof Żelechowski wrote (reply to this):

Dnia piątek, 27 sierpnia 2021 12:46:22 CEST Bagas Sanjaya pisze:
> I think POSIX character maps and encoding are the same, what are their
> differences? Reading iconv(1p) [1] doesn't give definition of the former.
> 
> [1]: https://man7.org/linux/man-pages/man1/iconv.1p.html

System encoding providers are code, POSIX character maps are data.

Chris



gitgitgadget-git[bot] avatar Aug 27 '21 11:08 gitgitgadget-git[bot]

On the Git mailing list, Krzysztof Żelechowski wrote (reply to this):

git log recognises only system encodings supported by iconv(1), but not 
POSIX character maps used by iconv(1p). Document it.

Signed-off-by:  <[email protected]>

diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-
options.txt
index 27ddaf84a19..4f8376d681b 100644
--- a/Documentation/pretty-options.txt
+++ b/Documentation/pretty-options.txt
@@ -36,9 +36,13 @@ people using 80-column terminals.
        The commit objects record the encoding used for the log message
        in their encoding header; this option can be used to tell the
        command to re-code the commit log message in the encoding
-       preferred by the user.  For non plumbing commands this
-       defaults to UTF-8. Note that if an object claims to be encoded
-       in `X` and we are outputting in `X`, we will output the object
+       preferred by the user.
+       The encoding must be a system encoding supported by iconv(1),
+       otherwise this option will be ignored.
+       POSIX character maps used by iconv(1p) are not supported.
+       For non-plumbing commands this defaults to UTF-8.
+       Note that if an object claims to be encoded in `X`
+       and we are outputting in `X`, we shall output the object
        verbatim; this means that invalid sequences in the original
        commit may be copied to the output.
 


gitgitgadget-git[bot] avatar Aug 27 '21 11:08 gitgitgadget-git[bot]

On the Git mailing list, Junio C Hamano wrote (reply to this):

Krzysztof Żelechowski <[email protected]> writes:

> git log recognises only system encodings supported by iconv(1), but not 
> POSIX character maps used by iconv(1p). Document it.
>
> Signed-off-by:  <[email protected]>

The "Human Readable Name <[email protected]>" on this line must match
the one on the "From: " line that records the author of the patch.

If you are forwarding somebody else's patch (with or without
improvement), we also need your sign off.

> diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-
> options.txt
> index 27ddaf84a19..4f8376d681b 100644
> --- a/Documentation/pretty-options.txt
> +++ b/Documentation/pretty-options.txt
> @@ -36,9 +36,13 @@ people using 80-column terminals.
>         The commit objects record the encoding used for the log message
>         in their encoding header; this option can be used to tell the
>         command to re-code the commit log message in the encoding
> -       preferred by the user.  For non plumbing commands this
> -       defaults to UTF-8. Note that if an object claims to be encoded
> -       in `X` and we are outputting in `X`, we will output the object
> +       preferred by the user.


> +       The encoding must be a system encoding supported by iconv(1),
> +       otherwise this option will be ignored.
> +       POSIX character maps used by iconv(1p) are not supported.

This paragraph is a bit hard to grok.

I think it is saying that the "-f frommap -t tomap" form in [*1*]
that can use arbitrary character set description file is not
supported, but "-f fromcode -t tocode" form, which also is what
iconv_open() takes [*2*], is supported.  Am I reading it correctly?

Is there an easier-to-read way to explain the distinction to our
average reader?

What I am getting at is this.  Imagine average users who need to see
their commits recoded to iso-8859-2.  They see "git log" has
"--encoding=<encoding>" option, read the above paragraph and wonder
if they are on the supported side or unsupported side of the above
paragraph.  I want to make it easy for them to stop wondering.

For that purpose, "iconv(1) vs iconv(1p)" would not help them very
much, especially considering that not all Git users are UNIX users
(they probably do not even know what (1) and (1p) means).

> +       For non-plumbing commands this defaults to UTF-8.

I think I can guess why the patch wants to change "non plumbing" to
"non-plumbing" (I do not strongly care either way, so I'd take the
patch without complaint about that particular change).  It would
have been nicer to mention this change in the proposed commit log
message, though, but that is minor.

> +       Note that if an object claims to be encoded in `X`
> +       and we are outputting in `X`, we shall output the object
>         verbatim; this means that invalid sequences in the original
>         commit may be copied to the output.

I probably wouldn't have noticed this if a new manual page used
"shall" consistently, but since the original deliberately used
"will" and the patch changes it to "shall", I have to ask: why?

I think our end-user facing manual pages tend to avoid the latter.
We do use "shall" in the RFC2119/BCP14 sense on the technical side
of our documentation where we give requirements to the third-party
implementations so that they can interoperate with us, but this is
not such a description.

Thanks.


[References]

*1* https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html 
*2* https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html

gitgitgadget-git[bot] avatar Aug 27 '21 17:08 gitgitgadget-git[bot]

On the Git mailing list, Jeff King wrote (reply to this):

On Fri, Aug 27, 2021 at 10:03:56AM -0700, Junio C Hamano wrote:

> > +       The encoding must be a system encoding supported by iconv(1),
> > +       otherwise this option will be ignored.
> > +       POSIX character maps used by iconv(1p) are not supported.
> 
> This paragraph is a bit hard to grok.
> 
> I think it is saying that the "-f frommap -t tomap" form in [*1*]
> that can use arbitrary character set description file is not
> supported, but "-f fromcode -t tocode" form, which also is what
> iconv_open() takes [*2*], is supported.  Am I reading it correctly?
> 
> Is there an easier-to-read way to explain the distinction to our
> average reader?
> 
> What I am getting at is this.  Imagine average users who need to see
> their commits recoded to iso-8859-2.  They see "git log" has
> "--encoding=<encoding>" option, read the above paragraph and wonder
> if they are on the supported side or unsupported side of the above
> paragraph.  I want to make it easy for them to stop wondering.
> 
> For that purpose, "iconv(1) vs iconv(1p)" would not help them very
> much, especially considering that not all Git users are UNIX users
> (they probably do not even know what (1) and (1p) means).

I likewise found the mention of character maps confusing. If we were to
refer to anything, it would be iconv(3) or iconv_open(3). But really,
all of the discussion that led to this patch seemed to be about the
distinction between "character set conversion" (or "character encoding",
or "codeset conversion", all terms used by the POSIX pages) and the
syntactic encoding of HTML.

Is there any version of iconv that would convert "<" to "&lt;"?

I guess that _conceptually_ one could think of that as a multi-byte
character conversion, but it seems to me that it is generally considered
a layer above (after all, the original "<" and characters in the HTML
entity have to be in some character encoding; generally ASCII, but I
think you could have UTF-16 HTML, too).

What I'm getting it is that maybe we just need to use a less generic
word than "encoding". Perhaps just s/encoding/character &/ or something?
And maybe add something like:

  Conversions are done using the system iconv(3) function. The set of
  available encodings will depend on your system.

You _can_ use "iconv -l" to get such a list on many systems, but it is
not even necessarily the same list.

I also wonder if other mentions of encoding would want to use the same
term (e.g., gitattributes working-tree-encoding), and of course
i18n.commitEncoding (though peeking at the latter, it seems to already
say "Character encoding", so maybe that is sufficient).

-Peff

gitgitgadget-git[bot] avatar Aug 27 '21 18:08 gitgitgadget-git[bot]

User Jeff King <[email protected]> has been added to the cc: list.

gitgitgadget-git[bot] avatar Aug 27 '21 18:08 gitgitgadget-git[bot]

On the Git mailing list, Krzysztof Żelechowski wrote (reply to this):

Dnia piątek, 27 sierpnia 2021 19:03:56 CEST Junio C Hamano pisze:
> > +       The encoding must be a system encoding supported by iconv(1),
> > +       otherwise this option will be ignored.
> > +       POSIX character maps used by iconv(1p) are not supported.
> 
> This paragraph is a bit hard to grok.
> 
> I think it is saying that the "-f frommap -t tomap" form in [*1*]
> that can use arbitrary character set description file is not
> supported, but "-f fromcode -t tocode" form, which also is what
> iconv_open() takes [*2*], is supported.  Am I reading it correctly?

Yes

> 
> Is there an easier-to-read way to explain the distinction to our
> average reader?

It is not our job to explain what POSIX character maps are.  The takeaway is 
they are unsupported; if you do not know what they are, why should you bother?

> 
> What I am getting at is this.  Imagine average users who need to see
> their commits recoded to iso-8859-2.  They see "git log" has
> "--encoding=<encoding>" option, read the above paragraph and wonder
> if they are on the supported side or unsupported side of the above
> paragraph.  I want to make it easy for them to stop wondering.
> 
> For that purpose, "iconv(1) vs iconv(1p)" would not help them very
> much, especially considering that not all Git users are UNIX users
> (they probably do not even know what (1) and (1p) means).

I am sorry, as a UNIX user I have no idea what iconv, being part of the GNU C 
library, means and how it works on a non-UNIX system that does not contain 
one.  If you know, could you enlighten us please?

> I think our end-user facing manual pages tend to avoid the latter.
> We do use "shall" in the RFC2119/BCP14 sense on the technical side
> of our documentation where we give requirements to the third-party
> implementations so that they can interoperate with us, but this is
> not such a description.
> 
> Thanks.

I shall revert it after we have come to an agreement about the POSIX stuff.

BR,
Chris




gitgitgadget-git[bot] avatar Aug 27 '21 23:08 gitgitgadget-git[bot]