
Conversion specifier error

Open concatenateNL opened this issue 10 years ago • 11 comments

When translating the English '% comments' from a WordPress theme language file into the Dutch '% reacties', an error is shown: Error: 'msgstr' is not a valid PHP format string, unlike 'msgid'. Reason: In the directive number 1, the character 'r' is not a valid conversion specifier. I tried to translate it as '% antwoorden', but the same error occurs, this time mentioning the character 'a'.

When copying (ctrl-b) the original text, no error is given. I double-checked whether there was whitespace after the % character.

concatenateNL avatar Aug 12 '14 18:08 concatenateNL

The error comes from the msgfmt tool and it complains because the string is marked in the PO file as php-format, i.e. a format string where % has special meaning; in particular, following it by space is invalid.

See here for a detailed explanation: https://groups.google.com/d/msg/poedit/f34rrvx7UT8/vjaWHOK2O1QJ

You can either ignore it (the MO should still be compiled) or fix it in the PHP source code. An example of the latter can be seen here: https://github.com/osclass/Osclass/pull/779 (with the specific code change here: https://github.com/osclass/Osclass/commit/679f19a064808d46c14c199bf4c36531e9b5993d - but the whole pull request is explanatory).

vslavik avatar Aug 13 '14 07:08 vslavik

Hi Václav,

I would like to get back on this, without opening the ticket again. I always thought this was correct code for a WP theme. It is used in this line:

I've never seen another way to get the plural form of a word.

And the strangest thing is, when I just copy the original language to the translated language, the error does not occur. I mean, % comments in the translation doesn't give the error.

Best regards,

Michael Albers Concatenate

concatenateNL avatar Aug 13 '14 07:08 concatenateNL

I've never seen another way to get the plural form

This has nothing at all to do with plural forms. The only problem here is that xgettext (not Poedit as such) mis-marks the string as being a PHP format string in the PO file, during extraction from source code, because its heuristic failed.

Please do read all the links I took the time to provide, in their entirety. They do explain the issue — as well as how to fix it — in detail.

Bottom line, there are only two ways around it: a) improve GNU gettext's PHP extraction code (which is harder than it sounds) or b) tag the source code with a hint for gettext to do the right thing. As the plugin's author, you can easily do b) — again, see the links for details.

when i just copy the original language

A quirk of msgfmt tool's validation.

vslavik avatar Aug 13 '14 08:08 vslavik

Hi,

Thanks for taking your time. Unfortunately, the links you provided don't fully explain the issue to me. It is a WooThemes theme that gives this issue, not something that I programmed myself.

I've checked the .po file and indeed it has the php-format flag on this string. Removing the flag corrects the issue, and without it the file seems to give no errors while in use on the website.

I am not blaming Poedit, but I really would like to know why this error is not thrown in other instances. I've translated themes before with the free version of Poedit, but never had an issue like this. An occasional forgotten s in %s happens. But the issue at hand seems strange to me.

Two similar strings in the php files do not give these issues:

and

don't output the error like

does.

In short, I feel something strange happens, but I don't know why or where it is caused. The % sign is not a percentage sign; it should be replaced with the actual number of comments. Both functions, comments_number and comments_popup_link, are in the Codex, and according to the Codex I don't see any errors in the code.

Anyway, I really appreciate your effort to help me get my head around this.

Best regards,

Michael Albers

concatenateNL avatar Aug 13 '14 11:08 concatenateNL

Unfortunately, the links you provided don't fully explain the issue to me.

I really don't know what else to add :-( The strings are marked as php-format ones, so gettext treats them as format strings, and %<space> is not a valid part of a format string, hence the error.

I really would like to know why in other instances this error is not thrown out

There are only two explanations [that I can think of]: either these other instances are not php-format strings (and so aren't validated) or they use valid format specifiers such as %s (and so there are no errors to report). Maybe some of the cases were created with a different version of xgettext? Or the heuristic evaluated it differently? Or maybe somebody manually removed the php-format designation?

Both functions, comments_number and comments_popup_link, are in the Codex, and according to the Codex I don't see any errors in the code.

The code is fine as PHP code. Again, the issue is that gettext tools use a heuristic (i.e. something that is, by definition, not 100% correct) to detect PHP format strings. The official solution, per the gettext manual, is to use the /* xgettext:no-php-format */ comment to nudge xgettext in the right direction.

That's the best, clean fix. You could also remove the php-format manually to get rid of it and I suppose it would be useful to have that option in Poedit (although you'd have to do it again after every update from PHP sources), but I'm not sure what else to do about it...
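For anyone who wants to script the manual workaround, here is a minimal Python sketch that drops the php-format flag from a single PO entry. It relies only on the PO convention that flags sit on a `#,` comment line above `msgid`. The function name `drop_php_format` and the `#: comments.php:12` source reference are made up for illustration, and the parsing is deliberately naive (single-line msgid, no escaping), so treat it as a starting point rather than a robust tool.

```python
def drop_php_format(po_text, msgid):
    """Remove the php-format flag from the entry whose msgid equals `msgid`.

    Hypothetical helper: handles only single-line, unescaped msgid values.
    """
    target = 'msgid "' + msgid + '"'
    out = []
    pending = None  # index in `out` of the most recent "#," flags line
    for line in po_text.split("\n"):
        if line.startswith("#,"):
            pending = len(out)
        elif line.startswith("msgid "):
            if line == target and pending is not None:
                flags = [f.strip() for f in out[pending][2:].split(",")]
                flags = [f for f in flags if f and f != "php-format"]
                if flags:
                    out[pending] = "#, " + ", ".join(flags)
                else:
                    del out[pending]  # no flags left, drop the whole line
            pending = None
        elif line.strip() == "":
            pending = None  # a blank line separates entries
        out.append(line)
    return "\n".join(out)


# Illustrative entry; the "#:" source reference is invented.
entry = (
    "#: comments.php:12\n"
    "#, php-format\n"
    'msgid "% comments"\n'
    'msgstr "% reacties"\n'
)
print(drop_php_format(entry, "% comments"))
```

As noted above, the flag comes back after the next update from the PHP sources or the POT file, so this would have to be rerun each time.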

vslavik avatar Aug 13 '14 12:08 vslavik

Hmm, xgettext behavior is weird: % zomments is handled correctly, but % comments isn't. I think it may be because %c is a valid format specifier, so perhaps xgettext ignores the space between % and c in % c… I'm going to investigate it.

vslavik avatar Aug 13 '14 12:08 vslavik

This makes me happy as it seems I am not as insane as I thought.

Václav Slavík wrote on 13-8-2014 at 14:03:

Reopened #95 https://github.com/vslavik/poedit/issues/95.

concatenateNL avatar Aug 13 '14 13:08 concatenateNL

I'm going to investigate it.

So here's the full story:

In PHP format strings, a space immediately following % actually is permitted, as a padding character that is followed by the rest of the specification. So e.g. % followed by a space alone is not valid, but % s (a string) or % c (a character) is, and they are, in this case, the same as %s or %c respectively. That's why xgettext detects % c in % comments and (incorrectly) thinks that it is a format string.

Then when compiling the PO file with msgfmt, it doesn't complain about % comments, because it sees that '% c' part there, which is valid in a format string. But if you translate it differently (e.g. in Czech, % komentářů), it complains because k in % k is not valid.

So this misbehavior is an unfortunate accident of a) the source string containing "c" after "%" and b) the translation not having "c" as the first character of the word.
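The accident described above can be reproduced with a small Python model. This is only a rough sketch of the idea, not the actual code in xgettext or msgfmt: the function name `first_invalid_conversion` is made up, the prefix grammar (argument number, flags including space padding, width, precision) is simplified, and the set of conversion characters is assumed from PHP's sprintf documentation.

```python
import re

# Assumed set of conversion characters (taken from PHP's sprintf docs,
# not from gettext's sources).
VALID_CONVERSIONS = set("bcdeEfFgGosuxX")

# Simplified directive prefix: optional "n$", flags (a space counts as
# padding), optional width, optional ".precision".
_DIRECTIVE_PREFIX = re.compile(r"(?:\d+\$)?(?:[-+ 0]|'.)*\d*(?:\.\d+)?")


def first_invalid_conversion(text):
    """Return the first invalid conversion character after a '%'.

    Returns None if every directive looks valid, and "" if the string
    ends in the middle of a directive.
    """
    i = 0
    while i < len(text):
        if text[i] != "%":
            i += 1
            continue
        i += 1
        if i < len(text) and text[i] == "%":  # literal "%%"
            i += 1
            continue
        i = _DIRECTIVE_PREFIX.match(text, i).end()
        if i >= len(text):
            return ""
        if text[i] not in VALID_CONVERSIONS:
            return text[i]
        i += 1
    return None


print(first_invalid_conversion("% comments"))   # None: "% c" parses as a directive
print(first_invalid_conversion("% reacties"))   # r
print(first_invalid_conversion("% komentářů"))  # k
```

Under this model '% comments' passes because the space is consumed as padding and 'c' is a conversion character, while '% reacties', '% antwoorden' and '% komentářů' stop at 'r', 'a' and 'k'. The same check returns 'z' for '% zomments', which is consistent with that string not being picked up as a format string in the first place.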

In short, I really don't see any other solution than adding /* xgettext:no-php-format */ and I think the WooCommerce guys should do it, because this problem will exist with all tools, not just Poedit.

As a translator user, do you think it's an acceptable workaround to be able to delete the php-format designation from within Poedit (knowing that the next refresh of the translation from POT or source code will re-add it back)?

vslavik avatar Aug 13 '14 14:08 vslavik

Hi Václav,

Thank you for your effort. For my translation work it is not a problem to remove the line manually from the .po file. I have no clue how to remove it from within Poedit; I never looked into the details of Poedit. I will add the xgettext line and submit a ticket at WooThemes.

Best regards,

Michael Albers

concatenateNL avatar Aug 13 '14 15:08 concatenateNL

I have no clue on how to remove it from within poedit.

Oh, that's because it's not currently possible, sorry for being unclear. I'm just curious if it's worth adding or not...

vslavik avatar Aug 13 '14 15:08 vslavik

It could be interesting, since adding a fuzzy flag is also possible. And, it quickly resolves issues like the one I had. When updating from source, it would be very nice if all php-format strings would be shown just as all fuzzy translations. That way, it is easy to check if those strings really are php-format or not.

concatenateNL avatar Aug 13 '14 16:08 concatenateNL