pybabel extract command from CLI only respects the first argument passed into keywords

Open ankitd33 opened this issue 2 years ago • 4 comments

Overview Description

When I run pybabel extract with two --keywords options for the same function name:

pybabel extract -F CONFIG_FILEPATH -o POT_FILEPATH REPO_T_CHECK --keywords=translate:1 --keywords=translate:1,2 -c TRANSLATORS --no-wrap --no-default-keywords

it only extracts the first argument of translate as a singular string. It does not also extract the calls that have two string arguments as singular/plural pairs.

Both of the commands below work as expected on their own. Ideally, the command above would produce a superset of the two, with the second keyword spec taking precedence when the same msgid is matched by both.

pybabel extract -F CONFIG_FILEPATH -o POT_FILEPATH REPO_T_CHECK --keywords=translate:1,2 -c TRANSLATORS --no-wrap --no-default-keywords

pybabel extract -F CONFIG_FILEPATH -o POT_FILEPATH REPO_T_CHECK --keywords=translate:1 -c TRANSLATORS --no-wrap --no-default-keywords

Steps to Reproduce

Run pybabel extract with two keywords for the same function name: one that extracts plain strings and one that extracts strings with plurals.

Actual Results

Essentially

pybabel extract -F CONFIG_FILEPATH -o POT_FILEPATH REPO_T_CHECK --keywords=translate:1 --keywords=translate:1,2 -c TRANSLATORS --no-wrap --no-default-keywords

does the same as running

pybabel extract -F CONFIG_FILEPATH -o POT_FILEPATH REPO_T_CHECK --keywords=translate:1 -c TRANSLATORS --no-wrap --no-default-keywords

Expected Results

Reproducibility

always

Additional Information

ankitd33 avatar Mar 14 '24 21:03 ankitd33

I've been looking into this issue. It appears to occur only when multiple keywords share the same function name and are not differentiated by a 't' argument (the total number of arguments the call must have). For instance, say that your input data is:

msg1 = translate("bunny", "bunnies", len(bunnies))
msg2 = translate('follow')

You will get the desired results if you run pybabel extract with --keywords=translate:1,1t --keywords=translate:1,2,3t instead of --keywords=translate:1 --keywords=translate:1,2

The keywords data structure isn't currently set up to allow multiple keywords with the same function name unless they are differentiated with a 't' argument. It could probably be extended to allow for this. Or would it be better to detect duplicate keywords like this and give an error/warning prompting the user to add 't' arguments?
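
To make the collision concrete, here is a minimal sketch of a keyword table keyed by function name and then by the 't' (total argument count) value. It is a simplified model for illustration, not Babel's actual parser: parse_spec and parse_keywords_sketch are hypothetical names, and which of two colliding specs survives here is a property of the sketch only.

```python
def parse_spec(spec_str):
    """Split a spec such as '1,2,3t' into (argument positions, total-arg count)."""
    positions = []
    total_args = None
    for part in spec_str.split(","):
        part = part.strip()
        if part.endswith("t"):
            # 'Nt' restricts the keyword to calls with exactly N arguments.
            total_args = int(part[:-1])
        else:
            positions.append(int(part))
    return tuple(positions), total_args


def parse_keywords_sketch(strings):
    """Hypothetical model: one spec per (function name, total-arg count) slot."""
    keywords = {}
    for string in strings:
        name, _, spec_str = string.partition(":")
        spec, total_args = parse_spec(spec_str) if spec_str else (None, None)
        # Two specs without a 't' value both land in the None slot,
        # so only one of them can be stored.
        keywords.setdefault(name, {})[total_args] = spec
    return keywords


without_t = parse_keywords_sketch(["translate:1", "translate:1,2"])
with_t = parse_keywords_sketch(["translate:1,1t", "translate:1,2,3t"])
print(without_t)  # a single slot for 'translate'
print(with_t)     # two slots, keyed by 1 and 3
```

With the 't' values present, each spec gets its own slot, which matches the workaround described above.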

EmilyBStudent avatar Nov 17 '24 08:11 EmilyBStudent

FWIW xgettext allows it without raising any warnings:

xgettext -o - --keyword=translate:1 --keyword=translate:1,2 test.py

Based on that, I think we should support it as well

tomasr8 avatar Nov 17 '24 12:11 tomasr8

I've been working on this issue and have it working, while maintaining backwards compatibility with the previous keywords dictionary format. To allow for keywords with multiple specs that aren't distinguished with a 't' argument, the keyword dictionary needs to be extended to allow for a collection of specs as well as just a single spec per number of arguments, e.g.

keywords = {
    '_': ((1,), (1, 2))
}

For backwards compatibility, I have the code only generate a collection containing multiple specs if there are multiple specs it needs to store. Otherwise it generates a keyword dict in the same format as previously, with the spec stored directly as the dictionary value (and all the existing unit tests pass without changes on my machine so this appears to be working).
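
The backwards-compatible shape described above could be sketched roughly as follows. This is an assumption-laden illustration, not the code from the pull request: build_keywords is a hypothetical helper that takes (name, spec) pairs, and it ignores the 't' dimension entirely to keep the example short.

```python
def build_keywords(pairs):
    """Hypothetical helper: collapse single specs to the legacy dict shape."""
    collected = {}
    for name, spec in pairs:
        specs = collected.setdefault(name, [])
        if spec not in specs:  # skip exact duplicates
            specs.append(spec)

    keywords = {}
    for name, specs in collected.items():
        if len(specs) == 1:
            # Legacy format: the spec is stored directly as the value.
            keywords[name] = specs[0]
        else:
            # Extended format: a tuple holding several specs.
            keywords[name] = tuple(specs)
    return keywords


keywords = build_keywords([
    ("_", (1,)),
    ("_", (1, 2)),
    ("gettext", (1,)),
])
print(keywords)
```

Only '_' ends up with a collection here; 'gettext' keeps the value an existing caller would expect.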

Currently I'm using tuples to contain the collection of relevant specs, but since specs are also represented as tuples, that's causing significant inelegancies in distinguishing a spec tuple from a tuple containing or potentially containing multiple specs. Would it be preferable to use a list instead?

Sorry to ask this after opening the pull request!

EmilyBStudent avatar Dec 03 '24 12:12 EmilyBStudent

Sorry for the late answer @EmilyBStudent!

> Currently I'm using tuples to contain the collection of relevant specs, but since specs are also represented as tuples, that's causing significant inelegancies in distinguishing a spec tuple from a tuple containing or potentially containing multiple specs. Would it be preferable to use a list instead?

Since the specs are unique, what about using a set?
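
A rough sketch of how that could look, assuming specs stay hashable tuples: a set value can be told apart from a single spec with one isinstance check, even when the spec itself contains a nested tuple such as a context marker. The helper name specs_for and the sample dictionary are hypothetical.

```python
def specs_for(value):
    """Hypothetical helper: return every spec stored under one keyword."""
    if isinstance(value, (set, frozenset)):
        # A set can only be a collection of specs, never a spec itself.
        return list(value)
    # Anything else is a single spec in the legacy format.
    return [value]


keywords = {
    "_": {(1,), (1, 2)},        # several specs for one function name
    "pgettext": ((1, "c"), 2),  # one spec that contains a nested tuple
}

for name, value in keywords.items():
    print(name, specs_for(value))
```

One trade-off to keep in mind: a set has no defined order, so if precedence between specs ever matters (for example, a later keyword overriding an earlier one), a list may be the better fit.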

tomasr8 avatar Jan 03 '25 14:01 tomasr8