Extraction in js template strings fails

Open HawkOnPK opened this issue 3 years ago • 1 comments

Overview Description

Text extraction fails for ${} expressions that come after a localized HTML attribute value enclosed in quotes.

Steps to Reproduce

Run extraction on the following JavaScript template string code:

`<label>${_('AA0')}</label>
<select data-placeholder="${_('AA1')}">${_('AA2')}</select>
<select data-placeholder="` + _('AA3') + `">${_('AA4')}</select>
<select data-placeholder="">${_('AA5')}</select>
<select data-placeholder="">` + _('AA6') + `</select>`

Actual Results

The extraction finds the texts for AA0, AA2, AA3 and AA6. Based on https://github.com/python-babel/babel/issues/329, it is expected that AA1 is not extracted. However, while AA2 is recognized, the same format no longer works for AA4 and AA5. Extraction only works again for AA6.

Expected Results

The texts AA0 through AA6, except for AA1, should be found.

Reproducibility

Always

Additional Information

babel.cfg

[javascript: client/static/**/*.js]
encoding = utf-8
silent = false
extensions = webassets.ext.jinja2.AssetsExtension
parse_template_string = true

Mar 13 '23 12:03 HawkOnPK

It seems that this problem occurs only for JS template extractions involving an HTML attribute enclosed in quotes. The parse_template_string function explicitly skips all content inside quotes. Shouldn't the ${} clause always be handled, since parse_template_string does not handle an interruption of the template string anyway?

So far I have had no problems overriding the parse function with a slightly customized version that ignores the quote handling:

def parse_template_string_overwrite(template_string: str,
                                    keywords,
                                    comment_tags,
                                    options,
                                    lineno: int = 1):
    # Variant of Babel's parse_template_string with the quote handling removed.
    import io
    from babel.messages.jslexer import line_re
    from babel.messages.extract import extract_javascript

    prev_character = None
    level = 0  # depth of nested ${...} expressions
    expression_contents = ''
    # Skip the backticks that enclose the template string.
    for character in template_string[1:-1]:
        if level:
            expression_contents += character
        if character == '{' and prev_character == '$':
            level += 1
        elif level and character == '}':
            level -= 1
            if level == 0 and expression_contents:
                # Drop the closing brace, then extract from the expression as plain JavaScript.
                expression_contents = expression_contents[0:-1]
                fake_file_obj = io.BytesIO(expression_contents.encode())
                yield from extract_javascript(fake_file_obj, keywords, comment_tags, options, lineno)
                lineno += len(line_re.findall(expression_contents))
                expression_contents = ''
        prev_character = character
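
To illustrate why the quote handling matters, here is a minimal standalone sketch that does not use Babel at all. It is a simplified model of the scanning loop, not Babel's actual implementation. The names find_expressions, track_quotes and second_literal are hypothetical. The sketch also assumes that the `+ _('AA3') +` concatenation ends one template literal and starts a new one, so that the second literal begins with the closing `"` of the attribute.

```python
def find_expressions(template_body: str, track_quotes: bool) -> list[str]:
    """Return the contents of each ${...} expression found in a template body.

    With track_quotes=True, anything between matching quote characters is
    skipped, which models the behaviour described in this issue.
    """
    found = []
    prev = None
    level = 0
    inside_str = None  # quote character that is currently open, if any
    current = ""
    for ch in template_body:
        if track_quotes:
            if inside_str is None and ch in ('"', "'", "`"):
                inside_str = ch
            elif inside_str == ch and prev != "\\":
                inside_str = None
        if level:
            current += ch
        if inside_str is None:
            if ch == "{" and prev == "$":
                level += 1
            elif level and ch == "}":
                level -= 1
                if level == 0 and current:
                    found.append(current[:-1])  # drop the closing brace
                    current = ""
        prev = ch
    return found


# Body of the (assumed) second template literal from the reproduction case.
second_literal = (
    '">${_(\'AA4\')}</select>\n'
    '<select data-placeholder="">${_(\'AA5\')}</select>\n'
    '<select data-placeholder="">'
)

print(find_expressions(second_literal, track_quotes=True))
# prints []
print(find_expressions(second_literal, track_quotes=False))
# prints ["_('AA4')", "_('AA5')"]
```

In this model the leading `"` opens a quoted region that only closes at the first `"` of the next `data-placeholder=""`, so the quotes pair up across attribute boundaries and both expressions fall inside a skipped region. Without quote tracking, both expressions are found.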

Oct 03 '24 09:10 HawkOnPK