Extraction in js template strings fails
Overview Description
Text extraction from a JavaScript template string fails for ${} expressions that follow a localized HTML attribute value enclosed in quotation marks.
Steps to Reproduce
Run extraction on the following JavaScript template string code:
`<label>${_('AA0')}</label>
<select data-placeholder="${_('AA1')}">${_('AA2')}</select>
<select data-placeholder="` + _('AA3') + `">${_('AA4')}</select>
<select data-placeholder="">${_('AA5')}</select>
<select data-placeholder="">` + _('AA6') + `</select>`
Actual Results
The extraction finds the texts for AA0, AA2, AA3 and AA6. According to https://github.com/python-babel/babel/issues/329, it is expected that AA1 is not evaluated. However, although AA2 is recognized, the same ${} format is no longer recognized for AA4 and AA5, which come after the template string has been interrupted. Only AA6, which uses string concatenation, is found again.
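To make the result easier to follow, here is a minimal sketch that splits the reproduction code into its template literals and counts the double quotes in each. It uses a naive backtick regex rather than Babel's lexer, and the names SOURCE and literals are only illustrative.

```python
import re

# The reproduction code from "Steps to Reproduce", unchanged.
SOURCE = '''`<label>${_('AA0')}</label>
<select data-placeholder="${_('AA1')}">${_('AA2')}</select>
<select data-placeholder="` + _('AA3') + `">${_('AA4')}</select>
<select data-placeholder="">${_('AA5')}</select>
<select data-placeholder="">` + _('AA6') + `</select>`'''

# Naive split: a template literal is everything between two backticks.
# Escaped backticks and nested literals are ignored; neither occurs here.
literals = re.findall(r'`([^`]*)`', SOURCE)

for index, body in enumerate(literals, start=1):
    quotes = body.count('"')
    state = 'balanced' if quotes % 2 == 0 else 'unbalanced'
    print(f'literal {index}: {quotes} double quotes ({state})')
```

The source consists of three separate template literals. The first ends inside an open attribute quote, and the second, which contains AA4 and AA5, begins with the closing quote, so the quotes in both are unbalanced.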
Expected Results
The texts AA0 through AA6, except for AA1, should be found.
Reproducibility
Always
Additional Information
babel.cfg
[javascript: client/static/**/*.js]
encoding = utf-8
silent = false
extensions = webassets.ext.jinja2.AssetsExtension
parse_template_string = true
It seems that this problem occurs only for template expressions that follow an HTML attribute enclosed in quotation marks. The parse_template_string function explicitly skips all content inside quotes. Shouldn't the ${} clause always be handled, given that parse_template_string already does not handle an interruption of the template string?
So far I have had no problems after overriding the parse function with a slightly customized version that ignores the quote handling:
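The effect of the quote skipping can be sketched with a simplified model of the character walk. This is not Babel's code: find_expressions, skip_quoted and SECOND_LITERAL are hypothetical names, and the model only collects the ${} bodies instead of running an extractor on them.

```python
def find_expressions(template_body, skip_quoted):
    """Collect the bodies of ${...} expressions in a template literal body.

    With skip_quoted=True, anything between matching quote characters is
    ignored, which models the behaviour described above.
    """
    found = []
    prev = None
    level = 0
    inside_str = False
    current = ''
    for ch in template_body:
        if skip_quoted:
            if not inside_str and ch in ('"', "'", '`'):
                inside_str = ch          # a quote opens a "string"
            elif inside_str == ch and prev != '\\':
                inside_str = False       # the matching quote closes it
        if level:
            current += ch
        if not inside_str:
            if ch == '{' and prev == '$':
                level += 1
            elif level and ch == '}':
                level -= 1
                if level == 0 and current:
                    found.append(current[:-1])  # drop the trailing '}'
                    current = ''
        prev = ch
    return found


# Second template literal of the reproduction code: it starts with the
# closing quote of data-placeholder="...", so the quote state is inverted.
SECOND_LITERAL = '''">${_('AA4')}</select>
<select data-placeholder="">${_('AA5')}</select>
<select data-placeholder="">'''

print(find_expressions(SECOND_LITERAL, skip_quoted=True))   # -> []
print(find_expressions(SECOND_LITERAL, skip_quoted=False))  # -> ["_('AA4')", "_('AA5')"]
```

In this model the leading quote of the second literal is treated as the start of a string, so both ${} expressions fall inside "quoted" text and are skipped. Without the quote tracking, both are found.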
import io

from babel.messages.extract import extract_javascript
from babel.messages.jslexer import line_re


def parse_template_string_overwrite(template_string: str,
                                    keywords,
                                    comment_tags,
                                    options,
                                    lineno: int = 1):
    prev_character = None
    level = 0  # nesting depth of the current ${...} expression
    expression_contents = ''
    # Strip the surrounding backticks and walk the body character by character.
    # Unlike the original function, quotes are not tracked at all.
    for character in template_string[1:-1]:
        if level:
            expression_contents += character
        if character == '{' and prev_character == '$':
            level += 1
        elif level and character == '}':
            level -= 1
            if level == 0 and expression_contents:
                # Drop the closing brace, then extract from the expression.
                expression_contents = expression_contents[0:-1]
                fake_file_obj = io.BytesIO(expression_contents.encode())
                yield from extract_javascript(fake_file_obj, keywords, comment_tags, options, lineno)
                lineno += len(line_re.findall(expression_contents))
                expression_contents = ''
        prev_character = character