bandit Improve handling nosec for multi-line strings

This commit improves handling nosecs in multi-line strings, like:

1. nosec_not_working = f"""
2.     SELECT * FROM {table}
3. """  # nosec

Before this change, bandit was checking if there is a nosec in line 1. Now, in the case of ast Constants it searches for nosec in the last line of the expression.

This commit also moves detecting nosec without test number to test phase from "pre-visit" phase. It may increase the time of performing checks but avoids counting the same nosec mark multiple times. "pre-visit" phase is run separately for each part of multi-line string split by FormattedValue items. Thus for the above example, it would be run twice, the first time for "\n SELECT * FROM " and the second time for "\n" making the nosec being counted twice.

Resolves: #880

Jun 28 '22 07:06 kfrydel

So I think using the last line of linerange will fail for Python 3.7. Currently Bandit incorrectly reports the last line number for Py37. See issue #820

Thank you for taking a look at my changes. I'm not sure if I understand. Do you mean that my changes are not going to work in Python 3.7? get_nosec function returns context['lineno'] for strings in Python 3.7. Why? Because in the case of Python 3.7 strings are parsed to ast.Str. In the case of Python 3.10 they are parsed to ast.Constant. It looks like ast.Constant is used from 3.8 according to https://docs.python.org/3/library/ast.html:

Changed in version 3.8: Class ast.Constant is now used for all constants.

I did the following test: file: string_examples.py

a = "string 1"
b = f"f-string {a}"
c = """multi-line
string
1"""
d = f"""multi-line
f-string
{a}"""

file: ast_example.py

import ast

with open('string_examples.py', 'r') as f:
    ast_tree = ast.parse(f.read())

    for node in ast_tree.body:
        print(ast.dump(node))

And the output is:

python 3.7:

Assign(targets=[Name(id='a', ctx=Store())], value=Str(s='string 1'))
Assign(targets=[Name(id='b', ctx=Store())], value=JoinedStr(values=[Str(s='f-string '), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]))
Assign(targets=[Name(id='c', ctx=Store())], value=Str(s='multi-line\nstring\n1'))
Assign(targets=[Name(id='d', ctx=Store())], value=JoinedStr(values=[Str(s='multi-line\nf-string\n'), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]))

python 3.8:

Assign(targets=[Name(id='a', ctx=Store())], value=Constant(value='string 1', kind=None), type_comment=None)
Assign(targets=[Name(id='b', ctx=Store())], value=JoinedStr(values=[Constant(value='f-string ', kind=None), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]), type_comment=None)
Assign(targets=[Name(id='c', ctx=Store())], value=Constant(value='multi-line\nstring\n1', kind=None), type_comment=None)
Assign(targets=[Name(id='d', ctx=Store())], value=JoinedStr(values=[Constant(value='multi-line\nf-string\n', kind=None), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1, format_spec=None)]), type_comment=None)

python 3.10:

Assign(targets=[Name(id='a', ctx=Store())], value=Constant(value='string 1'))
Assign(targets=[Name(id='b', ctx=Store())], value=JoinedStr(values=[Constant(value='f-string '), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1)]))
Assign(targets=[Name(id='c', ctx=Store())], value=Constant(value='multi-line\nstring\n1'))
Assign(targets=[Name(id='d', ctx=Store())], value=JoinedStr(values=[Constant(value='multi-line\nf-string\n'), FormattedValue(value=Name(id='a', ctx=Load()), conversion=-1)]))

Jul 08 '22 13:07 kfrydel

Thank you for working on this. This also fixes #658 reported in December 2020.

Aug 10 '22 07:08 frenzymadness

Hi @ericwb , would you find a while to take a look at this pull request? If you think it should be fixed in another way, do not hesitate to say so. I'm willing to spend some more time on this if needed.

Aug 23 '22 07:08 kfrydel

Hello guys, can we expect this PR to be merged some time soon?

Jan 31 '23 16:01 sshishov

I added tests for implicit concatenation cases and it turned out that my solution didn't work. I changed the approach and now I search for nosec in every line reported in linerange property. I also abandoned using lineno property and always use linerange so the code became a bit simpler. nosec can be now placed at any line of "implicit concatenation" expression. Except python 3.7. Indeed, as @ericwb said, we do not have the last line of the analyzed expression thus in the case of python 3.7 nosec has to be at the first line of the "implicit concatenation" expression. And to be honest, I have no idea on how to fix this.

In the case of a multiline string, as shown below, nosec is detected when it is placed at the last line:

f"""
SELECT *
FROM foo
WHERE id = {identifier}
"""  # nosec

independently from the python version.

Feb 27 '23 12:02 kfrydel

@sigmavirus24 Thank you!

Feb 27 '23 13:02 kfrydel

Not sure but this change appears to have broken our Python 3.7 runs. We had tests getting skipped and now ~none~ a portion of them are. Still investigating root cause.

Mar 10 '23 15:03 JRemitz

@JRemitz Please let me know if you have examples that should be skipped and are not.

Mar 10 '23 15:03 kfrydel

Definitely and I think I'm eating my words... we had a reduction in "skipped" tests, but the new ones were not previously skipped, they were two of the new rules that were added. I am able to correlate those, I just need to dig up why the skipped tests count dropped. I'll let you know if that has any relationship to this change. Apologies for the noise!

Mar 10 '23 15:03 JRemitz

bandit bandit copied to clipboard

Improve handling nosec for multi-line strings

bandit
bandit copied to clipboard