code-review Limit number of comments per file

When a file is invalid for an analyzer (minified JS code for example), the analyzer can produce a LOT of issues (120k issues for a single js file).

We should add a limit to the number of issues per file (say 20 issues max per lines, that would be 1000 issues for the example case), and discard all the issue to simply post one error:

Something is wrong about this file, the code review bot detected XXX issues for this file. Please check locally with this linter (example code).

There is a bugzilla issue tracking this too.

Feb 26 '19 08:02 La0

We could post a few issues, just to give an idea to the developer what's wrong. Probably something like 20 issues per file rather than per line, and then we say "we found XXX other issues, but we didn't post them".

Feb 26 '19 09:02 marco-c

In the case of a minified file, the issues themselves do not really help as they make bascically no sense. i think reporting a single comment on the first line of the file is fine here

Feb 26 '19 09:02 La0

If we can detect it's a minified file, yes, but I was thinking of that as a more general solution. E.g. say somebody adds a third-party file with a lot of issues and forgets to add it to the exclude list.

Feb 26 '19 09:02 marco-c

Right, we could also suggest a few options to solve the issue

is this a minified file ?
is this a third party file ?
...?

Feb 26 '19 09:02 La0

Could you explain what needs to be done here?

Feb 26 '19 09:02 Utkarsh1308

Hello @Utkarsh1308,

This issue concerns Mozilla Code Review bot, developped in this repository. We need to publish a new comment on Phabricator (the code review platform we use) when a large number of issues per file is detected by the bot.

The patch would add some code here to do the following:

Count publishable issues by file,
If one of these counts is above a threshold (let's say 20 per file), we should mark these issues as non-publishable
... and add a new issue linked to the the file explaining that we found too many issues to report.

You would then produce a Pull request on this repository with this code, and a unit test covering the added code.

You can find setup instructions for the project here.

To test your changes locally you can do ./please shell staticanalysis/bot then run pytest in the new shell to run unit tests

Feb 26 '19 09:02 La0

This is a bit more complicated than a good-first-bug, i moved it to good-next-bug.

Feb 26 '19 10:02 La0

hi @La0 . I am interested in this bug. Can i work on this.

Mar 01 '19 04:03 rv404674

@rv404674 Totally Rahul, it would be great for a contributor like you.

Mar 01 '19 10:03 La0

@Lao. Sure. Let me start hacking on this. One more thing, if you work with andi, can you tell him that i am interested in gsoc project "Test automation for linting tools" and that can he provide me some more description about project in depth on irc.

Mar 01 '19 11:03 rv404674

Ok @rv404674 , i'll assign it to you then.

Edit: i sent you an invite to become a collaborator on this repo

Mar 01 '19 11:03 La0

@La0 . I have accepted the invitation. :)

Mar 01 '19 13:03 rv404674

Now you're assigned !

Mar 01 '19 13:03 La0

Thanks :)

Mar 01 '19 14:03 rv404674

Hi @La0 . I don't think that i will be able to continue further. So can you please make this issue open for everyone. You guys were really nice,thanks a lot for all the help.

Apr 25 '19 22:04 rv404674

Thanks Rahul. This issue is now open again

Apr 26 '19 07:04 La0

Hi, I am interested to work on this issue. What's the next step?

May 01 '19 13:05 navsie

Hello @navsie , please look into the issues with label good-first-bug for a first contribution.

May 02 '19 08:05 La0

@La0 I understand what needs to be done in the issue but the link you attached here The patch would add some code here to do the following: is not valid could you send me a link to the file and I would be more than happy to take look and send in a PR. Thanks

May 11 '19 16:05 sladyn98

@La0 @marco-c Currently the review bot counts issues by class in this manner:

groups = itertools.groupby(
            sorted(issues, key=lambda x: str(x.__class__)),
            lambda x: x.__class__,
        )

Would we want to group them by files and then count their total or this would be fine as it is?

May 18 '19 18:05 sladyn98

Hi, I'm interested in working on this issue! The comment here contains a link to the codebase that is broken and makes it unclear where a contribution would be necessary - can I get some clarity on this? Thanks!

Feb 26 '20 00:02 benjaminfawcett

Sure @benjaminfawcett , you can work on this. Sorry about the bad links, the project has moved since this comment was written.

To clarify:

the starting point would be here to count the number of issues by task, and do the process described here
Setup docs for dev is here

Feb 26 '20 07:02 La0

code-review code-review copied to clipboard

Limit number of comments per file

code-review
code-review copied to clipboard