SmokeDetector

Less fp's

Open • ghost opened this issue 4 years ago • 5 comments

  • Excludes StackOverflow, Maths, Mathoverflow and Cross Validated from the "post is mostly images" reason (on StackOverflow, <img> tags inside code can be counted, and the other 3 sites use a lot of MathJax)
  • Adds Cross Validated to the exclusion list for the "mostly punctuation marks in {}" reason (because of its heavy MathJax use)
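A minimal Python sketch of how a per-reason site exclusion list could work. The `Reason` class, its `applies_to` method and the `MOSTLY_IMAGES` constant are hypothetical and do not reflect SmokeDetector's actual rule definitions. The hostnames are the public domains of Stack Overflow, Mathematics, MathOverflow and Cross Validated.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reason:
    """Hypothetical spam reason with a per-reason site exclusion list."""

    name: str
    # Hostnames on which this reason is skipped entirely
    excluded_sites: frozenset = field(default_factory=frozenset)

    def applies_to(self, site: str) -> bool:
        # The reason runs on every site except the excluded ones
        return site not in self.excluded_sites


# Hypothetical: the exclusions proposed in this PR for the "mostly images" reason
MOSTLY_IMAGES = Reason(
    name="post is mostly images",
    excluded_sites=frozenset({
        "stackoverflow.com",
        "math.stackexchange.com",
        "mathoverflow.net",
        "stats.stackexchange.com",
    }),
)

print(MOSTLY_IMAGES.applies_to("stats.stackexchange.com"))  # False
print(MOSTLY_IMAGES.applies_to("superuser.com"))  # True
```

An exclusion list like this silences the reason on the whole site, so any true positives it would have caught there are lost as well; that is the trade-off user12986714 raises in the thread.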

Statistics:

Excluding StackOverflow, Maths, Mathoverflow and Cross Validated from the "post is mostly images" reason will result in:

  • 31 fewer fp's
  • 0 fewer tp's

Current accuracy of this reason: 17% (17). New accuracy: 40% (40).
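The accuracy change can be recomputed from raw counts, assuming accuracy is defined as tp / (tp + fp). That definition is an assumption; metasmoke's exact formula may treat other feedback types differently. The counts in the usage lines are hypothetical, since the thread does not give the underlying totals.

```python
def accuracy(tp: int, fp: int) -> float:
    """Share of a reason's caught posts that were true positives (assumed definition)."""
    caught = tp + fp
    if caught == 0:
        raise ValueError("the reason caught no posts")
    return tp / caught


def accuracy_after_exclusion(tp: int, fp: int, tp_lost: int, fp_lost: int) -> float:
    """Accuracy once posts on the excluded sites are no longer caught."""
    return accuracy(tp - tp_lost, fp - fp_lost)


# Hypothetical counts, not taken from the PR
print(accuracy(10, 40))                         # 0.2
print(accuracy_after_exclusion(10, 40, 0, 30))  # 0.5
```

Because the exclusion removes only false positives, the numerator stays fixed while the denominator shrinks. That is why accuracy can rise sharply even though no additional spam is caught.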

Excluding Cross Validated from the "mostly punctuation marks in {}" reason will result in:

  • 0 fewer tp's (all of those tp's are also caught by other reasons)
  • 30 fewer fp's

ghost, Aug 01 '20 10:08

Note: The failures are not because of my code but because of how the tests are set up.

Edit: Fixed now

ghost, Aug 01 '20 11:08

Fewer fp's for the mostly-img reason would be great, but I don't think excluding sites is the optimal approach. See https://github.com/Charcoal-SE/SmokeDetector/pull/4190

user12986714, Aug 01 '20 13:08


MathJax would still get caught, though.

ghost, Aug 01 '20 16:08


Isn't there a stripcodeblocks option that would help on Stack Overflow? As for the math sites: as far as I can tell, MathJax doesn't render as images; it just embeds the text in a <span class="math-container"> (which is rendered by client-side JS).

NobodyNada, Aug 29 '20 23:08
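To illustrate NobodyNada's point, here is a hypothetical image-share heuristic. It is not SmokeDetector's actual "mostly images" check. Code blocks are removed first (the same idea as a stripcodeblocks-style option), and MathJax, which arrives as plain text inside <span class="math-container">, is counted as ordinary words rather than as images. The regexes, the helper names and the one-unit-per-word weighting are assumptions made for this sketch. Regex-based HTML handling is only adequate for a rough illustration.

```python
import re

# Hypothetical patterns; SmokeDetector's real implementation may differ.
CODE_RE = re.compile(r"<pre[^>]*>.*?</pre>|<code[^>]*>.*?</code>",
                     re.DOTALL | re.IGNORECASE)
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def strip_code_blocks(html: str) -> str:
    """Drop <pre>/<code> blocks so their contents are never measured."""
    return CODE_RE.sub("", html)


def image_share(html: str) -> float:
    """Rough share of content units that are images, ignoring code blocks.

    Each <img> tag counts as one unit; each remaining word of visible text
    counts as one unit. MathJax source inside a math-container span is
    plain text, so it counts as words, never as images.
    """
    body = strip_code_blocks(html)
    images = len(IMG_RE.findall(body))
    words = len(TAG_RE.sub(" ", body).split())
    total = images + words
    return images / total if total else 0.0


print(image_share('<p>See:</p><img src="a.png"><img src="b.png">'))
print(image_share('<p>Let <span class="math-container">$x^2$</span> be even.</p>'))
```

Under a measure like this, genuine MathJax cannot push a post toward "mostly images"; only formulas pasted as actual <img> tags can, which is the case the author says the updated PR targets.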


I have updated the PR; it no longer has anything to do with SO. Also, the issue is MathJax posted as images, not actual MathJax.

ghost, Aug 29 '20 23:08