Support top-level statements in C#

Open b0ink opened this issue 4 months ago • 3 comments

Bug Description

I've updated from v6.1.0 to v6.2.0, and the same set of files is now showing up as failed submissions for being "too small".

Is there a CLI option that controls this behaviour? It was working well previously, and I would like those files to be included.

JPlag Version

6.2.0

Operating System

No response

Java Version

No response

b0ink avatar Aug 22 '25 01:08 b0ink

Which language module are you using?

Kr0nox avatar Aug 22 '25 07:08 Kr0nox

@Kr0nox C#

Looking over some of the failed submissions, I'm now realising it's because they all use top-level statements (no Main class). The valid flagged similarities all use classes. Going back and comparing the 6.2.0 and 6.1.0 reports, I think they were failing back then too. The error reporting in v6.2.0 must have thrown me off.

I understand that JPlag supports up to C# v6, which doesn't support top-level statements, but is this something that could be fixed from within JPlag, or is this an ANTLR4 parser limitation?

Here's a test analysis using v6.2.0 on two duplicate submissions that use top-level statements:

2025-08-25-10:35:36_945 [INFO] AntlrLoggerErrorListener - Summary of all errors:
2025-08-25-10:35:36_920 [ERROR] AntlrLoggerErrorListener - ANTLR error - in /Users/xxx/Desktop/jplag/1/toplevel.cs line 5:0 mismatched input 'string' expecting {<EOF>, 'abstract', 'async', 'class', 'delegate', 'enum', 'extern', 'interface', 'internal', 'namespace', 'new', 'override', 'partial', 'private', 'protected', 'public', 'readonly', 'ref', 'sealed', 'static', 'struct', 'unsafe', 'virtual', 'volatile', '['}
2025-08-25-10:35:36_920 [ERROR] AntlrLoggerErrorListener - ANTLR error - in /Users/xxx/Desktop/jplag/2/toplevel.cs line 5:0 mismatched input 'string' expecting {<EOF>, 'abstract', 'async', 'class', 'delegate', 'enum', 'extern', 'interface', 'internal', 'namespace', 'new', 'override', 'partial', 'private', 'protected', 'public', 'readonly', 'ref', 'sealed', 'static', 'struct', 'unsafe', 'virtual', 'volatile', '['}
2025-08-25-10:35:36_946 [INFO] Submission - Summary of all errors:
2025-08-25-10:35:36_929 [ERROR] Submission - Submission 2 contains 2 tokens, which is below the minimum match length 8!
2025-08-25-10:35:36_929 [ERROR] Submission - Submission 1 contains 2 tokens, which is below the minimum match length 8!

toplevel.cs

using static System.Convert;
using static SplashKitSDK.SplashKit;

// Variables
string examplevar1;
string examplevar2;
string name;
int duration;

Write("Hello world");

// ...

b0ink avatar Aug 25 '25 00:08 b0ink

Hi again, b0ink, thank you for this remark!

For parsing C# with ANTLR, we use the grammar from the official ANTLR4 grammar repo on GitHub, which apparently still covers the same feature set as 8 years ago, judging by the last update to its README. We encounter similar problems with other programming languages as well.

It is possible to adapt the grammar manually, but then other parts of the pipeline would also need to be adjusted. Also, the idea was to rely on the community-trusted, proven grammar files from the official ANTLR repo. Since those tend to be severely outdated, however, we are currently looking into alternative parsing libraries, which we hope will keep up with language updates more closely.

robinmaisch avatar Aug 25 '25 09:08 robinmaisch
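
Until the grammar question is settled, one possible stopgap is to pre-process the submissions before handing them to JPlag. The sketch below is not a JPlag feature and has not been tested against JPlag itself. It only assumes, based on the ANTLR errors above, that the parser accepts a file once the statements sit inside a class with a Main method. The function name `wrap_top_level_statements`, the regular expressions, and the class name `Program` are all made up for illustration. Files that already declare a type or namespace are left untouched, and top-level `await` is not handled.

```python
import re

# Heuristic: a using directive such as "using System;", "using static X.Y;"
# or "using Alias = X.Y;" (but not a "using var x = ...;" statement).
USING_DIRECTIVE = re.compile(
    r"^\s*using\s+(static\s+)?[\w.]+(\s*=\s*[\w.<>,\s]+)?\s*;\s*$"
)

# Heuristic: a line that starts a type or namespace declaration.
TYPE_DECLARATION = re.compile(
    r"^\s*((public|internal|private|protected|static|sealed|abstract|partial|unsafe)\s+)*"
    r"(class|struct|interface|enum|namespace|delegate|record)\b"
)


def wrap_top_level_statements(source: str, class_name: str = "Program") -> str:
    """Wrap C# top-level statements in a class with a Main method.

    Hypothetical helper: returns the source unchanged if it already
    contains a type or namespace declaration, or has no statements.
    """
    lines = source.splitlines()
    if any(TYPE_DECLARATION.match(line) for line in lines):
        return source

    usings, body = [], []
    for line in lines:
        # Leading using directives and blank lines stay outside the class.
        if not body and (USING_DIRECTIVE.match(line) or not line.strip()):
            usings.append(line)
        else:
            body.append(line)

    # Drop trailing blank lines from the statement block.
    while body and not body[-1].strip():
        body.pop()
    if not body:
        return source

    header = "\n".join(usings).rstrip()
    indented = "\n".join(("        " + l) if l.strip() else "" for l in body)
    wrapped = (
        f"class {class_name}\n{{\n"
        f"    static void Main(string[] args)\n    {{\n"
        f"{indented}\n"
        f"    }}\n}}\n"
    )
    return (header + "\n\n" if header else "") + wrapped
```

Applied to the toplevel.cs example above, this keeps the two using directives at the top and moves the variable declarations and the Write call into Main. Since every submission gets the same wrapper, it adds the same few tokens to each file, which may need to be taken into account when choosing the minimum match length.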