JPlag

Language module architecture

Open: tsaglam opened this issue 1 year ago • 7 comments

The Language module architecture still has several problems. This issue is meant to document all known problems. Successor of #454. Solving these problems may introduce API-breaking changes; thus, the solutions require a major release.

  • [x] Language modules should provide readme files that explain the design rationales of the token extraction
  • [x] An abstract parser class should offer shared functionality among frontends
  • [ ] The parser should be instantiated per submission, not globally per language
  • [x] The language facades should not be named the same (API-breaking change!!!)
  • [x] Language modules currently all read in files on their own; see #746
  • [ ] #576
  • [x] #920

tsaglam, Mar 17 '23
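The open item on instantiating the parser per submission can be sketched roughly as follows. This is a minimal illustration under assumptions, not JPlag's API: `SubmissionParser`, `StatefulParser`, and `LanguageSketch` are hypothetical names, and "parsing" is reduced to splitting on whitespace. The language holds a parser factory (a `Supplier`) instead of a single parser instance, so every submission gets a fresh parser and no mutable state carries over between submissions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical types for illustration only; they are not JPlag's API.
interface SubmissionParser {
    // Turns the file contents of one submission into a list of tokens.
    List<String> parse(List<String> fileContents);
}

// A parser with internal mutable state. Sharing one instance across
// submissions would leak tokens from one submission into the next.
final class StatefulParser implements SubmissionParser {
    private final List<String> tokens = new ArrayList<>();

    @Override
    public List<String> parse(List<String> fileContents) {
        for (String content : fileContents) {
            // Stand-in for real parsing: split on whitespace.
            for (String word : content.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
        }
        return List.copyOf(tokens);
    }
}

// The language keeps a parser factory instead of a single parser instance.
final class LanguageSketch {
    private final Supplier<SubmissionParser> parserFactory;

    LanguageSketch(Supplier<SubmissionParser> parserFactory) {
        this.parserFactory = parserFactory;
    }

    // Creates a fresh parser for every submission.
    List<String> parseSubmission(List<String> fileContents) {
        return parserFactory.get().parse(fileContents);
    }
}
```

With `new LanguageSketch(StatefulParser::new)`, parsing `"int a = 1;"` and then `"return a;"` yields two independent token lists. If the language instead held one shared `StatefulParser`, the second result would also contain the tokens of the first submission. Not sharing parser instances also makes parsing submissions in parallel safer.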

@jplag/maintainer & @jplag/studdev: Feel free to comment if you have some points to add to this list.

tsaglam, Mar 17 '23

Maybe language-independent token types should be considered. Language-specific token types lead to inconsistencies between language modules. Language-independent token types would also enable cross-language plagiarism detection, which is an emerging field of study. The long-term goal could be to make the language flag entirely optional and compare all submissions regardless of their language by default.

brodmo, Apr 14 '23
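Language-independent token types, as suggested above, could look roughly like this. It is a hypothetical sketch: `CommonTokenType`, `TokenTypeMapping`, and token names such as `JavaLoop` or `PyLoop` are invented for illustration and do not correspond to the actual token types of JPlag's language modules. Each module maps its specific types onto one shared enum, so token sequences from different languages become directly comparable.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical language-independent token types; not part of JPlag.
enum CommonTokenType { CLASS_BEGIN, LOOP_BEGIN, IF_BEGIN, ASSIGNMENT, RETURN }

final class TokenTypeMapping {
    // Invented Java-side token names, mapped onto the shared types.
    static final Map<String, CommonTokenType> JAVA = Map.of(
            "JavaClass", CommonTokenType.CLASS_BEGIN,
            "JavaLoop", CommonTokenType.LOOP_BEGIN,
            "JavaIf", CommonTokenType.IF_BEGIN,
            "JavaAssign", CommonTokenType.ASSIGNMENT,
            "JavaReturn", CommonTokenType.RETURN);

    // Invented Python-side token names, mapped onto the same shared types.
    static final Map<String, CommonTokenType> PYTHON = Map.of(
            "PyClass", CommonTokenType.CLASS_BEGIN,
            "PyLoop", CommonTokenType.LOOP_BEGIN,
            "PyIf", CommonTokenType.IF_BEGIN,
            "PyAssign", CommonTokenType.ASSIGNMENT,
            "PyReturn", CommonTokenType.RETURN);

    // Translates a language-specific token sequence into shared token types.
    static List<CommonTokenType> normalize(List<String> tokens, Map<String, CommonTokenType> mapping) {
        return tokens.stream()
                .map(token -> {
                    CommonTokenType type = mapping.get(token);
                    if (type == null) {
                        throw new IllegalArgumentException("Unmapped token type: " + token);
                    }
                    return type;
                })
                .collect(Collectors.toList());
    }

    private TokenTypeMapping() {}
}
```

Normalizing `JavaLoop, JavaAssign, JavaReturn` and `PyLoop, PyAssign, PyReturn` yields the same sequence `LOOP_BEGIN, ASSIGNMENT, RETURN`, which a language-agnostic comparison could then match. A real design would still have to decide how to handle constructs without a counterpart in the shared set; this sketch simply rejects unmapped tokens.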

We could also add a module with common code for testing, as most languages implement the same stuff for their tests.

TwoOfTwelve, May 09 '23

> We could also add a module with common code for testing, as most languages implement the same stuff for their tests.

What about the language testutils module?

dfuchss, May 09 '23

I totally missed that. I still think it would be worth expanding it to standardize the behavior of language module tests.

TwoOfTwelve, May 09 '23

> I totally missed that. I still think it would be worth expanding it to standardize the behavior of language module tests.

Everything that we can standardize or generalize, we should. This is why the testutils were created. If you have any ideas for an extended testing framework for language modules, we can discuss them. It would be great to make things easy for new modules.

tsaglam, May 10 '23
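One check such a shared test framework could standardize is token-type coverage: does a module's sample input produce every token type the module declares? The helper below is a hypothetical sketch, not part of the existing testutils module, and it assumes a module's token types are modeled as a Java enum.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical shared test helper; not part of JPlag's existing testutils module.
final class TokenCoverageCheck {
    // Returns every declared token type that never occurs in the produced tokens.
    // Assumes the module's token types are modeled as a Java enum.
    static <T extends Enum<T>> Set<T> missingTypes(Class<T> declaredTypes, List<T> producedTypes) {
        Set<T> missing = EnumSet.allOf(declaredTypes);
        missing.removeAll(producedTypes);
        return missing;
    }

    private TokenCoverageCheck() {}
}
```

A module test would call `missingTypes` with its token-type enum and the types extracted from its sample files, then assert that the result is empty. Every module would then get the same "all token types are exercised" guarantee without reimplementing it.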

Right now, all language modules parse entire files. This makes parsing files that contain multiple languages (such as .vue and .ipynb files) very difficult, as we need to create temporary files.

libre-man, Oct 12 '23
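Avoiding temporary files would require parsers that accept source text plus its origin instead of a whole file. The following is a minimal, hypothetical sketch. `SnippetParserSketch` and `PositionedToken` are invented names, "parsing" is just whitespace splitting, and the `<script>` extraction is deliberately naive: it assumes a single `<script>` tag without attributes. The key idea is the line offset, which maps tokens of the embedded code back to their lines in the containing `.vue` file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token that remembers its position in the containing file.
record PositionedToken(String text, String file, int line) {}

// Hypothetical parser entry point that takes source text plus its origin,
// so embedded code can be parsed in memory without a temporary file.
final class SnippetParserSketch {
    static List<PositionedToken> parse(String code, String file, int firstLine) {
        List<PositionedToken> tokens = new ArrayList<>();
        String[] lines = code.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            // Stand-in for real parsing: split each line on whitespace.
            for (String word : lines[i].trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    tokens.add(new PositionedToken(word, file, firstLine + i));
                }
            }
        }
        return tokens;
    }

    // Naively extracts the <script> block of a .vue-style file and parses it.
    static List<PositionedToken> parseVueScript(String vueFile, String fileName) {
        int start = vueFile.indexOf("<script>");
        int end = vueFile.indexOf("</script>", Math.max(start, 0));
        if (start < 0 || end < 0) {
            return List.of();
        }
        int codeStart = start + "<script>".length();
        // 1-based line number on which the embedded code begins.
        int firstLine = 1 + (int) vueFile.substring(0, codeStart).chars().filter(c -> c == '\n').count();
        return parse(vueFile.substring(codeStart, end), fileName, firstLine);
    }

    private SnippetParserSketch() {}
}
```

For a file whose `<script>` tag sits on line 4, the token `let` on the next line is reported at line 5 of `App.vue`, with no temporary file involved. The same entry point could serve the code cells of an `.ipynb` file, each with its own offset.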