JPlag
Language module architecture
The language module architecture still has several problems, and this issue documents all known ones. It is the successor of #454. Solving these problems may introduce API-breaking changes, so the solutions require a major release.
- [x] Language modules should provide readme files that explain the design rationales of the token extraction
- [x] An abstract parser class should offer shared functionality among frontends
- [ ] The parser should be instantiated per submission, not globally per language
- [x] The language facades should not be named the same (API-breaking change!!!)
- [x] Language modules currently all read in files on their own; see #746
- [ ] #576
- [x] #920
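To illustrate the "parser per submission" item, here is a minimal sketch of one possible shape. All type names (`SketchParser`, `WhitespaceParser`, `SketchLanguage`) are hypothetical and are not JPlag's actual API; the idea is only that the language facade holds a factory rather than a single shared parser, so parser state cannot leak between submissions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical names throughout; none of these types are JPlag's actual API.
interface SketchParser {
    List<String> parse(String sourceCode);
}

// A toy parser with mutable state: one shared instance would
// accumulate tokens across submissions.
final class WhitespaceParser implements SketchParser {
    private final List<String> tokens = new ArrayList<>();

    @Override
    public List<String> parse(String sourceCode) {
        for (String part : sourceCode.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return List.copyOf(tokens);
    }
}

// The facade stores a factory, so every submission gets a fresh parser.
final class SketchLanguage {
    private final Supplier<SketchParser> parserFactory;

    SketchLanguage(Supplier<SketchParser> parserFactory) {
        this.parserFactory = parserFactory;
    }

    List<String> tokenize(String submissionSource) {
        return parserFactory.get().parse(submissionSource);
    }
}
```

With `new SketchLanguage(WhitespaceParser::new)`, two consecutive `tokenize` calls return independent token lists, which would not hold if the facade kept a single `WhitespaceParser` field.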
@jplag/maintainer & @jplag/studdev: Feel free to comment if you have some points to add to this list.
Maybe language-independent token types should be considered. Language-specific token types lead to inconsistencies between language modules. Language-independent token types would also enable cross-language plagiarism detection, which is an emerging field of study. The long-term goal could be to make the language flag entirely optional and compare all submissions regardless of their language by default.
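As a rough sketch of what language-independent token types could look like: the enum and the two mapping tables below are purely illustrative assumptions, not existing JPlag code, and the construct names are just labels for typical syntax tree nodes. The point is that two language modules emitting the same shared types produce directly comparable token sequences.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical shared token vocabulary; not part of JPlag.
enum SharedTokenType { FUNCTION_DEFINITION, LOOP_BEGIN, CONDITION_BEGIN, RETURN }

final class SharedTokenMapping {
    // Illustrative construct names for a Java-like language module.
    static final Map<String, SharedTokenType> JAVA = Map.of(
            "MethodDeclaration", SharedTokenType.FUNCTION_DEFINITION,
            "ForStatement", SharedTokenType.LOOP_BEGIN,
            "IfStatement", SharedTokenType.CONDITION_BEGIN,
            "ReturnStatement", SharedTokenType.RETURN);

    // Illustrative construct names for a Python-like language module.
    static final Map<String, SharedTokenType> PYTHON = Map.of(
            "FunctionDef", SharedTokenType.FUNCTION_DEFINITION,
            "For", SharedTokenType.LOOP_BEGIN,
            "If", SharedTokenType.CONDITION_BEGIN,
            "Return", SharedTokenType.RETURN);

    // Translates language-specific constructs into shared token types,
    // silently dropping constructs that have no shared equivalent.
    static List<SharedTokenType> translate(List<String> constructs,
                                           Map<String, SharedTokenType> mapping) {
        return constructs.stream()
                .map(mapping::get)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());
    }
}
```

Under these assumptions, a Java method containing a loop and a return and a Python function with the same structure translate to the same sequence, which is what a cross-language comparison would need.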
We could also add a module with common code for testing, as most languages implement the same stuff for their tests.
What about the language testutils module?
I totally missed that. I still think it would be worth expanding it to standardize the behavior of language module tests.
Everything that we can standardize or generalize, we should. This is why the testutils were created. If you have any ideas for an extended testing framework for language modules we can discuss that. Would be great to make things easy for new modules.
Right now, all language modules parse entire files. This makes parsing files that contain multiple languages (such as `.vue` and `.ipynb` files) very difficult, as we need to create temporary files.
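One way to avoid temporary files might be to let parsers accept in-memory source fragments instead of whole files. The sketch below is an assumption-laden illustration, not JPlag code: `EmbeddedFragment` and `VueScriptExtractor` are hypothetical names, and the regular expression only handles the first simple `<script>` block of a `.vue` file. It keeps the starting line so that token positions could later be mapped back to the original file.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for a piece of embedded code and the 1-based line
// of the original file on which the fragment begins.
final class EmbeddedFragment {
    final String code;
    final int startLine;

    EmbeddedFragment(String code, int startLine) {
        this.code = code;
        this.startLine = startLine;
    }
}

final class VueScriptExtractor {
    // Naive pattern: matches the first <script ...>...</script> block only.
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*>(.*?)</script>", Pattern.DOTALL);

    static Optional<EmbeddedFragment> extract(String fileContent) {
        Matcher matcher = SCRIPT.matcher(fileContent);
        if (!matcher.find()) {
            return Optional.empty();
        }
        // Count the newlines before the fragment to find its first line.
        long newlinesBefore = fileContent.substring(0, matcher.start(1))
                .chars()
                .filter(c -> c == '\n')
                .count();
        return Optional.of(new EmbeddedFragment(matcher.group(1), (int) newlinesBefore + 1));
    }
}
```

The extracted `code` string could then be handed to a parser that reads from a string or character stream, provided the language module's parser entry point does not insist on a file.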