
Generalize submission path specification

Open Alberth289346 opened this issue 2 years ago • 5 comments

The current input path specification of root/*/dir doesn't quite work for us, as we have a course/year directory tree, with several (but possibly not all) years to be included in the check.

Also, looking a bit ahead, we see the need for specifying multiple sets tomorrow, to add support for already checked sets, i.e. along the lines of #91 or #49. As such, this is the first step towards a more unified and general way to specify submission sets than the current triplet of <root-dir>, -S sub, and the -bc option.


The proposal for generalizing submission path specification is to allow one or more file-system paths, instead of the current single <root-dir>, to specify a set of submissions, with general * globbing (we'd use course/202* for checking the last 2 years). Allowing several file-system paths lets us use course/202* course/2019 as the set specification for checking the last 3 years.
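To make the idea concrete, here is a minimal Python sketch of expanding several glob patterns into one list of year directories. It is only an illustration of the proposal, not how JPlag does or would implement it; the function name `expand_submission_paths` and the throwaway directory tree are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def expand_submission_paths(patterns, root="."):
    """Expand every glob pattern relative to root and return the matching
    directories, sorted and without duplicates."""
    base = Path(root)
    found = set()
    for pattern in patterns:
        for match in base.glob(pattern):
            if match.is_dir():  # only directories can hold submissions
                found.add(match.relative_to(base).as_posix())
    return sorted(found)


def demo():
    # Build a throwaway course/<year> tree to try the patterns on.
    with tempfile.TemporaryDirectory() as root:
        for year in ("2018", "2019", "2020", "2021"):
            (Path(root) / "course" / year).mkdir(parents=True)
        return expand_submission_paths(["course/202*", "course/2019"], root)


print(demo())
```

With the tree above, the two patterns together select 2019, 2020 and 2021, while 2018 is left out.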

As the form of the path isn't fixed, the currently accepted <root-dir>/*/sub path also fits in the proposal without needing the -S option.

An extension to the above could be curly brace expansion; the above would then become course/20{2*,19}. This is basically another way of writing more than one file-system path, so I'm not sure it's worth the effort of implementing it.
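For a sense of the effort involved, a rough Python sketch of brace expansion follows. It is an assumption-laden illustration (the helper `expand_braces` is hypothetical) and, unlike a shell, it drops duplicate results, which seems harmless for path specifications.

```python
import re

# Matches one {...} group that contains no further braces.
_BRACES = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand {a,b,...} groups into separate patterns, keeping first-seen order."""
    match = _BRACES.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that any remaining groups are expanded as well.
        expanded.extend(expand_braces(head + option + tail))
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(expanded))


print(expand_braces("course/20{2*,19}"))
```

The resulting patterns can then be handed to ordinary globbing, so brace expansion stays a pure pre-processing step.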

The -bc base-code path specification can then also be changed to the more general form (mainly so that globbing reduces typing). This is not strictly needed at this point, as it's still one directory.


For tomorrow's needs, a possible path seems to be (at this time):

  1. Allow more than one set of submissions
  2. Allow some sets to be marked as "already checked"

Two new items then pop up.

  1. Each submission set may have a slightly different base code. I ignore the theoretical issue of how to deal with different base code for each set in the comparison. The other, more practical issue is how to specify that the comparison should not see the base code directory as a submission. (It makes sense to store it as course/year/basecode, but that's inside the submission set path.)

    Having multiple -bc basecode options might work, except you get into trouble in selecting the version that should be used in the comparison.

    Another option is to specify one or more relative paths from the submission root. Makes it easier to attach a basecode to the correct submission set, but selecting the basecode to use in the comparison is still unsolved.

    Likely there are other options.

  2. Not all submission sets are equal any more: some are already checked and don't need more checking. This means one has to say which of the above submission paths are done or not done yet. It needs at least two different path specifications, so one <root-dir> likely won't fly.
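The two items above can be sketched together in a few lines of Python. This is purely illustrative and says nothing about JPlag's internals: the `Submission` class, the `collect` and `pairs_to_compare` helpers, and the assumption that every set keeps its base code in a directory called `basecode` are all made up for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Submission:
    set_name: str          # e.g. "course/2021"
    name: str              # directory name inside the set
    already_checked: bool  # True if the set was checked in an earlier run


def collect(set_name, entries, already_checked, basecode="basecode"):
    """Turn directory names into submissions, skipping the set's base code directory."""
    return [Submission(set_name, e, already_checked) for e in entries if e != basecode]


def pairs_to_compare(submissions):
    """Yield every pair except those where both sides were already checked."""
    for a, b in combinations(submissions, 2):
        if not (a.already_checked and b.already_checked):
            yield a, b


old = collect("course/2020", ["alice", "bob", "basecode"], already_checked=True)
new = collect("course/2021", ["carol", "dave", "basecode"], already_checked=False)
pairs = list(pairs_to_compare(old + new))
print(len(pairs))
```

New submissions are still compared with each other and with the old ones; only old-versus-old pairs are skipped. Which base code to use in the comparison remains open here, as in the text above.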

Thoughts?

@davbeek FYI

Alberth289346 avatar Oct 25 '21 09:10 Alberth289346

I like the idea, and it sounds like an excellent potential improvement to the current CLI. The goal of such an overhaul should be to grant the flexibility to deal with specific situations such as yours described above while preserving the ease of use for users who have a simple file/folder structure.

Supporting multiple basecodes might be tricky, as this requires changing a lot of the inner workings of JPlag. As such, I see that as a major rework of JPlag that would have to be verified comprehensively to ensure continued functionality. Not that I think it cannot be done, but it is definitely a significant task.

Regarding checking against prior submissions, I see that as a valuable feature even for the current version of JPlag. However, #91 and #49 currently conflict and cannot be merged without a rework. In the future, supporting multiple submission sets should be possible for both the normal and the prior submissions.

In summary, I see the following suggested features:

  • allow multiple paths instead of a single root dir (leading to multiple submission sets)
  • allow globbing in those paths (one path can be multiple submission sets)
  • allow prior submissions (again with multiple paths and globbing)
  • allow multiple basecodes (e.g. one basecode per submission set)

EDIT: Random thought I had: when allowing multiple submission sets, submission name collisions can occur. That is an edge case that needs to be kept in mind. However, this can be easily worked around.
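One possible workaround, sketched in Python under the assumption that each submission is known by its set path and directory name, is to prefix the set path only where names clash. The helper `unique_names` is hypothetical and not part of JPlag.

```python
from collections import Counter


def unique_names(submissions):
    """Map (set_path, name) pairs to display names, prefixing the set path
    only for names that occur in more than one set."""
    counts = Counter(name for _, name in submissions)
    return [
        f"{set_path}/{name}" if counts[name] > 1 else name
        for set_path, name in submissions
    ]


names = unique_names([
    ("course/2020", "alice"),
    ("course/2021", "alice"),
    ("course/2021", "bob"),
])
print(names)
```

Users with a single flat folder keep the short names they see today, while colliding names from different sets stay distinguishable.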

tsaglam avatar Oct 25 '21 12:10 tsaglam

@sebinside I am thinking maybe we can turn this into a PISE topic for the next summer term?

tsaglam avatar Oct 25 '21 12:10 tsaglam

Great issue, thank you! I would second that, let's make this a topic.

sebinside avatar Oct 25 '21 12:10 sebinside

For my information, what does "turn this into a PISE topic for the next summer term" aim to achieve?

Alberth289346 avatar Oct 25 '21 13:10 Alberth289346

That means we will make this issue (at least partly) into a student project for our practical course. Basically, master's students from KIT contribute to research/open source projects as part of their computer science studies.

tsaglam avatar Oct 26 '21 06:10 tsaglam

Seems that we now support the suggestions made here, thus closed.

tsaglam avatar Feb 07 '23 13:02 tsaglam