BAPCtools icon indicating copy to clipboard operation
BAPCtools copied to clipboard

rethink `.gitignore`

Open RagnarGrootKoerkamp opened this issue 1 year ago • 6 comments

Current situation:

Setting gitignore_generated: true in generators.yaml writes data/.gitignore when generating, which contains all generated cases, i.e.:

.gitignore
secret/1-min.*
secret/2-max.*
bad/empty.*

This works nicely, and allows having sample files directly in data/sample/*.

New situation:

With the new generator spec, having 'inline' source files directly in data/ is discouraged. It would much simpler and much more consistent to simply ignore all data directories in their entirety, and generate warnings for any files in data/ that are not generated.

A) When to ignore files?

  1. Always
  2. Only when gitignore_generated: true is set

My preference: A1, always, since that's what we always use in practice anyway, and then we can drop the gitignore_generated flag and simplify code since there are fewer different usages.

B) Some approaches for ignoring files:

  1. Let user put problems/*/data/ in toplevel .gitignore manually.
  2. Generate problem/.gitignore containing data/
  3. Generate problem/data/.gitignore containing *; using the file we currently also use.

My preference: B3, to have automation, but not mess around with a possible problem/.gitignore file that could already exist. We can claim ownership of problem/data/.gitignore without issues IMO.

C) Some approaches for warning about non-generated files:

  1. Always
  2. When gitignore_generate: true is set
  3. When problem/data/.gitignore contains *, or some specific BAPCtools-generated string/comment.

My preference: C1, again to simplify the implementation and possible use cases.

RagnarGrootKoerkamp avatar Sep 12 '23 17:09 RagnarGrootKoerkamp

B1. (I have no strong opinions. I’m a decoupler and put version control issues into a different conceptual box from “what a problem generation should worry about”. BAPCtools explicitly has the opposite mindset, it wants to do everything, from version control to slack integration to solution presentation. My opinion here shouldn’t count for anything.)

thorehusfeldt avatar Sep 12 '23 19:09 thorehusfeldt

A1. and remove the flag B3. we want to claim ownership of the whole data dir anyway? C1. maybe even always "delete" them (move them outside of data)

mzuenni avatar Sep 12 '23 21:09 mzuenni

(although not completely independent I think A, Band C each need their own answer ;)

RagnarGrootKoerkamp avatar Sep 12 '23 21:09 RagnarGrootKoerkamp

My preferences are the same as Ragnar and mzuenni: A1, B3, C1 :slightly_smiling_face:

EDIT: but perhaps it is indeed convenient to not ignore the samples (see below) :thinking:

mpsijm avatar Sep 13 '23 08:09 mpsijm

One things that we may want to decide on separately is whether to gitignore samples. It's kinda convenient to have them committed when building PDFs, so they don't have to be re-generated each time bt pdf is run somewhere. See also #309

RagnarGrootKoerkamp avatar Sep 19 '23 11:09 RagnarGrootKoerkamp

Ok so my thinking changed a bit here. I now prefer B1, that users put */data/secret (and data/invalid_inputs) in a contest-level .gitignore. That is more explicit than automatically generating data/.gitignore files that ignore themselves and hence never show up in git status in the first place and are 'completely hidden' to the user.

Note that I explicitly think that samples should not be ignored. Changes to them are important to be reviewed, and sample/1.ans files are regularly modified to e.g. have a sensible number of digits for floating point output.

This makes A void:

  • we can drop the gititnore_generated: true from generators.yaml
  • we can drop the code that generates these .gitignore files :)
  • the user explicitly acknowledges that data/secret only exists locally, making it clear that nothing there is permanent.

I would still say C1 with this:

  • we reserve the right to trash or completely delete anything in data/ at any point. (Note: all of data/, not just data/secret. You have git to revert important samples.).
  • Then we could also make generate do -f 'by default', instead of the current situation where I constantly do generate -f optionally followed by generate -f --clean.

So basically this moves us into the direction where we completely ignore what is in data/ before calling bt generate and effectively remove everything and regenerate it to be up-to-date with whatever generators.yaml dictates (just possibly implemented more efficiently, only updating things that actually need changing and such.)

RagnarGrootKoerkamp avatar Oct 31 '23 11:10 RagnarGrootKoerkamp

My current view is:

  • When we generate stuff we always want to gitignore those file i.e. A1. and remove the flag
  • B2. But if data/* or !data/secret/* is not in there we add it (you can comment if if you don't want to ignore it)
  • C1. we want to own that directory. Always warn and remove (move to /generators/removed/) files that are not generated

mzuenni avatar Feb 27 '24 10:02 mzuenni

I've set up the repository for the Freshmen Programming Contests with:

  • A1 (i.e. I removed the generate_gitignore flag, which defaults to False anyway)

  • B: I've added data/invalid_* and data/secret to .gitignore manually now (B1), but it would be my preference to automate adding these rules to some .gitignore, either:

    • during bt new_contest (see https://github.com/RagnarGrootKoerkamp/BAPCtools/blob/master/skel/contest/.gitignore) (basically B1, but automated); or
    • by generating a .gitignore file in data/ which excludes everything except the samples (see https://github.com/RagnarGrootKoerkamp/BAPCtools/issues/309#issuecomment-1787268603) (B3).

    I disagree with B2 (adding a .gitignore to every problem), because typically, you'd like an entire contest to use the same configuration, right?

Regarding C (warning about non-generated files), I agree with you guys that data/ should be completely owned by BAPCtools. However, I do not like the idea of moving removed files to /generators/removed/, if that means that old/renumbered test cases will end up in there every time you add new test cases halfway in generators.yaml...

mpsijm avatar Feb 27 '24 11:02 mpsijm

regarding B: I think it has to be a problem specific setting since i dont want that behaviour if i dont use generators?

mzuenni avatar Feb 27 '24 13:02 mzuenni

Interesting... I do think that a problem should either consistently use generators (giving BAPCtools full control over the data/ folder) or consistently not use generators (so BAPCtools should not touch anything in the data/ folder, and warn when .ans files do not contain a correct answer, but not overwrite them). Taking that into account, we could let BAPCtools create data/.gitignore if and only if the folder generators/ exists?

Specifically, this could be useful when running BAPCtools on problems that have not been created using BAPCtools, and therefore do not have a generators/ folder, or when e.g. running BAPCtools on Kattis-exported problem packages. For creating new problems, I cannot think of a scenario where a user of BAPCtools would not make use of generators, but please share if you think such a scenario does exist :slightly_smiling_face:

mpsijm avatar Feb 27 '24 14:02 mpsijm

yeno, we should only do something with a .gitignore if the generators.yaml exists and it should therefore be on problem level, but i would prefer to do this is in the more transparent way and write it in the .gitignore of the problem directory.

mzuenni avatar Feb 27 '24 14:02 mzuenni

But if we want this to be transparent, I'd rather set ignore-rules at the level of a contest, because checking in .gitignore files for all problems with (roughly) the same ignore-rules feels redundant to me :stuck_out_tongue: And even though this means we don't automate setting ignore-rules on the problem level based on the existence of generators.yaml, we could still use the existence of this file to decide whether BAPCtools is allowed to overwrite anything in data/.

mpsijm avatar Feb 27 '24 14:02 mpsijm

IMO there are only two options since contest level should not handle this problem specific setting:

  1. write a .gitignore in /data which ignores itself
  2. add something to the problem .gitignore

And i simply prefer 2. since self ignoring .gitignore files are weird but i can live with that ^^

mzuenni avatar Feb 27 '24 14:02 mzuenni

Fixed by #348.

choices

  • A1: we always ignore files
  • B2: we add to problem/.gitignore
  • C: Instead of warning about unknown files, we move anything old/unknown to /tmp and log this see here.

RagnarGrootKoerkamp avatar Mar 05 '24 21:03 RagnarGrootKoerkamp