BuildingMOTIF

Skip ipynb checkpoint directory

Open gtfierro opened this issue 1 year ago • 4 comments

This directory can contain duplicates of files that were cached while running a notebook. That breaks scanning the local file system for libraries and/or graphs, because BuildingMOTIF enforces uniqueness on certain tables. This patch prevents files in `.ipynb_checkpoints` directories from being passed to BuildingMOTIF.
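A minimal sketch of this kind of filter, for illustration only and not the code in the patch: the helper name `iter_library_files` and the default suffix list are hypothetical, and it uses only the standard library.

```python
from pathlib import Path
from typing import Iterator, Tuple

CHECKPOINT_DIR = ".ipynb_checkpoints"


def iter_library_files(
    root: Path, suffixes: Tuple[str, ...] = (".yml", ".ttl")
) -> Iterator[Path]:
    """Yield matching files under root, skipping Jupyter checkpoint copies.

    Hypothetical helper; the name and the suffix list are assumptions.
    """
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        # Check the path components relative to root so that a checkpoint
        # directory at any depth below root is excluded.
        if CHECKPOINT_DIR in path.relative_to(root).parts:
            continue
        yield path
```

Checking the path components, rather than searching the full path string, avoids accidentally skipping files whose names merely contain the text `.ipynb_checkpoints`.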

@haneslinger @TShapinsky @MatthewSteen any ideas on how I could / should unit test this?

gtfierro avatar Jul 02 '24 23:07 gtfierro

Could you simply create an .ipynb_checkpoints/.gitkeep file (or similar), then do an assert in the test_utils.py?
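Roughly what I have in mind, as a sketch: `find_files` is a hypothetical stand-in for whatever function actually scans the directory, and `tmp_path` is the standard pytest fixture that provides a fresh temporary directory.

```python
from pathlib import Path
from typing import List


def find_files(root: Path) -> List[Path]:
    # Hypothetical stand-in for the real directory scan under test.
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and ".ipynb_checkpoints" not in p.relative_to(root).parts
    )


def test_checkpoint_directory_is_skipped(tmp_path: Path) -> None:
    # Put a placeholder file inside a checkpoint directory, next to a real file.
    checkpoints = tmp_path / ".ipynb_checkpoints"
    checkpoints.mkdir()
    (checkpoints / ".gitkeep").touch()
    (tmp_path / "templates.yml").write_text("{}\n")

    # Only the file outside the checkpoint directory should be reported.
    assert find_files(tmp_path) == [tmp_path / "templates.yml"]
```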

MatthewSteen avatar Jul 03 '24 00:07 MatthewSteen

I'm not sure that would fix the issue, but maybe I'm misunderstanding how .gitkeep works. The problem is the notebooks will create the checkpoints folder regardless. One example workflow:

  • I run the Ingress CSV tutorial notebook from Jupyter
  • while in Jupyter, I visit the tutorial folder (used by the notebook) and make an adjustment to one of the files
  • I restart and re-run the CSV tutorial notebook. This errors because opening the tutorial folder within Jupyter created a checkpoints folder, which BuildingMOTIF now discovers. As a result, it finds two templates and/or libraries with the same name and throws a uniqueness constraint violation error

Before:

tutorial/
     templates.yml

After:

tutorial/
     .ipynb_checkpoints/
          templates.yml
     templates.yml

Doesn't .gitkeep keep the folder around? The above issue happens regardless of what Git is doing.

gtfierro avatar Jul 03 '24 14:07 gtfierro

Yes, .gitkeep keeps the folder around. I was just suggesting it as an example file for testing only. Not a file to actually create and track with Git.

Where is the uniqueness error coming from, the SQL database?

MatthewSteen avatar Jul 03 '24 15:07 MatthewSteen

🤦 that should have been obvious to me.... thanks! That's a good suggestion for tests

Yes, the uniqueness constraint comes from our schema. We want library names to be unique, and we want template names to be unique within each library.
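To illustrate those two rules, here is a small sketch using Python's built-in `sqlite3`. The table and column names are made up for the example and are not BuildingMOTIF's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Illustrative tables only; not the real BuildingMOTIF schema.
    CREATE TABLE library (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE            -- library names are unique
    );
    CREATE TABLE template (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        library_id INTEGER NOT NULL REFERENCES library(id),
        UNIQUE (library_id, name)            -- unique within each library
    );
    """
)
conn.execute("INSERT INTO library (id, name) VALUES (1, 'tutorial')")
conn.execute("INSERT INTO template (name, library_id) VALUES ('vav', 1)")


def try_insert_template(name: str, library_id: int) -> bool:
    """Return True if the row was inserted, False on a uniqueness violation."""
    try:
        conn.execute(
            "INSERT INTO template (name, library_id) VALUES (?, ?)",
            (name, library_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, loading a second copy of the same template from a checkpoint folder corresponds to `try_insert_template("vav", 1)`, which is rejected, while a different name in the same library is accepted.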

gtfierro avatar Jul 03 '24 16:07 gtfierro

Finally added a test! I think this is ready to merge once the tests pass

gtfierro avatar Aug 27 '24 20:08 gtfierro