
Added brute force method to preserved site k-fold split

Open rsethi21 opened this issue 2 years ago • 2 comments

rsethi21 avatar Jun 20 '22 22:06 rsethi21

In a083554, I added a unit test that checks that the generated splits are valid: all patients are used, and no site is present in more than one cross-fold. The test data was copied from your original submission.

Running the test only requires:

python3 crossfolds_test.py
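
For reference, the two validity conditions can be sketched roughly as below. This is not the contents of crossfolds_test.py; the function name `find_split_errors`, its arguments, and the sample patients and sites are all hypothetical, and it assumes each fold is simply a list of patient IDs.

```python
from collections import defaultdict
from typing import Dict, List


def find_split_errors(
    patient_sites: Dict[str, str],
    folds: List[List[str]],
) -> List[str]:
    """Return a list of problems found in a site-preserved k-fold split.

    Hypothetical helper: `patient_sites` maps patient ID -> site, and
    `folds` holds one list of patient IDs per cross-fold.
    An empty return value means the split is valid.
    """
    errors = []
    assigned = [p for fold in folds for p in fold]

    # Condition 1: all patients are used, and each is used only once.
    if len(assigned) != len(set(assigned)):
        errors.append("a patient is assigned to more than one fold")
    missing = sorted(set(patient_sites) - set(assigned))
    if missing:
        errors.append(f"patients not assigned to any fold: {missing}")

    # Condition 2: no site is present in more than one fold.
    site_folds = defaultdict(set)
    for idx, fold in enumerate(folds):
        for p in fold:
            if p in patient_sites:
                site_folds[patient_sites[p]].add(idx)
    leaked = sorted(s for s, f in site_folds.items() if len(f) > 1)
    if leaked:
        errors.append(f"sites present in multiple folds: {leaked}")
    return errors


# Made-up example data, for illustration only.
sites = {"p1": "A", "p2": "A", "p3": "B", "p4": "C"}
valid = [["p1", "p2"], ["p3"], ["p4"]]
invalid = [["p1"], ["p2", "p3"], ["p4"]]  # site A is split across folds

print(find_split_errors(sites, valid))
print(find_split_errors(sites, invalid))
```

Returning a list of messages instead of raising on the first failure makes it easier to see every way a split is broken in a single run.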

jamesdolezal avatar Jun 23 '22 05:06 jamesdolezal

In 57ac792, I made formatting changes and minor refactoring solely to improve readability; the underlying algorithm is unchanged. Code that is easier to read is also easier to refactor and optimize. The changes include:

  • Added type annotations and a docstring to the function declaration, to make clear what the input arguments are and what the function does.
  • Broke long code blocks into smaller, discrete sections, with accompanying comments explaining what each section does.
  • Used more succinct and easily interpretable variable names.
  • Limited line length to 80 characters.
  • Used list comprehensions to reduce the number of nested loops.
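
As a small illustration of this style (not code from 57ac792; the function `patients_in_sites` and its arguments are hypothetical), a typed, documented function that uses a list comprehension in place of a nested loop might look like:

```python
from typing import Dict, List


def patients_in_sites(
    site_patients: Dict[str, List[str]],
    sites: List[str],
) -> List[str]:
    """Return all patients belonging to the given sites, in site order.

    Args:
        site_patients: Maps each site name to its list of patient IDs.
        sites: Site names to collect patients from.

    Returns:
        Flat list of patient IDs from the requested sites.
    """
    # Equivalent to a nested `for site ...: for patient ...:` loop
    # that appends each patient to a result list.
    return [p for site in sites for p in site_patients[site]]
```

The annotations and docstring state the inputs and the return value up front, and the comprehension keeps the body to one line well under 80 characters.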

With the unit test now available, we can check that the generated splits are valid. Running the unit test both before and after the refactor raises an error, which indicates an issue with the algorithm itself rather than with the refactoring.

For the next step, please familiarize yourself with the modified code and track down the cause of the failing unit test. Once the unit test passes, indicating that the algorithm is complete, we will move on to optimization.

jamesdolezal avatar Jun 23 '22 05:06 jamesdolezal