
[iedorep] Prototyping detection of commands/lines generating irreproducibility

Open luisesanmartin opened this issue 3 years ago • 0 comments

Objective

To detect the line at which a dofile stops producing the same partial result (intermediate dataset) and random number state when run twice.

Inputs and outputs

  • Input: a Stata dofile
  • Output: a message in the console that tells you

Outline of the idea

  • Take a dofile as input, run it, and save the data signature and random number state after every line of code. Then run it again and check, line by line, that the data signature and random number state match those of the first run. If they do not, the result stops being reproducible at the first line where they diverge.
  • The intended use of this command is to be run after iesave detects that there are changes in the final result of the dofile with respect to the previous result.
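
A minimal sketch of the comparison step, in Python for illustration only (the command itself would be written in Stata). It assumes each run has already produced one record per tracked line, in the same order; `LineState` and `first_divergence` are hypothetical names, not part of ietoolkit.

```python
from typing import List, NamedTuple, Optional


class LineState(NamedTuple):
    line: int       # line number in the original dofile
    datasig: str    # data signature recorded after the line ran
    rngstate: str   # random number state recorded after the line ran


def first_divergence(run1: List[LineState], run2: List[LineState]) -> Optional[int]:
    """Return the first dofile line where the two runs disagree, or None."""
    for a, b in zip(run1, run2):
        if a.line != b.line:
            # Assumption: both runs track exactly the same lines.
            raise ValueError("runs do not track the same lines")
        if (a.datasig, a.rngstate) != (b.datasig, b.rngstate):
            return a.line
    return None
```

Everything after the returned line is likely to differ as well, so only the first mismatch is reported.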

Outline of possible tasks

  1. Input dofile modification:
  • [x] Add an empty line after every line of the dofile with code
  • [x] Obtain the data signature and random number state of the dataset in these new empty lines
  • [x] Save the line number, data signature, and random number state in a temporary data table
  2. Data signature and random number state comparison:
  • [x] In the second run of the dofile, compare the data signature and random number state with their corresponding values saved in the first run.
  • [ ] Stop if any of the numbers differ and show a message saying in which line the discrepancy was found
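
A rough sketch of the dofile modification in task 1, again in Python for illustration. `track_state` is a hypothetical Stata helper that would record the line number, data signature, and random number state; it does not exist in ietoolkit. The sketch is deliberately naive: it assumes one command per line and ignores `///` continuations, `/* */` comment blocks, and lines inside loops.

```python
from typing import List


def instrument(dofile_lines: List[str]) -> List[str]:
    """Insert a tracking call after every line that contains code."""
    out = []
    for number, text in enumerate(dofile_lines, start=1):
        out.append(text)
        stripped = text.strip()
        # Blank lines and whole-line comments run no code, so skip them.
        if not stripped or stripped.startswith(("*", "//")):
            continue
        # "track_state" is a hypothetical helper, named here for illustration.
        out.append(f"track_state {number}")
    return out
```

The original line number is passed to the helper so that messages can refer to lines of the unmodified dofile.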

Decision tree

We track changes on:

  • Data signature (data changed)
  • RNG (random number generator state)
  • Sort RNG (data sorted state)

Test that:

  • [ ] If RNG advanced: flag, but optional
  • [ ] If RNG advanced to a different place than in the first run: error!
  • [ ] If Sort RNG advanced: flag, but optional
  • [ ] If the data signature is changing: error!
    • This is not tracking changes in sort order
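
One possible reading of the tests above, sketched in Python. It assumes "advanced" means the state changed across a line within the first run, and that "error" cases are mismatches between the two runs; `State` and `check_line` are hypothetical names.

```python
from typing import List, NamedTuple


class State(NamedTuple):
    datasig: str   # data signature
    rng: str       # RNG state
    sortrng: str   # sort RNG state


def check_line(before: State, after_run1: State, after_run2: State) -> List[str]:
    """Return flags and errors for one line, given the state before it in
    run 1 and the states after it in run 1 and run 2."""
    messages = []
    if after_run1.rng != before.rng:
        messages.append("flag: RNG advanced")
    if after_run1.rng != after_run2.rng:
        messages.append("error: RNG state differs between runs")
    if after_run1.sortrng != before.sortrng:
        messages.append("flag: sort RNG advanced")
    if after_run1.datasig != after_run2.datasig:
        messages.append("error: data signature differs between runs")
    return messages
```

Flags could be shown only on request, while any error would stop the second run.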

luisesanmartin avatar Nov 01 '21 14:11 luisesanmartin