ietoolkit

Possible command to help with code review

Open luizaandrade opened this issue 6 years ago • 3 comments

One thing we noticed in our code review pilot is that deleting all outputs, including intermediate data sets, is necessary to make sure the code is replicable from the master do-file.

For example, the master do-file may fail to create a data set that is used as an input later. If a previous version of that data set is already saved, Stata will not throw an error when the file is opened. However, when the code reviewer, who does not have the saved file, tries to run the code, it will.

Maybe it would be interesting to create a function that removes the files from specific folders, so that calling it at the beginning of the master do-file would be a good test of the code's replicability.

luizaandrade avatar Nov 15 '17 18:11 luizaandrade

Good idea. This command can be built so that the folders are either specified manually, i.e. by listing them, or assumed from the iefolder structure.

If it is manual, it can be delete("intermediate data","final data") or it can be notdelete("raw data"). There can also be a file type option like filetype(".dta") that deletes only .dta files, and a notfiletype(.csv) as well.

If iefolder is specified instead of delete() or notdelete(), we can assume that all files in the DataSets and Outputs folders and their subfolders should be deleted.
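
To make the delete() and filetype() options concrete, here is a minimal sketch of the logic in Python. It is not a Stata implementation and not part of ietoolkit: the function name `clean_outputs`, its parameters, and the `dry_run` safety switch are all hypothetical, chosen only to mirror the options described above.

```python
from pathlib import Path


def clean_outputs(root, delete, filetype=None, dry_run=True):
    """Collect (and optionally erase) files under the listed folders.

    root     -- project folder (hypothetical layout)
    delete   -- folder names relative to root, like delete() above
    filetype -- optional extension such as ".dta", like filetype() above
    dry_run  -- assumption: when True, files are only listed, never erased
    """
    matched = []
    for folder in delete:
        base = Path(root) / folder
        if not base.is_dir():
            continue  # skip folders that do not exist
        # sorted() builds the full list before anything is erased
        for path in sorted(base.rglob("*")):
            if not path.is_file():
                continue
            if filetype is not None and path.suffix.lower() != filetype.lower():
                continue
            if not dry_run:
                path.unlink()
            matched.append(path)
    return matched
```

Calling `clean_outputs(root, ["intermediate data", "final data"], filetype=".dta")` would only list the matching files; passing `dry_run=False` erases them. Listing by default seems a reasonable design choice for a command whose whole purpose is to delete things.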

I would use iegitaddmd as a base for this command, as it is set up to go over folders and sub-folders.

Let me know what you think.

kbjarkefur avatar Nov 16 '17 09:11 kbjarkefur

Now that I'm actually getting to it, I'm thinking that instead of deleting files, the command should create a replication copy of the original folder, including (i) the folder structure, (ii) the scripts, and (iii) the raw data. This way, we'd still have the original files and outputs and could compare them to whatever is created when running the master again. Everything else would be as you described.
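
A rough sketch of that replication copy, again in Python just to illustrate the logic. The function name `make_replication_copy`, the "raw data" folder name, and the assumption that scripts are the files ending in .do are all hypothetical; a real command would take these from its options or from the iefolder structure.

```python
import shutil
from pathlib import Path


def make_replication_copy(source, target,
                          raw_folders=("raw data",),
                          script_suffixes=(".do",)):
    """Copy the folder tree, the scripts, and the raw data; skip all else."""
    source, target = Path(source), Path(target)
    # The copy must live outside the original, or it would copy itself.
    if source.resolve() == target.resolve() or source.resolve() in target.resolve().parents:
        raise ValueError("target must be outside the source folder")

    copied = []
    for path in sorted(source.rglob("*")):
        rel = path.relative_to(source)
        if path.is_dir():
            # (i) folder structure: every folder is recreated, even if empty
            (target / rel).mkdir(parents=True, exist_ok=True)
            continue
        is_script = path.suffix.lower() in script_suffixes          # (ii) scripts
        is_raw = any(part in raw_folders for part in rel.parts[:-1])  # (iii) raw data
        if is_script or is_raw:
            (target / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target / rel)
            copied.append(rel)
    return copied
```

Running the master do-file inside the copy then starts from empty intermediate and output folders, while the original folder stays untouched for comparison.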

luizaandrade avatar Aug 15 '18 18:08 luizaandrade

This is a great idea. It could also effortlessly check that whatever is generated is identical to what was originally submitted using the hash function provided at: https://www.statalist.org/forums/forum/general-stata-discussion/general/1443922-computing-hash-of-string-data-or-files
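
The comparison could look roughly like the sketch below. It does not use the function from the Statalist thread; SHA-256 from Python's standard `hashlib` is swapped in for illustration, and the names `file_hash` and `compare_outputs` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(original, replication):
    """Label each file in `original` by how its twin in `replication` compares."""
    original, replication = Path(original), Path(replication)
    report = {}
    for path in sorted(original.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(original)
        twin = replication / rel
        if not twin.is_file():
            report[rel.as_posix()] = "missing"
        elif file_hash(path) == file_hash(twin):
            report[rel.as_posix()] = "identical"
        else:
            report[rel.as_posix()] = "different"
    return report
```

One caveat with this design: a hash compares bytes, so files that embed metadata such as a save time may be reported as "different" even when the data they hold are the same. Hashing the data content rather than the file would avoid that, at the cost of format-specific code.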

bbdaniels avatar Aug 15 '18 18:08 bbdaniels