taucmdr
`tau trial renumber` can delete your trials
All my trials were just deleted when using tau trial renumber!
Fortunately I had backups, but this is the type of thing that could REALLY ruin someone's day.
I'm not sure if this is a side effect of the changes made to speed up TinyDB transactions with the caching middleware.
$ tau trial list
== Trial Configurations (/p/home/zbeekman/Sandbox/AFSI/.tau/project.json) ============================================================================================================================================================================================================================
+--------+-----------+-------------------------------------------------------------------------+-----------------------------+-----------+-----------------+
| Number | Data Size | Command | Description | Status | Elapsed Seconds |
+========+===========+=========================================================================+=============================+===========+=================+
| 0 | 27.0KiB | aprun -n 1 -N 1 -d 1 -j 0 --cc=depth ./bin/TEST_AFSI_DPLR 64 64 64 | 1 of 1, quad-LLC-small | completed | 23.112 |
+--------+-----------+-------------------------------------------------------------------------+-----------------------------+-----------+-----------------+
| 1 | 27.1KiB | aprun -n 1 -N 1 -d 1 -j 0 --cc=depth ./bin/TEST_AFSI_DPLR 100 100 100 | 1 of 1, quad-LLC-mdium | completed | 79.735 |
+--------+-----------+-------------------------------------------------------------------------+-----------------------------+-----------+-----------------+
...
$ tau trial renumber {0..86} --to {15..86} {0..14}
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '--to', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14']
$ tau trial list
No trials.
If you restore from the backup and then run the same command, does the same thing happen again?
You mean does it happen deterministically? Or do you want me to revert taucmdr to before the caching changes happened?
Yeah, I wanted to know if it will happen deterministically given what's in your project.json.
OK. I clobbered my project.json when restoring from the backup, but I'll test it again.
To be fair, I think I used the command wrong: the from and to arguments are transposed. Trying again to see what happens.
Yes, it happens again, even with the backed-up project.json file.
There's definitely something wrong, and reverting the caching changes doesn't fix it. I can even get renumbering to delete trials in different experiments than the selected one.
Has renumbering ever worked? There seem to be several very bad problems with it.
The renumbering process does this:
- Gets a trial with the old number
- Saves an in-memory copy of the trial
- Changes the number in the in-memory copy
- Deletes trials with the old number
- Creates a new trial based on the in-memory copy
Neither the search nor the delete is conditioned on which experiment is selected. So a copy will be made, not of the trial in the current experiment with the desired number, but of whichever trial with that number comes first in the database. It will then delete ALL trials with that number, in every experiment, and recreate a single copy, with the new number, of whichever trial it found.
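The failure mode described above can be modeled with a few lines of Python. This is only a sketch under stated assumptions: the list of dicts stands in for the trial table, and the field names (`experiment`, `number`, `command`) and the function `naive_renumber` are hypothetical, not taken from the taucmdr source.

```python
import copy

# Hypothetical stand-in for the trial table; field names are assumptions.
trials = [
    {"experiment": "A", "number": 0, "command": "./a0"},
    {"experiment": "B", "number": 0, "command": "./b0"},
    {"experiment": "B", "number": 1, "command": "./b1"},
]

def naive_renumber(table, old, new):
    """Delete-and-recreate renumbering whose queries ignore the experiment."""
    # Search: the first trial in the whole table with the old number.
    found = next(t for t in table if t["number"] == old)
    clone = copy.deepcopy(found)
    clone["number"] = new
    # Delete: every trial with the old number, in every experiment.
    table[:] = [t for t in table if t["number"] != old]
    # Recreate: only the one copy that was saved.
    table.append(clone)

# Intent: experiment B is selected and its trial 0 should become trial 5.
naive_renumber(trials, old=0, new=5)
remaining = sorted((t["experiment"], t["number"]) for t in trials)
print(remaining)
```

In this model, experiment B's trial 0 disappears and experiment A's trial 0 is the one that gets renumbered, even though A was never selected.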
I've fixed that, but that still doesn't make renumbering work, because the in-memory copy is a copy of the database entry only. Deleting the trial still deletes the profiles, which aren't part of the in-memory copy and so don't get re-created when the renumbered copy is created.
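A minimal sketch of why delete-and-recreate loses profile data even with correctly scoped queries. The two dicts and the helper names `delete_trial` and `create_trial` are hypothetical stand-ins for the database entries and the on-disk profile files, not taucmdr's actual storage layout.

```python
import copy

# Hypothetical model: database entries and profile files are stored separately.
records = {0: {"number": 0, "command": "./app"}}
profiles = {0: ["profile.0.0.0"]}

def delete_trial(number):
    """Deleting a trial removes its database entry and its profile data."""
    records.pop(number)
    profiles.pop(number, None)

def create_trial(record):
    """Creating a trial from a saved copy restores only the database entry."""
    records[record["number"]] = record

# Renumber trial 0 to trial 7 by delete-and-recreate.
saved = copy.deepcopy(records[0])   # the copy holds the entry, not the profiles
saved["number"] = 7
delete_trial(0)
create_trial(saved)

print(records)    # trial 7 exists as a database entry
print(profiles)   # but no profile data survives
```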
@zbeekman I've changed the implementation of renumber to use experiment-specific queries and to no longer delete and recreate trials. Instead, it updates them first to temporary numbers and then to the new numbers. This should fix this bug as well as #278. I've noticed that renumbering many trials is very slow, however. I'll open a new issue to track that.
Can you check if this fixes the problem for you?
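The update-in-place approach described in this comment can be sketched as follows. This is an illustration under assumptions, not the actual taucmdr change: the list of dicts stands in for the trial table, `update_number` and `renumber` are hypothetical names, and negative values are used as temporary numbers on the assumption that real trial numbers are never negative.

```python
def update_number(table, experiment, old, new):
    """Stand-in for a scoped database update: exactly one trial must match."""
    matches = [t for t in table
               if t["experiment"] == experiment and t["number"] == old]
    if len(matches) != 1:
        raise LookupError(f"expected one trial {old} in {experiment!r}, "
                          f"found {len(matches)}")
    matches[0]["number"] = new

def renumber(table, experiment, old_numbers, new_numbers):
    """Renumber trials of one experiment in place via temporary numbers."""
    if len(old_numbers) != len(new_numbers):
        raise ValueError("old and new number lists must have the same length")
    if (len(set(old_numbers)) != len(old_numbers)
            or len(set(new_numbers)) != len(new_numbers)):
        raise ValueError("trial numbers must not repeat")
    existing = {t["number"] for t in table if t["experiment"] == experiment}
    # A new number may not collide with a trial that is staying where it is.
    clashes = (existing - set(old_numbers)) & set(new_numbers)
    if clashes:
        raise ValueError(f"numbers already in use: {sorted(clashes)}")
    temps = [-(i + 1) for i in range(len(old_numbers))]
    # Phase 1: move every trial out of the way so no two ever share a number.
    for old, tmp in zip(old_numbers, temps):
        update_number(table, experiment, old, tmp)
    # Phase 2: move each trial from its temporary number to its final number.
    for tmp, new in zip(temps, new_numbers):
        update_number(table, experiment, tmp, new)

trials = [
    {"experiment": "A", "number": 0, "command": "./a0", "profiles": ["profile.0.0.0"]},
    {"experiment": "A", "number": 1, "command": "./a1", "profiles": ["profile.0.0.0"]},
    {"experiment": "B", "number": 0, "command": "./b0", "profiles": ["profile.0.0.0"]},
]
renumber(trials, "A", [0, 1], [1, 0])   # swap trials 0 and 1 in experiment A
for t in trials:
    print(t["experiment"], t["number"], t["command"], t["profiles"])
```

Because nothing is deleted, the profile data attached to each trial is untouched, and the temporary numbers let a swap or any other permutation succeed without two trials ever sharing a number.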
Hi Nick, thanks for looking at this. I have some NRC work to catch up on this morning, but will give this a test once I circle back around to the TAU-related stuff. Thanks for the prompt response!