`tau trial renumber` can delete your trials

Open zbeekman opened this issue 7 years ago • 10 comments

All my trials were just deleted when using tau trial renumber!

Fortunately I had backups, but this is the type of thing that could REALLY ruin someone's day.

I'm not sure if this is a side effect of the changes made to speed up TinyDB transactions with the caching middleware.

$ tau trial list
== Trial Configurations (/p/home/zbeekman/Sandbox/AFSI/.tau/project.json) ============================================================================================================================================================================================================================

+--------+-----------+-------------------------------------------------------------------------+-----------------------------+-----------+-----------------+
| Number | Data Size |                                 Command                                 |         Description         |  Status   | Elapsed Seconds |
+========+===========+=========================================================================+=============================+===========+=================+
|   0    |  27.0KiB  |   aprun -n 1 -N 1 -d 1 -j 0 --cc=depth ./bin/TEST_AFSI_DPLR 64 64 64    |   1 of 1, quad-LLC-small    | completed |     23.112      |
+--------+-----------+-------------------------------------------------------------------------+-----------------------------+-----------+-----------------+
|   1    |  27.1KiB  |  aprun -n 1 -N 1 -d 1 -j 0 --cc=depth ./bin/TEST_AFSI_DPLR 100 100 100  |   1 of 1, quad-LLC-mdium    | completed |     79.735      |
+--------+-----------+-------------------------------------------------------------------------+-----------------------------+-----------+-----------------+

...

$ tau trial renumber {0..86} --to {15..86} {0..14}
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '--to', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83', '84', '85', '86', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14']

$ tau trial list
No trials.

zbeekman avatar May 24 '18 18:05 zbeekman

If you restore from the backup and then run the same command, does the same thing happen again?

nchaimov avatar May 24 '18 18:05 nchaimov

You mean does it happen deterministically? Or do you want me to revert taucmdr to before the caching changes happened?

zbeekman avatar May 24 '18 18:05 zbeekman

Yeah, I wanted to know if it will happen deterministically given what's in your project.json

nchaimov avatar May 24 '18 18:05 nchaimov

OK. I clobbered my project.json when restoring from the backup, but I'll test it again.

zbeekman avatar May 24 '18 18:05 zbeekman

To be fair, I think I used the command wrong... the from and to arguments are transposed... trying again to see what happens.

zbeekman avatar May 24 '18 18:05 zbeekman

Yes, it happens again, even with the backed-up project.json file.

zbeekman avatar May 24 '18 18:05 zbeekman

There's definitely something wrong, and reverting the caching changes doesn't fix it. I can even get renumbering to delete trials in different experiments than the selected one.

nchaimov avatar May 24 '18 22:05 nchaimov

Has renumbering ever worked? There seem to be several very bad problems with it.

The renumbering process does this:

  • Gets a trial with the old number
  • Saves an in-memory copy of the trial
  • Changes the number in the in-memory copy
  • Deletes trials with the old number
  • Creates a new trial based on the in-memory copy

Neither the search nor the delete is conditioned on which experiment is selected. So a copy will be made, not of the trial in the current experiment with the desired number, but of whichever trial with that number comes first in the database. It will then delete ALL trials with that number, in every experiment, and recreate a single copy, with the new number, of whichever trial it found.

I've fixed that, but that still doesn't make renumbering work, because the in-memory copy is a copy of the database entry only. Deleting the trial still deletes the profiles, which aren't part of the in-memory copy, and so they don't get re-created when the renumbered copy is created.
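As a rough illustration of the scoping problem described above, here is a minimal Python sketch. It is not TAU Commander's actual code: the list of dicts stands in for the trial table, and the `experiment`, `number`, and `command` keys and the `buggy_renumber` function are hypothetical names chosen for the example.

```python
# Hypothetical in-memory stand-in for the trial table; not the real schema.
def buggy_renumber(trials, old, new):
    """Renumber as described: unscoped search, unscoped delete, re-create."""
    # The search ignores the selected experiment: the first match in the
    # whole table wins.
    found = next(t for t in trials if t["number"] == old)
    # Copy of the database entry only; profile files on disk are not part
    # of it, so a real delete would lose them.
    copy = dict(found)
    copy["number"] = new
    # The delete also ignores the experiment: every trial with that number
    # is removed.
    trials[:] = [t for t in trials if t["number"] != old]
    trials.append(copy)


trials = [
    {"experiment": "exp_a", "number": 0, "command": "./a.out 64"},
    {"experiment": "exp_b", "number": 0, "command": "./b.out 100"},
]

# Suppose exp_b is selected and its trial 0 should become trial 5.
buggy_renumber(trials, old=0, new=5)
```

In this sketch the trial from `exp_b` is lost entirely, and the surviving entry is a renumbered copy of the trial from `exp_a`, which was never the one the user asked to renumber.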

nchaimov avatar May 24 '18 23:05 nchaimov

@zbeekman I've changed the implementation of renumber to use queries specific to experiments and to no longer delete and recreate trials, but instead update them to temporary numbers and then to the new numbers. This should fix this bug as well as #278. I've noticed that renumbering many trials is very slow, however. I'll make a new bug to track that.
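For readers following along, the update-in-place approach might look roughly like the sketch below. It is an assumption-laden illustration, not the actual patch: the list-of-dicts table, the key names, the `renumber` signature, and the use of negative integers as temporary numbers (which assumes real trial numbers are never negative) are all hypothetical.

```python
def renumber(trials, experiment, old_numbers, new_numbers):
    """Renumber trials of one experiment in place via temporary numbers."""
    if len(old_numbers) != len(new_numbers):
        raise ValueError("old and new number lists must have the same length")
    if len(set(old_numbers)) != len(old_numbers):
        raise ValueError("duplicate entries in old numbers")
    if len(set(new_numbers)) != len(new_numbers):
        raise ValueError("duplicate entries in new numbers")

    # Only look at trials belonging to the selected experiment.
    scoped = {t["number"]: t for t in trials if t["experiment"] == experiment}
    missing = [n for n in old_numbers if n not in scoped]
    if missing:
        raise KeyError(f"no such trials in {experiment}: {missing}")

    # Refuse to overwrite a trial that is not itself being moved.
    staying = set(scoped) - set(old_numbers)
    clash = staying & set(new_numbers)
    if clash:
        raise ValueError(f"target numbers already in use: {sorted(clash)}")

    moving = [scoped[n] for n in old_numbers]
    # Phase 1: park every moving trial on a unique temporary number so that
    # overlapping old and new ranges cannot collide mid-update.
    for i, trial in enumerate(moving):
        trial["number"] = -(i + 1)
    # Phase 2: assign the final numbers. Nothing is deleted or re-created.
    for trial, new in zip(moving, new_numbers):
        trial["number"] = new


trials = [
    {"experiment": "exp_a", "number": n, "command": f"run {n}"}
    for n in range(5)
]
trials.append({"experiment": "exp_b", "number": 0, "command": "other"})

# A small rotation, similar in shape to the one in the original report.
renumber(trials, "exp_a", [0, 1, 2, 3, 4], [2, 3, 4, 0, 1])
result = {t["command"]: t["number"] for t in trials}
```

Because records are only updated, anything attached to a trial outside the database entry, such as profile data, is never touched, and trials in other experiments keep their numbers.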

Can you check if this fixes the problem for you?

nchaimov avatar May 25 '18 21:05 nchaimov

Hi Nick, thanks for looking at this. I have some NRC work to catch up on this morning, but will give this a test once I circle back around to the TAU-related stuff. Thanks for the prompt response!

zbeekman avatar May 29 '18 15:05 zbeekman