Duplicate data can be inserted into the database if the measurement is interrupted by a keyboard interrupt
Steps to reproduce
Run this snippet of code (imports added for completeness; exp is a pre-existing Experiment):

import time
import numpy as np
from qcodes.dataset.measurements import Measurement

meas = Measurement(exp=exp)
meas.write_period = 0.5
meas.register_custom_parameter('x', unit='bogus', paramtype='array')
meas.register_custom_parameter('y', unit='more bogus',
                               setpoints=['x', ], paramtype='array')

with meas.run() as datasaver:
    for x in np.linspace(0, 10, 51):
        y = np.cos(x)
        datasaver.add_result(('y', np.array([y])), ('x', np.array([x])))
        time.sleep(0.05)
At some point I interrupt it from the keyboard (the stop button in the Jupyter notebook).
Expected behaviour
Looking at the resulting data, I'd expect that the data recorded up to the point of interruption has been inserted into the DB.
Actual behaviour
I find that some fraction of the time (around 50%), the most recently inserted data is inserted twice, i.e. the data is corrupted in that case.
When I load the data from the DB:

import qcodes as qc

run_id = 15
qc.dataset.data_export.get_data_by_id(run_id)
Output:
[[{'data': array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ,
0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ]),
'label': '',
'name': 'x',
'unit': 'bogus'},
{'data': array([ 1. , 0.98006658, 0.92106099, 0.82533561, 0.69670671,
0.54030231, 0.36235775, 0.16996714, -0.02919952, -0.22720209,
-0.41614684, 1. , 0.98006658, 0.92106099, 0.82533561,
0.69670671, 0.54030231, 0.36235775, 0.16996714, -0.02919952,
-0.22720209, -0.41614684]),
'label': '',
'name': 'y',
'unit': 'more bogus'}]]
This seems to happen when the interrupt lands inside the insertion (rather than, say, during the sleep):
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-70-0613c75653e0> in <module>()
12 for x in np.linspace(0, 10, 51):
13 y = np.cos(x)
---> 14 datasaver.add_result(('y', np.array([y])), ('x', np.array([x])))
15 time.sleep(0.05)
d:\onedrive\bf2\code\qcodes\qcodes\dataset\measurements.py in add_result(self, *res_tuple)
218
219 if monotonic() - self._last_save_time > self.write_period:
--> 220 self.flush_data_to_database()
221 self._last_save_time = monotonic()
222
d:\onedrive\bf2\code\qcodes\qcodes\dataset\measurements.py in flush_data_to_database(self)
381 if self._results != []:
382 try:
--> 383 write_point = self._dataset.add_results(self._results)
384 log.debug(f'Successfully wrote from index {write_point}')
385 self._results = []
d:\onedrive\bf2\code\qcodes\qcodes\dataset\data_set.py in add_results(self, results)
473 len_before_add = length(self.conn, self.table_name)
474 insert_many_values(self.conn, self.table_name, list(expected_keys),
--> 475 values)
476 return len_before_add
477
d:\onedrive\bf2\code\qcodes\qcodes\dataset\sqlite_base.py in insert_many_values(conn, formatted_name, columns, values)
660 if ii == 0:
661 return_value = c.lastrowid
--> 662 start += chunk
663
664 return return_value
C:\ProgramData\Anaconda3\lib\contextlib.py in __exit__(self, type, value, traceback)
86 if type is None:
87 try:
---> 88 next(self.gen)
89 except StopIteration:
90 return False
d:\onedrive\bf2\code\qcodes\qcodes\dataset\sqlite_base.py in atomic(conn)
473 else:
474 if is_outmost:
--> 475 conn.commit()
476 finally:
477 if is_outmost:
KeyboardInterrupt:
System
QCoDeS master (pulled this morning)
The direct cause is that line 385 of measurements.py,

self._results = []

never runs, so the results stay in the Measurement object's cache, which is flushed once more as the context manager exits.
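Schematically, the failure mode looks like this (a toy model with hypothetical names, not the actual QCoDeS code):

class ToySaver:
    # Toy stand-in for the datasaver's caching behaviour (illustrative only)
    def __init__(self):
        self._results = []   # in-memory cache of unsaved rows
        self.db = []         # stands in for the SQLite table

    def flush(self, interrupt=False):
        if self._results:
            self.db.extend(self._results)   # the insert itself succeeds
            if interrupt:
                raise KeyboardInterrupt     # interrupt lands on the commit
            self._results = []              # the cache is only cleared here

saver = ToySaver()
saver._results = [1, 2, 3]
try:
    saver.flush(interrupt=True)  # periodic flush, interrupted mid-commit
except KeyboardInterrupt:
    pass
saver.flush()                    # final flush as the context manager exits
print(saver.db)                  # [1, 2, 3, 1, 2, 3] -- every point doubled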
We have some ideas for solving this properly, but as a very quick-and-dirty fix that could cause data loss (while preventing duplication), you could execute self._results = [] in the except clause of flush_data_to_database. Alternatively, copy the results, clear the cache, and only then try to write the copied results.
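A minimal sketch of that second option (attribute names as in the traceback above; log as in measurements.py). Note that KeyboardInterrupt is a BaseException, so catching it is not the point; clearing the cache before the write is what prevents the duplicate flush:

def flush_data_to_database(self):
    # Sketch only: hand the buffered results off and clear the cache
    # *before* writing, so an interrupt during the insert cannot make
    # the exiting context manager flush the same rows a second time.
    if self._results != []:
        results_to_write = self._results
        self._results = []   # cleared even if the write below is interrupted
        write_point = self._dataset.add_results(results_to_write)
        log.debug(f'Successfully wrote from index {write_point}')

The trade-off is explicit: if the interrupt lands inside add_results, that batch is lost rather than duplicated.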