[WIP] Feature / dataset merger
Changes proposed in this pull request:
- Add a function to merge (very similar) datasets. The idea is that a measurement may have been broken into smaller pieces by instrumental necessity and there should be a way to glue the different pieces together.
Pending:
- [ ] Tests, tests, tests
- [x] Metadata? How should we capture the fact that this is somehow non-measured experimental data?
- [x] fix mypy complaints :)
@QCoDeS/core
This looks very nice and clean! questions:
- Is `merge` really a good name? Shall we use something more mathematical like `union`?
- What happens if the two datasets that are to be merged have measured values at the same values of setpoints? For example, if `dataset1` has `{x: 1, y: 2, z: 10}` while `dataset2` has `{x: 1, y: 2, z: 65}`? I guess this is not a big deal for the dataset itself, right? It's just the worry of the tools which work with data, like `plot_by_id`, right?
- The name: You are right that set-theoretical names are more concise in terms of how they combine/unite the datasets. But currently `merge` will only work if the datasets are very similar, so whereas you might expect `union` to really just combine everything (and thus work for any two datasets), `merge` (potentially, if you git'ed a lot) offers the user the intuition that not everything can be merged. And `merge` is a verb, which I like. I suppose `unite` would correspond to `union`? But perhaps it'd be good to already now think about all the combining functions we'd like to have and then name them appropriately.
- Setpoints: exactly. If you play around with the notebook, you can see `plot_by_id` going mental if there are overlapping regions and a grid is detected. We should look into that.
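To make the overlapping-setpoints case concrete, here is a minimal sketch of how such collisions could be detected before merging. It does not use the QCoDeS API: each dataset is represented as a plain list of row dictionaries, and `find_overlapping_setpoints` is a hypothetical helper name chosen for illustration only.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: a row maps parameter names to values, as in {x: 1, y: 2, z: 10}.
Row = Dict[str, float]


def find_overlapping_setpoints(
    rows1: List[Row],
    rows2: List[Row],
    setpoint_names: Sequence[str],
) -> Set[Tuple[float, ...]]:
    """Return the setpoint tuples that occur in both lists of rows."""
    seen1 = {tuple(row[name] for name in setpoint_names) for row in rows1}
    seen2 = {tuple(row[name] for name in setpoint_names) for row in rows2}
    return seen1 & seen2


# The example from the question above, plus one non-overlapping row each.
dataset1_rows = [{"x": 1, "y": 2, "z": 10}, {"x": 1, "y": 3, "z": 11}]
dataset2_rows = [{"x": 1, "y": 2, "z": 65}, {"x": 2, "y": 2, "z": 66}]

overlap = find_overlapping_setpoints(dataset1_rows, dataset2_rows, ("x", "y"))
print(overlap)  # {(1, 2)}
```

A check like this could let the merge warn the user, or let a plotting tool skip grid detection, when the returned set is non-empty.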
Concerning metadata: I think that for now you can combine the metadata into a dictionary keyed by run_id and/or run_name within a dedicated field, something like:
{
merged_runs: {
25: {'snapshot': {...}},
37: {'snapshot': {...}},
},
< other meta data of THIS dataset >
}
I think the metadata is going to be a "place for dumping stuff" anyway, hence it is just not easy to say how to perform "operations" on it. So, a simple solution like this is probably fine as long as minimal "separation"/"organization" measures are taken.
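As a rough sketch of the structure proposed above, the combination could look like the following. It works on plain dictionaries only; `combine_metadata` and its arguments are hypothetical names, not part of the existing dataset API, and the choice to raise on a pre-existing `merged_runs` key is an assumption.

```python
from typing import Any, Dict, Mapping


def combine_metadata(
    own_metadata: Mapping[str, Any],
    source_metadata: Mapping[int, Mapping[str, Any]],
) -> Dict[str, Any]:
    """Nest the metadata of the source runs under a 'merged_runs' field.

    own_metadata: metadata belonging to the merged dataset itself.
    source_metadata: metadata of each source run, keyed by run_id.
    """
    if "merged_runs" in own_metadata:
        # Assumption: refuse to overwrite an existing field silently.
        raise ValueError("'merged_runs' is already present in the metadata")
    combined = dict(own_metadata)
    combined["merged_runs"] = {
        run_id: dict(meta) for run_id, meta in source_metadata.items()
    }
    return combined


merged_meta = combine_metadata(
    {"comment": "glued from two runs"},
    {25: {"snapshot": {"station": "A"}}, 37: {"snapshot": {"station": "A"}}},
)
print(sorted(merged_meta["merged_runs"]))  # [25, 37]
```

Keeping the source metadata under one dedicated key leaves the top level free for the metadata of the merged dataset itself, which is the minimal separation suggested above.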
I like that metadata solution. Let me see if I can fix the three items today.
Codecov Report
Merging #1214 into master will increase coverage by `0.01%`. The diff coverage is `90%`.
@@ Coverage Diff @@
## master #1214 +/- ##
==========================================
+ Coverage 80.38% 80.39% +0.01%
==========================================
Files 49 49
Lines 6801 6811 +10
==========================================
+ Hits 5467 5476 +9
- Misses 1334 1335 +1
This PR should wait for #1002 to be merged. That way it will be captured where the data came from.