[WIP] Feature / dataset merger

Open WilliamHPNielsen opened this issue 7 years ago • 6 comments

Changes proposed in this pull request:

  • Add a function to merge (very similar) datasets. The idea is that a measurement may have been broken into smaller pieces by instrumental necessity and there should be a way to glue the different pieces together.
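The gluing idea can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this pull request: a "dataset" is modelled as a list of row dictionaries, `merge_rows` is a hypothetical name, and "very similar" is taken to mean that every row carries the same parameter names.

```python
from typing import Dict, List

# Hypothetical stand-in: a "dataset" here is just a list of rows, one dict per
# measured point. This is not the QCoDeS DataSet class.
Row = Dict[str, float]


def merge_rows(first: List[Row], second: List[Row]) -> List[Row]:
    """Glue two pieces of a measurement together, keeping their order."""
    # Assumption: "very similar" means every row has the same parameter names.
    parameter_sets = {frozenset(row) for row in first + second}
    if len(parameter_sets) > 1:
        raise ValueError("datasets do not share the same parameters")
    return first + second


piece_1 = [{"x": 0.0, "z": 1.0}, {"x": 1.0, "z": 2.0}]
piece_2 = [{"x": 2.0, "z": 3.0}]
merged = merge_rows(piece_1, piece_2)
```

Refusing to merge mismatched pieces keeps the result a well-formed dataset, at the cost of only handling the "very similar" case.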

Pending:

  • [ ] Tests, tests, tests
  • [x] Metadata? How should we capture the fact that this is somehow non-measured experimental data?
  • [x] fix mypy complaints :)

@QCoDeS/core

WilliamHPNielsen avatar Jul 31 '18 14:07 WilliamHPNielsen

This looks very nice and clean! Questions:

  • Is merge really a good name? Shall we use something more mathematical like union?
  • What happens if the two datasets to be merged have measured values at the same setpoint values? For example, if dataset1 has {x: 1, y: 2, z: 10} while dataset2 has {x: 1, y: 2, z: 65}? I guess this is not a big deal for the dataset itself, right? It is only a concern for the tools that work with the data, like plot_by_id, right?
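One way to spot the situation in the second question before merging is to compare setpoint tuples. The sketch below is an assumption-laden illustration: rows are plain dictionaries rather than QCoDeS datasets, and `overlapping_setpoints` is a hypothetical helper, not an existing function.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def overlapping_setpoints(
    first: List[Row], second: List[Row], setpoint_names: Sequence[str]
) -> List[Tuple[float, ...]]:
    """Return the setpoint tuples that occur in both lists of rows."""
    in_first = {tuple(row[name] for name in setpoint_names) for row in first}
    in_second = {tuple(row[name] for name in setpoint_names) for row in second}
    # Sorted only so that the result has a stable order.
    return sorted(in_first & in_second)


# The example from the question above: same (x, y), different z.
dataset1 = [{"x": 1, "y": 2, "z": 10}]
dataset2 = [{"x": 1, "y": 2, "z": 65}]
clashes = overlapping_setpoints(dataset1, dataset2, ("x", "y"))
```

A merge function could use such a check to warn the user, while still storing both rows.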

astafan8 avatar Jul 31 '18 14:07 astafan8

  • The name: You are right that set-theoretical names are more concise in terms of how they combine/unite the datasets. But currently merge will only work if the datasets are very similar, so whereas you might expect union to really just combine everything (and thus work for any two datasets), merge (potentially, if you have used git a lot) gives the user the intuition that not everything can be merged. And merge is a verb, which I like. I suppose unite would correspond to union? But perhaps it would be good to think already now about all the combining functions we would like to have and then name them appropriately.

  • Setpoints: exactly. If you play around with the notebook, you can see plot_by_id going mental if there are overlapping regions and a grid is detected. We should look into that.

WilliamHPNielsen avatar Aug 01 '18 08:08 WilliamHPNielsen

Concerning metadata: I think that for now you can combine the metadata into a dictionary keyed by run_id and/or run_name within a dedicated field, something like:

{
    'merged_runs': {
        25: {'snapshot': {...}},
        37: {'snapshot': {...}},
    },
    < other meta data of THIS dataset >
}

I think the metadata is going to be a "place for dumping stuff" anyway, hence it is just not easy to say how to perform "operations" on it. So, a simple solution like this is probably fine as long as minimal "separation"/"organization" measures are taken.
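As a rough sketch of that layout, assuming the metadata of each run is already available as a plain dictionary: `combine_metadata` and its arguments are hypothetical names used for illustration, and only the `merged_runs` key comes from the proposal above.

```python
from typing import Any, Dict

Metadata = Dict[str, Any]


def combine_metadata(
    own_metadata: Metadata, source_metadata: Dict[int, Metadata]
) -> Metadata:
    """Nest the metadata of the source runs under a dedicated 'merged_runs' field."""
    combined: Metadata = dict(own_metadata)  # shallow copy; inputs are not mutated
    combined["merged_runs"] = {
        run_id: dict(metadata) for run_id, metadata in source_metadata.items()
    }
    return combined


# Placeholder snapshots; real ones would describe the instruments of each run.
sources = {
    25: {"snapshot": {"station": "A"}},
    37: {"snapshot": {"station": "B"}},
}
result = combine_metadata({"note": "merged dataset"}, sources)
```

Keying by run_id keeps the metadata of each source run separate, so nothing has to be reconciled between them.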

astafan8 avatar Aug 01 '18 08:08 astafan8

I like that metadata solution. Let me see if I can fix the three items today.

WilliamHPNielsen avatar Aug 01 '18 11:08 WilliamHPNielsen

Codecov Report

Merging #1214 into master will increase coverage by 0.01%. The diff coverage is 90%.

@@            Coverage Diff             @@
##           master    #1214      +/-   ##
==========================================
+ Coverage   80.38%   80.39%   +0.01%     
==========================================
  Files          49       49              
  Lines        6801     6811      +10     
==========================================
+ Hits         5467     5476       +9     
- Misses       1334     1335       +1

codecov[bot] avatar Aug 02 '18 08:08 codecov[bot]

This PR should wait for #1002 to be merged. That way, the origin of the data will be captured.

WilliamHPNielsen avatar Aug 14 '18 12:08 WilliamHPNielsen