dimod icon indicating copy to clipboard operation
dimod copied to clipboard

Concatenate and merge info, preserving conflicts as lists.

Open joseppinilla opened this issue 4 years ago • 8 comments

If there is interest in merging info when concatenating samplesets since right now all info is ignored. This preserves conflicts by listing them, but squeezes unique values.

joseppinilla avatar Aug 25 '20 19:08 joseppinilla

Codecov Report

Merging #691 into master will increase coverage by 0.01%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #691      +/-   ##
==========================================
+ Coverage   91.85%   91.86%   +0.01%     
==========================================
  Files          60       60              
  Lines        4222     4229       +7     
==========================================
+ Hits         3878     3885       +7     
  Misses        344      344              
Impacted Files Coverage Δ
dimod/sampleset.py 94.32% <100.00%> (+0.09%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2ec6944...370123d. Read the comment docs.

codecov-commenter avatar Aug 25 '20 19:08 codecov-commenter

Hmm, I am worried that the conflict resolution would lead to inconsistent/confusing results. I wonder if we could combine it with https://github.com/dwavesystems/dimod/issues/643 by always making a list, even when the values are identical?

arcondello avatar Aug 27 '20 17:08 arcondello

I was considering one option flag, e.g. squeeze_info=False which would either behave like "list" by default or "squeeze" if True.

I see now how "drop" would allow compatibility with the current implementation, but I'm not sure if adding more flexibility is necessary.

joseppinilla avatar Aug 27 '20 19:08 joseppinilla

I prefer categorical option to boolean because it allows future expansion. Accepting a callable (in addition) is trivial and provides an absolute flexibility.

randomir avatar Aug 27 '20 19:08 randomir

One reason I didn't originally implement "list" is because I had to decide whether lists are always len(samplesets), e.g filled with None, or if they only list values of existing keys.

I think I'd go for the latter, even though it wouldn't allow tracking.

joseppinilla avatar Aug 27 '20 20:08 joseppinilla

IMO, "list" should include fields from all samplesets. We might consider splitting "list" into "list-all" and "list-existing", although "squeeze" is conceptually already very similar to "list-existing". @arcondello, thoughts?

randomir avatar Aug 27 '20 20:08 randomir

squeeze may not be the right name since it's not the same behaviour as numpy squeeze, it only lists conflicts... so... "list-conflicts"?

joseppinilla avatar Aug 27 '20 20:08 joseppinilla

I don't see how numpy.squeeze is relevant here, but I agree it's too general. unique sounds much better.

randomir avatar Aug 27 '20 20:08 randomir