tabmat icon indicating copy to clipboard operation
tabmat copied to clipboard

`_split_col_subsets` ignores columns when non-monotonic

Open MarcAntoineSchmidtQC opened this issue 4 years ago • 5 comments

Maybe this is not the right function for what I am looking for. I expected this function to be perfect for the __getitem__ column subsetting but it's not working if the columns are not monotonic. PR #91 depends on fixing this issue.

Minimal working example:

import quantcore.matrix as mx
import numpy as np

a = mx.DenseMatrix(np.arange(16, dtype=float).reshape(4, 4))
sm = mx.SplitMatrix(matrices=[a])

sm._split_col_subsets([0, 2, 3])
# Correct - Out[6]: ([array([0, 1, 2], dtype=int32)], [array([0, 2, 3], dtype=int32)], 3)

ms._split_col_subsets([0, 3, 2])
# Incorrect - Out[9]: ([array([0, 1], dtype=int32)], [array([0, 3], dtype=int32)], 3)
# Expected: ([array([0, 1, 2], dtype=int32)], [array([0, 3, 2], dtype=int32)], 3)

MarcAntoineSchmidtQC avatar Jul 22 '21 00:07 MarcAntoineSchmidtQC

@tbenthompson, if you could look into this I would like to get your opinion on whether we should spend the time fixing the _split_col_subsets method or write a new method for the column subsetting.

MarcAntoineSchmidtQC avatar Jul 22 '21 01:07 MarcAntoineSchmidtQC

Yes, you're correct that _split_col_subsets currently assumes the input columns are ordered monotonically. One thing we could do is sort the columns first, then call _split_col_subsets and then "unsort" the outputs. That would probably be the easiest thing to do. In order to minimize the amount of Cython, it might be nice to implement that sort/unsort step in a python wrapper function rather than directly in the Cython code.

I could help with this tomorrow or soon. Let me know if that would be helpful.

tbenthompson avatar Jul 22 '21 02:07 tbenthompson

I actually ran into this same issue from the other "end" recently when I fixed some bugs with subsets that were not monotonically specified. So this same bug is a problem if either:

  1. The cols to choose aren't monotonic. Or
  2. The subset_cols_indices aren't monotonic.

I fixed the subset_cols_indices problem by correcting the function that was creating the SplitMatrix and then requiring that input indices be monotonic.

Sorry about not realizing at the time that the cols problem would also be an issue!

tbenthompson avatar Jul 22 '21 02:07 tbenthompson

My PR currently contains a fix.

My solution was to create a column_map attribute to the SplitMatrix object. It stores the mapping from index location to actual location in the matrices. Then, creating a _split_col_subsets-like function that supports unordered columns and duplicate column was much more easy. I also modified the getcol method to use this.

We can discuss this in the coming days. Downside to this is that we need to store an array of size n_cols x 2, but I think it's worth it.

MarcAntoineSchmidtQC avatar Jul 22 '21 04:07 MarcAntoineSchmidtQC

@MarcAntoineSchmidtQC did this get fixed or did it get dropped when you closed your PR? Is it worth reviving just this fix?

tbenthompson avatar Oct 06 '21 21:10 tbenthompson