mir-algorithm icon indicating copy to clipboard operation
mir-algorithm copied to clipboard

labels support

Open 9il opened this issue 3 years ago • 9 comments

Minimal support:

  • [ ] opIndex that returns slices, should also attach labels
  • [ ] indexed patch for labels
  • [ ] opIndex that returns indexed slices, should also attach labels
  • [ ] API to access labels
  • [ ] API to attach and detach labels
  • [ ] API to sort ndslices across labels
  • [ ] API to access ndslices sorted across labels such as atLabel!d, lowerBound!d, upperBound!d, transactionIndex!d
  • [ ] check that base mutation API like popFront works
  • [ ] Common API like opIndex access to elements should work the same way like for common slices without labels
  • [ ] Port tests from Series

Advanced support:

  • [ ] Adopt mir.ndslice.dynamic
  • [ ] Adopt mir.ndslice.topology
  • [ ] Adopt mir.algorithm.iteration
  • [ ] Adopt the rest of mir.ndslice.*

9il avatar Jul 08 '22 07:07 9il

Also note that functions that take a slice as input can't accept a slice with labels. I show using isMatrix below, but isConvertibleToSlice also works. So functions that don't have an overload for isConvertibleToSlice should if possible.

/+dub.sdl:
dependency "mir-algorithm" version="*"
+/

import mir.ndslice.allocation: slice;
import mir.ndslice.slice: Slice, SliceKind;
import mir.ndslice.traits: isMatrix;

void foo(SliceKind kind)(Slice!(double*, 2, kind) x) {}
void bar(M)(M x) if(isMatrix!M) {}

void main()
{
    auto dataframe = slice!(double, string)(4, 3);
    foo(dataframe); //does not compile
    bar(dataframe);
}

jmh530 avatar May 16 '23 17:05 jmh530

toSlice is another one of the functions (among many) that suffers from this problem. The problem is that relying on something like isSlice!S may result in a lot of template bloat. So what you would have to do is have some functions that constraint on isSlice!S and that just forward to some implementation function after stripping the labels.

One related problem is that there is a bug (#23924) related to how DMD handles overloads with template enum parameters combined with variadic templates that makes some of this more difficult than it should be.

Another solution that had been discussed in the past is to have DataFrame as a separate type that takes a slice as an alias this member.

import mir.ndslice.slice;

struct DataFrame(Iterator, size_t N, SliceKind kind, Labels...)
{
    Slice!(Iterator, N, kind) x;
    alias x this;
}

void foo(Iterator, size_t N, SliceKind kind)(Slice!(Iterator, N, kind) x) {}

void main()
{
    DataFrame!(double*, 1, Contiguous, string) x;
    foo(x);
}

jmh530 avatar May 18 '23 14:05 jmh530

Eh, maybe a better approach would be define something like

enum bool isSliceExact(T) = is(T == Slice!(Iterator, N, kind), Iterator, size_t N, SliceKind kind);

and then use it like

void foo(Iterator, size_t N, SliceKind kind)(Slice!(Iterator, N, kind) x) {}
void foo(T)(T x) if(isConvertibleToSlice!T && !isSliceExact!T) {x.toSlice.foo}

This should avoid the problem of matching both of them. It should work with slices that have labels as well.

jmh530 avatar May 19 '23 14:05 jmh530

Labels may be a breaking change. mir-algorithm and mir-ion have to try to avoid them until at least 2026.

9il avatar May 19 '23 16:05 9il

What happens in 2026?

jmh530 avatar May 19 '23 16:05 jmh530

The thing is that I am going to spend far less time with Mir. But still have to support it for a few years. On the other hand, we can add new code if that will not affect existing behavior.

9il avatar May 20 '23 11:05 9il

Thanks. I was just confused by the reference to 2026. Your previous comment made it seem like improved labels support might come, but not until at least after 2026. This most recent comment suggests the support is not likely to come at all.

If one of my PRs is adding too much new code or changes, just let me know and we can close. I've got an outstanding lubeck PR, but I'm thinking about just moving the functionality I need into mir-stat.

jmh530 avatar May 22 '23 12:05 jmh530

Actually, if the API doesn't change existing code in Lubeck, you can add new API ther.

9il avatar May 25 '23 12:05 9il

@9il See here. I don't introduce breaking changes, but I am making changes to existing code (the diff in some cases makes it look like there are more significant changes than there are).

My objective from these changes is to have multivariate normal densities in mir.stat. Having functions to handle quadratic forms and Mahalanobis distance is useful in this, and improving the handling of matrix multiplication makes writing those functions simpler. It doesn't really matter to me where this stuff is located.

jmh530 avatar May 25 '23 13:05 jmh530