
Implement a `unique` function returning only the unique values in a vector.

Open loiseaujc opened this issue 1 year ago • 6 comments

Motivation

Recently, I've run into the problem of extracting unique values in a vector (of any integer, real or complex type or possibly even character). Consider for instance the following vector x = [1, 2, 3, 3, 4]. What I'd need is a function taking x as input and returning the vector y = [1, 2, 3, 4] as output. The interface for a real-valued vector could be as simple as

pure function unique(x, sorted) result(y)
     use stdlib_kinds, only: dp, lk
     real(dp), intent(in) :: x(:)
     !! Array whose unique values need to be extracted.
     logical(lk), optional, intent(in) :: sorted
     !! Whether the output vector needs to be sorted or not (default .false. ?)
     real(dp), allocatable :: y(:)
     !! Vector containing only the unique values from x.
end function unique

The output vector could be sorted or not, depending on the user's choice. I know that there is no Fortran intrinsic function for that purpose, but I'm not sure whether something like that is already available in stdlib. If it is, could anyone point me to the correct function?
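To make the request concrete, here is a minimal sketch of what such a function could look like. It is not an existing stdlib routine: the module name unique_sketch, the local dp kind, and the use of a default logical instead of logical(lk) are assumptions made so that the example compiles on its own. It uses a simple O(n²) scan that keeps the first occurrence of each value, followed by an optional insertion sort; a real implementation would likely rely on a proper sorting routine instead.

```fortran
module unique_sketch
    implicit none
    private
    public :: unique, dp

    ! Assumption: double precision kind defined locally instead of via stdlib_kinds.
    integer, parameter :: dp = kind(1.0d0)

contains

    pure function unique(x, sorted) result(y)
        real(dp), intent(in) :: x(:)
        logical, optional, intent(in) :: sorted
        real(dp), allocatable :: y(:)

        logical :: keep(size(x)), do_sort
        integer :: i, j
        real(dp) :: tmp

        ! Default chosen here: keep the order of first occurrence.
        do_sort = .false.
        if (present(sorted)) do_sort = sorted

        ! Keep x(i) only if the same value does not appear earlier in x.
        ! For i = 1 the section x(1:0) is empty, so any(...) is .false.
        do i = 1, size(x)
            keep(i) = .not. any(x(1:i-1) == x(i))
        end do
        y = pack(x, keep)

        if (do_sort) then
            ! Plain insertion sort in ascending order (hypothetical choice,
            ! fine for a sketch but quadratic in the worst case).
            do i = 2, size(y)
                tmp = y(i)
                j = i - 1
                do while (j >= 1)
                    if (y(j) <= tmp) exit
                    y(j+1) = y(j)
                    j = j - 1
                end do
                y(j+1) = tmp
            end do
        end if
    end function unique

end module unique_sketch

program demo_unique
    use unique_sketch, only: unique, dp
    implicit none
    real(dp) :: x(5)

    x = [3.0_dp, 1.0_dp, 3.0_dp, 2.0_dp, 1.0_dp]

    ! Order of first occurrence: 3, 1, 2
    print *, unique(x)
    ! Ascending order: 1, 2, 3
    print *, unique(x, sorted=.true.)
end program demo_unique
```

Note that this sketch compares reals with exact equality, so values differing only by round-off are treated as distinct and NaN entries are never merged. Whether a tolerance should be supported is a design question left open here.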

Prior Art

  • In Matlab, there is the unique function whose description is available here.
  • Python has the built-in set type, which can be constructed from a list and keeps only its unique elements (without preserving their order).
  • Numpy has np.unique whose description is available here.
  • @jacobwilliams provides an integer-based implementation on his blog (here).

Additional Information

Both Matlab's and Numpy's implementations cover a relatively large set of cases (1D arrays, multidimensional arrays, different types, etc.) and return values (the unique elements, the corresponding indices, the indices needed to reconstruct the original array from this unique set, etc.).

I don't know whether all of these cases need to be covered, at least as a starting point. I would recommend starting with the simplest one (i.e. a 1D input vector and an output vector containing its unique elements), as this is probably the most common situation where a unique function is needed. That would include integer, real, complex and character 1D arrays.

loiseaujc avatar Feb 24 '25 10:02 loiseaujc

I'm also not sure which module this utility function should go into. Maybe stdlib_sorting?

loiseaujc avatar Feb 24 '25 10:02 loiseaujc

Good idea, @loiseaujc. Please note there is an open discussion at #670. Should we merge this issue with that one?

perazz avatar Feb 24 '25 16:02 perazz

Oh sure! I completely overlooked this issue.

loiseaujc avatar Feb 24 '25 19:02 loiseaujc

Is this issue still open to work on? I would love to contribute.

demoncoder-crypto avatar Mar 13 '25 23:03 demoncoder-crypto

Sure, it is still open! As @perazz mentioned, it is closely related to #670 and the two could probably be merged. I haven't worked on this one at all so far. I encourage you to get in touch with @Beliavsky. Maybe they've started to craft something on the side. I don't know.

loiseaujc avatar Mar 25 '25 08:03 loiseaujc

I am going to implement a rough draft. Just let me know whether I am on the right track, and we will figure it out from there.

demoncoder-crypto avatar Mar 25 '25 17:03 demoncoder-crypto