flux Add permutations and permutations

Add permutations and permutations_sized adaptors

Open nwad123 opened this issue 10 months ago • 3 comments

This PR adds the permutations and permutations_sized adaptors, as described in #138.

The permutations adaptor takes in any input sequence and produces an ouput of all possible permutations of the input. If the input sequence has a length of $l$, the output sequence is length $l!$. The output permutations are in lexographical order according to the original order of the input. If the original input is in order, the output permutations will also be in order.

auto x = flux::ints()         // 0, 1, 2, 3, ...
              .take(3)        // 0, 1, 2
              .permutations() // [0, 1, 2], [0, 2, 1],
                              // [1, 0, 2], [1, 2, 0],
                              // [2, 0, 1], [2, 1, 0]
              // ...

The permutations_sized adaptor differs in operation in that it is parameterized with a additional input -- SubsequenceSize -- that defines how long each output permutation should be. If the length of the input sequence is $l$ and the value of SubsequenceSize is represented by $r$ then the length of the output is calculated by:

$$ \frac{l!}{(l-r)!} $$

auto x = flux::ints()                  // 0, 1, 2, 3, ...
              .take(3)                 // 0, 1, 2
              .permutations_sized<2>() // [0, 1], [0, 2],
                                       // [1, 0], [1, 2],
                                       // [2, 0], [2, 1]
              // ...

The implementation of both adaptors is spread across three files:

permutations_base.cpp: Contains shared functionality between the two types of permutations adaptors.
permutations.cpp: The permutations adaptor.
permutations_sized.cpp: The sized permutations adaptor.

The implementations are based off the implementation of both Rust and Python's itertools permutations adaptors.

The operations of both adaptors is similar. First, they cache the input sequence so that the values can be repeatedly copied into each output element. Then, each cursor contains a vector, indices containing the permuted indices of the input vector. These indices are used along with the cached input sequence to generate the permutation at each element (see example below). When read_at is called, the cached input sequence is copied into the output vector using the indices stored in the cursor for each value. When a cursor is advanced, the next permutations of the indices is generated, and then when read_at is called again, the next permutation of the input is returned.

Here's an example demonstrating the steps. Note that this example is slight pseudocode in order to better show the steps that are happening.

// Sequence 
auto permuted_sequence = flux::ref({2, 1, 3}).permutations();

// Cursor 
auto c = flux::first(permuted_sequence);

assert(c.indices == [0, 1, 2]);
assert(permuted_sequence.cache == [2, 1, 3]);

// Read
auto x = flux::read_at(permuted_sequence, c);
         // index into the cached values using the cursor indices:
         // x[0] == permuted_sequence.cache[c.indices[0]];
         // x[1] == permuted_sequence.cache[c.indices[1]];
         // x[2] == permuted_sequence.cache[c.indices[2]];

assert(x == [2, 1, 3]);

// Advance
flux::inc(permuted_sequence, c);
      // c.indices = std::next_permutation(c.indices)

assert(c.indices = [0, 2, 1]);

// Read again 
auto y = flux::read_at(permuted_sequence, c);
         // y[0] == permuted_sequence.cache[c.indices[0]];
         // y[1] == permuted_sequence.cache[c.indices[1]];
         // y[2] == permuted_sequence.cache[c.indices[2]];

assert(y == [2, 3, 1]);

I'm marking this as a draft because I'm looking for feedback on several design decisions I made. Once these are answered I can finish up the documentation, testing, and implementation of these adaptors and hopefully have them merged in. Also, there are optimizations that could be implemented for different types of input sequences that I plan on adding over time. Here are the specific areas of feedback I'm looking for:

Does the decision to split the functionality of the permutations and permutations_sized into two adaptors make sense?
Are vectors a good element type for the output? I've thought about using an array if we know the size of the input at compile time, but I started with a vector because it's more flexible to use. Also, if we have long input sequences, I wasn't sure if it would be good to use a stack-based element type for the output.
Are the cursors too expensive to create to justify satisfying flux::regular_cursor for the cursors? The documentation says that "cursors may be copied within algorithm implementations, so copying should be 'cheap'" (flux::regular_cursor docs).

I'm also happy to hear any other feedback that you have! I'm still pretty new to contributing to C++ projects, but I'm enjoying what I'm learning. Also, December and January were busy with other tasks, but I should be able to contribute more time to these adaptors for the next several weeks.

laid out basics of permutation adaptor, still researching impl strategies
Renamed branch to follow other branch names
Removed unnecessary comments
Added permutations adaptor and started tests
Added string-based permutations comparison test
Fixed string based comparison test
Removed unused imports
Added sized permutations adaptor
Fixed sized template parameters on sized permutations
Added sized permutations test
Renamed 'sized_permutations' to 'permutations_sized'
Added outline for documentation
Renamed and refactored for first draft

Jan 26 '25 23:01 nwad123

flux flux copied to clipboard

Add permutations and permutations_sized adaptors

flux
flux copied to clipboard