itertools icon indicating copy to clipboard operation
itertools copied to clipboard

Implement unique_combinations.

Open meltinglava opened this issue 6 years ago • 7 comments

Some questions that needs to be answared:

  • Document differently?
  • Any optimalisations?
  • The algorithm only requires equal Items to be grouped beside each other. This is achived by sorting. However if we there is an effisient way to group equals then we can have this require Eq insted of Ord

meltinglava avatar Aug 04 '19 15:08 meltinglava

I did a rebase so diffs should now make more sence to the current codebase.

There have happend a lot of changes to combinations since the initial release of this pr, so we should see if we can use some of the techniques applied to combinations.

As of the LazyBuffer type, i think if this should be merged with unique combinations, this is to major of a rewrite and think it should be a separate pr if so.

Thanks @phimuemue for reviewing. As for the last ones I have not marked as resolved. Since the rebase and all of the refer to combinations, can you check if they still apply.

meltinglava avatar Mar 05 '20 11:03 meltinglava

i have autoformat in my editorconfig on save. I can create an pr with formating for all the files.

meltinglava avatar Mar 08 '20 17:03 meltinglava

@phimuemue is there any other blockers than getting formating done on master for this PR (besides the unit testing)?

meltinglava avatar Mar 13 '20 14:03 meltinglava

@phimuemue is there any other blockers than getting formating done on master for this PR (besides the unit testing)?

As always, @jswrenn has the last word. But I still think that the implementation could/should be closer to Combinations::next.

I tried to see if my intuition is wrong, but taking your implementation and transforming piece by piece leads to something very similar to Combinations::next (see my sketch https://github.com/meltinglava/itertools/compare/master...phimuemue:uniq_comb2). (One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None.)

phimuemue avatar Mar 14 '20 12:03 phimuemue

One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None

I agree to the theory here, thow technicly the break statment here is just sugar for returning in that case. The loops job in all the cases are searching, hence break out / returning when found. Do not find the other one to be drasticly better than the other one.

The reason I am hessitent to do it that way, is that I do not feel like the code is better in terms of readability and logic. If I am not the only one that think this is true, then it is my opinion that combinations should be brought closer to whatever unique_combinations end up looking like if this is a goal that we want.

I have found that it takes quite alot of time to figgure out what an algorithm does when its all based on one mutable number that deals with all the cases.

Here are the pros and cons of each as i see it.

Pros unique:

  • No mut variables used.

Cons:

  • At the moment not the looking like combinations (atm becasue this can change in the future)

Neither one:

  • Same amount of early_return + break + continue statements (4 in bought cases)

Do anyone have anything to add?

meltinglava avatar Mar 14 '20 16:03 meltinglava

Thanks for your feedback.

One thing I like about Combinations::next is that the early-outs are only for the None case, while your implementation has one for generate and one for None

I agree to the theory here, thow technicly the break statment here is just sugar for returning in that case. The loops job in all the cases are searching, hence break out / returning when found. Do not find the other one to be drasticly better than the other one.

Regarding the break: If we follow my sketch we should get rid of the break here, by actually exploiting what we know: As stated in line 74

if self.pool[bump_source] < self.pool[bump_target] { // must be true for at least one bump_target

we know that the loop initiated in line 73 (for bump_target in bump_source + 1..pool_len) will not be exhausted without success. This is because line 63 (self.pool[self.indices[i]] == self.pool[i + pool_len - indices_len]) already established an element for whith the loop from line 73 will terminate. We could incorporate this information in the loop of line 73 to get rid of the break, so that it reads more like "find bump_target, increment indices(i) to it, and adjust the remaining indices".

The reason I am hessitent to do it that way, is that I do not feel like the code is better in terms of readability and logic.

You may have a valid point there: I may be biased (because I just compared your new code to the existing one), so another opinion on this could very well inform our choice there.

If I am not the only one that think this is true, then it is my opinion that combinations should be brought closer to whatever unique_combinations end up looking like if this is a goal that we want.

I agree: If UniqueCombinations is found to be the simpler variant, and Combinations would profit from this, we should change it.

I have found that it takes quite alot of time to figgure out what an algorithm does when its all based on one mutable number that deals with all the cases.

You're right: These algorithms are surprisingly tricky to formulate in simple terms. But isn't the essence as follows:

  • Starting at the end, find an index i in indices that can be incremented. (In Combinations, we have self.indices[i] == i + self.pool.len() - self.indices.len(), whereas in UniqueCombinations we could have essentially the same "tunneled through self.pool": self.pool[self.indices[i]] == self.pool[i + pool_len - indices_len]).

  • Once we have found this index, increment index(i). (In Combinations this is simply self.indices[i] += 1;, while in the UniqueCombinations we could derive a variant without breaks/early-outs that achieves something that increments self.indices(i) to something so that the referenced pool value changes.)

  • Increment the indices after i. For this I would expect the code to be the same in both variants.

I think current Combinations resembles this logic quite well. Do you have another (possibly simpler) view on the problem?

phimuemue avatar Mar 15 '20 14:03 phimuemue

Just thought how hard it would be doing this with rust iterators. It is qute descriptive if when comments are added. Here we have almost all of combinations, thow i could not get the .iter_mut to return mutable referances. Do also think what I did inside the for loop is wrong. thow this is just an example.

// search for bumpable index
return self
    .indices
    .iter()
    .rev()
    .enumerate()
    .find(|(offset, &index)| self.pool[index] != self.pool[pool_len - 1 - offset])
    .map(|(offset, &index)| (offset, index))
    .clone()
    // when found bump that index and increase the following values
    .map(|(offset, index)| {
        for (o, &mut i) in self
            .indices
            .iter_mut()
            .rev()
            .take(offset + 1)
            .rev()
            .enumerate()
        {
            *i = index + offset + o;
        }
    })

What i like about this version is that its only thefind function does exactly what it says, find the value that fits the predicate, and it does the break / early return handeling

meltinglava avatar Mar 16 '20 15:03 meltinglava