zed icon indicating copy to clipboard operation
zed copied to clipboard

Unique lines (a.k.a. remove duplicates)

Open exalted opened this issue 2 years ago • 2 comments
trafficstars

Check for existing issues

  • [X] Completed

Describe the feature

I wish there was a way to keep only one copy of repeating text in a buffer’s all lines (or a selected portion of it).

There may be different operating modes. For example, in one mode;

Foo
Bar
Foo
Foo
Bar 

… may be converted into:

Foo
Bar

… (i.e., removing all repeating occurrences of each line). Note, however, if this was implemented as “first sort [internally], then remove dups” then one must be thoughtful how the sorting works, because, for example, if sorting was making decisions about “what comes first” (e.g., case-sensitive or insensitive, numbers first VS letters first, special characters, similar accented chars to behave as one «e.g., ôöòóœø», etc.), this would impact on the final output of this feature.

In another mode the same input above might become:

Foo
Bar
Foo
Bar 

… (i.e., only the last repeating Foos are removed, because other repeating occurrences weren’t continuous.)

And maybe even in a multi-cursor selection like this:

CleanShot 2023-04-06 at 10 32 25@2x

The end result, after the dups are removed, might look like this:

CleanShot 2023-04-06 at 10 32 53@2x

So the words wrapped around ~ in the screenshot below are removed from the initial input, because they’re appearing for the n-th time:

CleanShot 2023-04-06 at 10 33 12@2x

What do you think?

(P.S.: This may be a sub-feature of https://github.com/zed-industries/zed/issues/5438.)

If applicable, add mockups / screenshots to help present your vision of the feature

No response

exalted avatar Apr 06 '23 08:04 exalted

I really think this ought to be a builtin considering #5788 got added 🙏 Then I can finally stop my little getaway dance with sublime..

porsager avatar Jan 29 '24 21:01 porsager

This looks doable, see https://github.com/zed-industries/zed/pull/2786 for the reference. One needs to create another Editor::manipulate_lines wrapper and another action for that + add a test into editor_tests.rs

SomeoneToIgnore avatar Jan 29 '24 22:01 SomeoneToIgnore

One decision that we probably need to make here is whether or not this command should

  • remove all duplicates, or
  • remove only consecutive duplicates

The Vec::dedup method only works on consecutive duplicates. You get a bit more flexibility with the latter option, since you can always sort your lines first before deduplicating, but something tells me I'd never want to have the behavior of removing only consecutive duplicates.

JosephTLyons avatar Jan 30 '24 13:01 JosephTLyons

A single data point here, but I've never needed to only remove consecutive duplicates ;)

porsager avatar Jan 30 '24 13:01 porsager

I hear both of you, but if ever someone needed "remove only consecutive dups", only one of the two approaches would allow it, while for anyone who would like to "sort & dedup", well, they can do that 😉

My :two-cents: , I'd totally dig having multiple commands:

  • dedup all
  • dedup consecutive

Each enabling any combinations of the following modes:

  • case-aware
  • variant-aware

Also, I'd expect these to work with:

  • entire file, it nothing is selected
  • on single selection
  • on multiple selections
  • cross-selections

And last, but not least 😅, all this should also work on multi-file-buffers (e.g., project search results window.)

Easy, right?! 🙂

exalted avatar Jan 30 '24 13:01 exalted

I disagree with the tendency to overcomplicate and believe that a simple HashSet::contains-like, case-aware, deduplication is all we need — this is exactly what VSCode and Sublime provide, the original issue required for, and exactly what we should start here with.

The rest of the related wishes could go into separate issues/PRs and I suspect those would not be as popular.

SomeoneToIgnore avatar Jan 30 '24 13:01 SomeoneToIgnore

Can't argue with that 🙂

All the additional features, if any and if ever, can be a plug-in one day.

exalted avatar Jan 30 '24 13:01 exalted

I disagree with the tendency to overcomplicate and believe that a simple HashSet::contains-like, case-aware, deduplication is all we need — this is exactly what VSCode and Sublime provide, the original issue required for, and exactly what we should start here with.

The rest of the related wishes could go into separate issues/PRs and I suspect those would not be as popular.

Yeah, using a hashset seems like the right approach to me. :)

JosephTLyons avatar Jan 30 '24 14:01 JosephTLyons

Hi! Looking forward to make my first contribution! Is it ok to try this one ? I did manage to craft a solution that changes Editor::manipulate_lines to allow lines insertion or removal. Here's the approach I'm thinking about

    fn manipulate_lines<Fn>(&mut self, cx: &mut ViewContext<Self>, mut callback: Fn)
    where
        Fn: FnMut(&mut Vec<&str>),
        {
        ...
            // Mark start and end row for each selection. The edited buffer will be used to calculate offsets
            let start_row = start_point.row + added_lines - removed_lines;
            new_selections.push(Selection {
                id: selection.id,
                start: start_row,
                end: start_row + (lines.len() as u32) - 1,
                goal: SelectionGoal::None,
                reversed: selection.reversed,
            });

            if lines.len() > lines_len {
                added_lines += (lines.len() - lines_len) as u32
            } else {
                removed_lines += (lines_len - lines.len()) as u32;
            }
        }

        self.transact(cx, |this, cx| {
            let buffer = this.buffer.update(cx, |buffer, cx| {
                buffer.edit(edits, None, cx);
                buffer.snapshot(cx)
            });

            let new_selections = new_selections
                .iter()
                .map(|s| {
                    let start_point = Point::new(s.start, 0);
                    let end_point = Point::new(s.end, buffer.line_len(s.end));

                    let start_anchor = buffer.anchor_after(start_point);
                    let end_anchor = buffer.anchor_before(end_point);
                    Selection {
                        id: s.id,
                        start: start_anchor.to_offset(&buffer),
                        end: end_anchor.to_offset(&buffer),
                        goal: SelectionGoal::None,
                        reversed: s.reversed,
                    }
                })
                .collect();

            this.change_selections(Some(Autoscroll::fit()), cx, |s| {
                s.select(new_selections);
            });

This is how the unique action would look like:


    pub fn unique_lines_case_sensitive(
        &mut self,
        _: &UniqueLinesCaseSensitive,
        cx: &mut ViewContext<Self>,
    ) {
        self.manipulate_lines(cx, |lines| {
            let mut seen = HashSet::default();
            lines.retain(|line| seen.insert(*line));
        })
    }

dalton-oliveira avatar Feb 06 '24 16:02 dalton-oliveira

@dalton-oliveira welcome to submit a PR. I think any good-first-issue labeled issue that has an approach description that more or less matches yours can safely get a PR straight away, without the discussion.

But what you did in unique_lines_case_sensitive looks exactly what I've imagined the fix would be. Could not comprehend the diff for manipulate_lines but that does not look bad either — as long as the PR has a test for this, it's fine.

SomeoneToIgnore avatar Feb 07 '24 06:02 SomeoneToIgnore

@dalton-oliveira landed unique lines in this PR, which will be in preview next Wed. 🎉

  • https://github.com/zed-industries/zed/pull/7526

Let's close this issue out and open new ones for any extra line/text manipulation commands.

JosephTLyons avatar Feb 08 '24 18:02 JosephTLyons