Entry API equivalent for Sets

Open Gankra opened this issue 9 years ago • 14 comments

By merging RFC 1194 (set recovery), we have acknowledged that the values of keys "matter". That is, it's reasonable to have an equal key but still want to know the details of the stored key.

That RFC added fn get(&T) -> Option<&T>, take(&T) -> Option<T>, and replace(T) -> Option<T>.
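To make the "stored key vs. equal key" distinction concrete, here is a minimal sketch using the three methods above. The `Key` type is hypothetical and exists only for illustration: its `Eq` and `Hash` impls look at `id` alone, so two equal keys can carry different `label`s.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical key type (not from the RFC): equality and hashing
/// consider only `id`, so equal keys may differ in `label`.
#[derive(Debug, Clone)]
struct Key {
    id: u32,
    label: &'static str,
}

impl PartialEq for Key {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

impl Eq for Key {}

impl Hash for Key {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

fn main() {
    let mut set = HashSet::new();
    set.insert(Key { id: 1, label: "stored" });

    let probe = Key { id: 1, label: "probe" };

    // `get` hands back the key held by the set, not the probe.
    assert_eq!(set.get(&probe).unwrap().label, "stored");

    // `replace` swaps in the new key and returns the old one.
    let old = set.replace(Key { id: 1, label: "replacement" }).unwrap();
    assert_eq!(old.label, "stored");

    // `take` removes the stored key and returns it by value.
    let taken = set.take(&probe).unwrap();
    assert_eq!(taken.label, "replacement");
    assert!(set.is_empty());
}
```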

However, what if I have an entry-like situation?

Today, this is the best we can do:

fn get_or_insert(set: &mut HashSet<Key>, key: Key) -> &Key {
  // Clone up front: `insert` consumes `key`, but `get` still needs one.
  let dupe = key.clone();
  if !set.contains(&key) {
    set.insert(key);
  }
  set.get(&dupe).unwrap()
}

Not only do we incur double-lookup (triple-lookup in the insertion case!), we also incur an unconditional Clone even though we already had a by-value key!

Optimally, we could write

fn get_or_insert(set: &mut HashSet<Key>, key: Key) -> &Key {
  set.entry(key).into_ref()
}

What's the entry API for sets? Well, a heck of a lot simpler. The entry API on maps is all about deferred value handling, and that doesn't make sense for sets.

  • Vacant::insert and Occupied::insert don't make sense because we already have the key
  • Occupied::get_mut and into_mut don't make sense because we don't acknowledge key mutation
  • Occupied::get and into_ref (to mirror into_mut), and remove are the only ones that make sense
  • It may also make sense to provide something like replace() to explicitly overwrite the old key... or something..?

So basically it would be something like entry(K) -> WasVacant(Entry) | WasOccupied(Entry). Critically, you get the same interface no matter what state the world was in, because there's nothing to do in the Vacant case but insert what was already given.

Supporting this would probably mean expanding the Entry API to "care about keys".

I haven't thought about the full implications here, and I don't have the bandwidth to write a full RFC at the moment.

Gankra avatar Feb 05 '16 17:02 Gankra

:+1: Needed this today.

ticki avatar Feb 05 '16 17:02 ticki

+1

apasel422 avatar Feb 05 '16 21:02 apasel422

+1 and thanks apasel422 for linking my PR and pointing me to this RFC! ;)

Robbepop avatar Feb 21 '16 04:02 Robbepop

Also needed this today, specifically:

let mut set: HashSet<String> = HashSet::new();
let ... = set.entry("the_key").or_insert_with(|| String::from("the_key"));

Centril avatar Dec 28 '16 09:12 Centril

I had a discussion about this on reddit today, and assumed that because replace is a thing, insert was meant to not replace an existing key. The current implementation, AFAICT, doesn't replace the key, as expected. However, given the "best we can do" scenario @Gankro wrote, I'm now unsure about this. Is key replacement in insert deliberately left unspecified? Or is there something else I am missing that makes the "best we can do" code behave differently from the following:

fn get_or_insert(set: &mut HashSet<Key>, key: Key) -> &Key {
    let dupe = key.clone();
    set.insert(key);
    set.get(&dupe).unwrap()
}
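For reference, the current standard library documentation for `HashSet::insert` says that when an equal value is already present, the set is not modified and the argument is dropped. The sketch below checks that behavior with a hypothetical `Key` type whose second field is ignored by `Eq` and `Hash`. If that documented behavior holds, the shorter function above should return the same stored key as the "best we can do" version.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical key for illustration: only field 0 takes part in Eq/Hash.
#[derive(Debug)]
struct Key(u32, &'static str);

impl PartialEq for Key {
    fn eq(&self, other: &Self) -> bool {
        self.0 == other.0
    }
}

impl Eq for Key {}

impl Hash for Key {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.hash(state);
    }
}

/// Inserts two equal keys with different tags and reports which tag
/// the set ends up holding.
fn tag_after_two_inserts() -> &'static str {
    let mut set = HashSet::new();
    // First insert: the key is new, so `insert` returns true.
    assert!(set.insert(Key(1, "first")));
    // Second insert of an equal key: returns false, stored key is kept.
    assert!(!set.insert(Key(1, "second")));
    let tag = set.get(&Key(1, "probe")).unwrap().1;
    tag
}

fn main() {
    println!("stored tag: {}", tag_after_two_inserts());
}
```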

jplatte avatar Mar 28 '17 16:03 jplatte

+1

izderadicka avatar May 11 '17 17:05 izderadicka

I needed this today. It's a shame Rust doesn't have this. Please add it to sets.

fschutt avatar Dec 16 '17 02:12 fschutt

+1, this would allow a safe zero-copy implementation of my makeuniq.rs script.

resilar avatar Mar 10 '18 08:03 resilar

This would be a very useful feature for string interning; I ran into the need for it today.
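A minimal sketch of that interning use case with today's `HashSet` API, assuming a hypothetical `Interner` type that stores `Rc<str>` handles. A hit costs one lookup and no allocation; a miss still pays for a second lookup inside `insert`, which is the cost an entry-style API would remove.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interner, for illustration only.
struct Interner {
    strings: HashSet<Rc<str>>,
}

impl Interner {
    fn new() -> Self {
        Interner { strings: HashSet::new() }
    }

    /// Returns a shared handle to the interned copy of `s`.
    fn intern(&mut self, s: &str) -> Rc<str> {
        // First lookup: on a hit, clone the existing handle (no allocation).
        if let Some(existing) = self.strings.get(s) {
            return Rc::clone(existing);
        }
        // Miss: allocate once; `insert` then performs a second lookup.
        let handle: Rc<str> = Rc::from(s);
        self.strings.insert(Rc::clone(&handle));
        handle
    }
}

fn main() {
    let mut interner = Interner::new();
    let a = interner.intern("the_key");
    let b = interner.intern("the_key");
    // Both handles point at the same allocation.
    assert!(Rc::ptr_eq(&a, &b));
}
```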

strega-nil avatar Sep 02 '18 19:09 strega-nil

Would definitely like to see this! I have a use case where even if the keys don't "matter", it's still useful to "insert a value and get a reference to either the existing value or inserted value". I'm working on an iterator adapter that filters out duplicates, and without an entry API, there's either an unnecessary lookup or an unnecessary clone:

use std::collections::HashSet;
use std::hash::Hash;

struct Dedupe<I: Iterator>
    where I::Item: Eq + Hash + Clone {
    iter: I,
    seen: HashSet<I::Item>
}

impl<I: Iterator> Iterator for Dedupe<I>
    where I::Item: Eq + Hash + Clone {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let item = self.iter.next()?;
            // Alternatively, do a contains() followed by insert()
            if self.seen.insert(item.clone()) {
                break Some(item);
            }
        }
    }
}

With the entry API, you could do:

fn next(&mut self) -> Option<Self::Item> {
    loop {
        if let WasVacant(item) = self.seen.entry(self.iter.next()?) {
            break Some(item.clone());   // Clone only on a cache miss
        }
    }
}

Essentially, there's a class of use case where you want to check if a T is present, insert it if not, then continue working with it as a &T without having to duplicate the lookup. This use case exists even if the "matteringness" of a particular key vs. an equal key doesn't exist.

Lucretiel avatar Oct 12 '18 02:10 Lucretiel

+1

clayrab avatar Apr 25 '22 15:04 clayrab

This happened: https://github.com/rust-lang/hashbrown/pull/342

SUPERCILEX avatar Jun 25 '22 20:06 SUPERCILEX

Needed this today for BTreeSet, specifically to avoid a .clone() of the key.

For my specific case, the workaround is:

if !set.contains(key) {  // N.B., `key` here is a `Borrow<..>` ref.
    set.insert(key.clone());
}

but this is suboptimal.

amunra avatar Dec 29 '23 10:12 amunra

The regular Entry API doesn't help avoid that clone, because it always takes the key by value. You would need something more like HashMap::raw_entry_mut (rust-lang/rust#56167) or BTreeMap cursors (rust-lang/rust#107540).

cuviper avatar Dec 29 '23 19:12 cuviper