string-interner

Allow users to "lock" interners, reducing memory overhead.

Open bentheiii opened this issue 8 months ago • 4 comments

Hello, for our use case, we have an initial population of a StringInterner with values, and then there is a long period in which only symbol resolution is needed. To reduce memory usage during the second period, we'd like to be able to resolve interned symbols without the additional memory overhead for interning new strings.

I propose adding a new struct that only allows resolving strings from a given backend:

pub struct StringResolver<B>
{
    backend: B,
}

impl<B> StringResolver<B>
where
    B: Backend,
{
    pub fn resolve(&self, symbol: <B as Backend>::Symbol) -> Option<&str> {
      ...
    }

    pub unsafe fn resolve_unchecked(&self, symbol: <B as Backend>::Symbol) -> &str {
      ...
    }
}

We would also add a method to convert a StringInterner into a StringResolver:

impl<B, H> StringInterner<B, H>
{
    pub fn into_resolver(self) -> StringResolver<B> {
      ...
    }
}
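
To make the idea concrete, here is a minimal, self-contained sketch of the "lock" step. It does not use the string-interner crate or its Backend trait: `ToyInterner`, `ToyResolver`, and `ToySymbol` are hypothetical names invented for illustration, and the toy dedup map (which keeps its own copy of each string) is laid out differently from the crate's real one. It only shows the shape of the proposal: converting consumes the interner, drops the dedup map, and keeps just what is needed to resolve symbols.

```rust
use std::collections::HashMap;

/// Hypothetical toy symbol: an index into the string table.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ToySymbol(u32);

/// Hypothetical toy interner: string storage plus a dedup map
/// that is only needed while new strings are being interned.
#[derive(Default)]
pub struct ToyInterner {
    strings: Vec<Box<str>>,
    dedup: HashMap<Box<str>, ToySymbol>,
}

impl ToyInterner {
    /// Returns the existing symbol for `s`, or interns it and returns a new one.
    pub fn get_or_intern(&mut self, s: &str) -> ToySymbol {
        if let Some(&sym) = self.dedup.get(s) {
            return sym;
        }
        let sym = ToySymbol(self.strings.len() as u32);
        self.strings.push(s.into());
        self.dedup.insert(s.into(), sym);
        sym
    }

    /// Consumes the interner and keeps only the string storage.
    /// The dedup map is dropped here, which is where the memory is released.
    pub fn into_resolver(self) -> ToyResolver {
        let mut strings = self.strings;
        strings.shrink_to_fit();
        ToyResolver { strings }
    }
}

/// Hypothetical read-only view: it can resolve symbols but cannot intern.
pub struct ToyResolver {
    strings: Vec<Box<str>>,
}

impl ToyResolver {
    /// Returns the string for `symbol`, or `None` if the symbol is out of range.
    pub fn resolve(&self, symbol: ToySymbol) -> Option<&str> {
        self.strings.get(symbol.0 as usize).map(|s| &**s)
    }
}
```

Because `into_resolver` takes `self` by value, the type system guarantees that nothing can be interned after the conversion, so symbols handed out earlier stay valid for the resolver.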

I'd be happy to make a PR for this if it is approved.

bentheiii, Mar 19 '25 10:03

@bentheiii interesting proposal. Have you measured the actual difference in memory usage? Would be interesting to know the potential gains of such a PR.

Robbepop, Mar 19 '25 10:03

@Robbepop I haven't measured it yet (correct me if I'm wrong, but the only way to measure this is to implement and use my own fork in an example project, no?)

AFAICT, the memory saved would be at least (size of symbol + size of u64) * number of interned strings, since that's what the dedup mapping we could throw away stores. In our use case (with over five million strings to intern), this could account for 60 MB using default symbols. That excludes whatever overhead the map itself needs, which I couldn't find specs for online.
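
The arithmetic behind that lower bound can be written out as a small sketch. The helper name `dedup_lower_bound_bytes` is hypothetical, and the 4-byte symbol size is an assumption (a u32-backed default symbol), chosen because it reproduces the 60 MB figure; the formula itself is the estimate stated above, not a measurement.

```rust
use std::mem::size_of;

/// Estimate from the comment above:
/// (size of symbol + size of u64) * number of interned strings.
/// This is a rough lower bound and ignores the map's own overhead.
pub fn dedup_lower_bound_bytes(symbol_size: usize, num_strings: usize) -> usize {
    (symbol_size + size_of::<u64>()) * num_strings
}
```

With an assumed 4-byte symbol and five million strings, `dedup_lower_bound_bytes(size_of::<u32>(), 5_000_000)` gives (4 + 8) * 5,000,000 = 60,000,000 bytes, i.e. 60 MB.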

bentheiii, Mar 19 '25 11:03

> @Robbepop I haven't measured it yet (correct me if I'm wrong, but the only way to measure this is to implement and use my own fork in an example project, no?)

That's correct. It would be great to have these measurements before we go in and add this feature.

That said, I think your estimates are probably in the right ballpark.

Robbepop, Mar 19 '25 13:03

Hi @Robbepop, unfortunately something came up, and I won't be able to perform the experiment in the near future (our org decided to go with a different solution, unrelated to interning).

I can still make the PR in my spare time, but I won't be able to provide a solid figure for the memory difference beyond the lower-bound estimate. I understand if this is a blocker.

bentheiii, Mar 20 '25 09:03