ustr

Allow multiple names and associating static data per-string

Open JohnathanFL opened this issue 7 months ago • 0 comments

First, apologies for the guerrilla PR. I've been using the library for some time and kept thinking it would be nice to have per-string data, so I eventually decided to implement it for fun first and find out second whether it's acceptable to merge back in.

This refactors the internals of the crate to allow for multiple independent caches (whose ustrs/tokens are distinct types), closing #30. Each cache may additionally choose to store its own datatype alongside the string's hash/length, deriving the data the first time a string is interned. The main implementations of global helpers like string_cache_iter are also moved into the trait itself, so you can just say Dataless::string_cache_iter() or Dataless::num_entries(), and the old global functions redirect to those.

Because there's now a pervasive type parameter required, and to maintain compatibility with existing code, I switched it up so Ustr is merely a type alias for a new internal InternedString<N> type using a dataless (()-storing) namespace. The idea is that anyone who wants more namespaces/data would define their own facade, like:

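use std::sync::LazyLock;

// One static cache per namespace.
static FOO_NS: LazyLock<Bins<FooNs>> = LazyLock::new(|| Bins::new());
struct FooNs;
impl StringCacheNs for FooNs {
    // Stored alongside each string's hash/length.
    type Data = char;
    // Derived once, on first interning; '\0' for the empty string instead of panicking.
    fn derive_cache_data(string: &str) -> Self::Data {
        string.chars().last().unwrap_or_default()
    }
    fn cache() -> &'static Bins<Self> {
        &FOO_NS
    }
}
// Facade: a distinct interned-string type plus a constructor for this namespace.
pub type MyStr = InternedString<FooNs>;
pub fn mystr(s: &str) -> MyStr { MyStr::from(s) }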
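
For anyone who wants to poke at the shape of this without checking out the branch, here is a rough std-only sketch of the same idea. It is not the code in this PR: Namespace, ToyCache, Interned, ByteLen and ToyUstr are hypothetical stand-ins, and it uses a Mutex<HashMap> with leaked entries instead of the crate's actual bins. It only illustrates one cache per namespace type, data derived on first interning, helpers living on the trait, and a global function redirecting to the dataless namespace.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::sync::{LazyLock, Mutex};

// Hypothetical toy types for illustration; not the PR's StringCacheNs/Bins/InternedString.
struct Entry<D> {
    text: String,
    data: D,
}

pub struct ToyCache<N: Namespace> {
    map: Mutex<HashMap<String, &'static Entry<N::Data>>>,
}

impl<N: Namespace> ToyCache<N> {
    pub fn new() -> Self {
        ToyCache { map: Mutex::new(HashMap::new()) }
    }
}

pub trait Namespace: Sized + 'static {
    type Data: Send + Sync + 'static;
    fn derive(s: &str) -> Self::Data;
    fn cache() -> &'static ToyCache<Self>;
    // Helper implemented on the trait itself, so callers write `Ns::num_entries()`.
    fn num_entries() -> usize {
        Self::cache().map.lock().unwrap().len()
    }
}

// A handle into namespace N's cache; handles from different namespaces are different types.
pub struct Interned<N: Namespace> {
    entry: &'static Entry<N::Data>,
    _ns: PhantomData<N>,
}

impl<N: Namespace> Clone for Interned<N> {
    fn clone(&self) -> Self { *self }
}
impl<N: Namespace> Copy for Interned<N> {}
impl<N: Namespace> PartialEq for Interned<N> {
    // Equal strings share one entry, so pointer equality is enough.
    fn eq(&self, other: &Self) -> bool { std::ptr::eq(self.entry, other.entry) }
}

impl<N: Namespace> Interned<N> {
    pub fn new(s: &str) -> Self {
        let mut map = N::cache().map.lock().unwrap();
        let entry = match map.get(s).copied() {
            Some(existing) => existing,
            None => {
                // First time this string is seen: derive its data once and leak the entry.
                let fresh: &'static Entry<N::Data> =
                    Box::leak(Box::new(Entry { text: s.to_owned(), data: N::derive(s) }));
                map.insert(s.to_owned(), fresh);
                fresh
            }
        };
        Interned { entry, _ns: PhantomData }
    }
    pub fn as_str(&self) -> &'static str { self.entry.text.as_str() }
    pub fn data(&self) -> &'static N::Data { &self.entry.data }
}

// Dataless namespace: stores `()`, playing the role of the default cache.
pub struct Dataless;
static DATALESS: LazyLock<ToyCache<Dataless>> = LazyLock::new(|| ToyCache::new());
impl Namespace for Dataless {
    type Data = ();
    fn derive(_s: &str) -> Self::Data {}
    fn cache() -> &'static ToyCache<Self> { &DATALESS }
}
pub type ToyUstr = Interned<Dataless>;
// Old-style global helper that just redirects to the trait.
pub fn num_entries() -> usize { Dataless::num_entries() }

// A second, independent namespace that stores each string's byte length.
pub struct ByteLen;
static BYTE_LEN: LazyLock<ToyCache<ByteLen>> = LazyLock::new(|| ToyCache::new());
impl Namespace for ByteLen {
    type Data = usize;
    fn derive(s: &str) -> Self::Data { s.len() }
    fn cache() -> &'static ToyCache<Self> { &BYTE_LEN }
}

fn main() {
    let a = Interned::<ByteLen>::new("hello");
    let b = Interned::<ByteLen>::new("hello");
    assert!(a == b); // interned once, both handles point at the same entry
    assert_eq!(*a.data(), 5);
    let plain: ToyUstr = Interned::new("hello");
    assert_eq!(plain.as_str(), "hello");
    // The two caches are independent: each holds exactly one entry.
    assert_eq!(ByteLen::num_entries(), 1);
    assert_eq!(num_entries(), 1);
}
```

In this toy version, comparing an Interned<ByteLen> with a ToyUstr is a compile error, which is the property the distinct per-namespace types are meant to give.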

I'll leave this as a draft PR for the moment while I look for other things to clean up, use it myself, and eventually run the benchmarks to see whether the trait indirection has any impact. Let me know if there's anything you'd want changed, or if this just doesn't seem like a good fit to merge in.

JohnathanFL • Aug 22 '25 19:08