dmd icon indicating copy to clipboard operation
dmd copied to clipboard

Associative array `update` and `require` should allow elaborate key creation

Open Bolpat opened this issue 2 weeks ago • 4 comments

When an associative array has reference type keys, depending on circumstances, keys must be copied into the AA. The typical example is a T[string] where lookup can be performed with a const(char)[] but an added key must be an immutable(char)[] (aka. string).

Illustrative example inspired by dlang wc:

foreach (word; splitter(line))
{
    if (auto count = word in dictionary)
        *count += 1;
    else
        dictionary[word.idup] = 1;
}

The reason update cannot be used is because the key word used for lookup in the AA dictionary is invalid to add to the AA. It’s a char[] and will be overwritten in the next loop iteration, thus requiring double lookup, which update has been introduced to avoid.

Suggestion: Add an overload to update that has a key transformer parameter:

void update(K, V, X : K, T, C, U)(ref V[K] aa, X key, scope T transformer, scope C create, scope U update)
if (is(typeof(create()) : V)
&&  is(typeof(transformer(key) == K))
&&  (is(typeof(update(aa[K.init])) : V) || is(typeof(update(aa[K.init])) == void))
);

The == in the condition is intentional. The transformed key must have type K exactly. The new code:

foreach (word; splitter(line))
{
    dictionary.update(
        word,
        idup, // or word => word.idup
        () => 1,
        (ref int count) { count += 1; },
    );
}

If key is not present, transform(key) must return an object that is equal to key. In debug mode, this should be asserted.

Bolpat avatar Dec 04 '25 14:12 Bolpat

Is there some advantage over using

foreach (word; splitter(line))
{
    dictionary.update(
        word.idup,
        () => 1,
        (ref int count) { count += 1; },
    );
}

like we already can? Though I'd personally rewrite this particular example like

foreach (word; splitter(line))
{
    dictionary.require(word.idup, 0)++;
}

Herringway avatar Dec 04 '25 16:12 Herringway

@Herringway

Is there some advantage over using [dictionary.update(word.idup,…)]?

Yes, because word.idup allocates a new string, but if the key is already present, that doesn’t need to happen. Users shouldn’t have to pay for what they don’t use.

The C++ STL has the same issue in its std::unordered_map. With C++20, if you provide a “transparent” equality and hash, you can use any compatible object for lookup, e.g. if you have a std::unordered_map<std::string, T, std::equal_to<>, my_string_hash>, you can use a std::string_view for lookup and it will only be converted into a std::string when a key is added.

D’s AAs are basically 5 years behind C++’s unordered_map.

foreach (word; splitter(line))
{
    dictionary.require(word.idup, 0)++;
}

I totally missed that require needs it, too:

dictionary.require(word, idup, 0)++;

With the semantics:

ref V require(K, V, X, T)(ref V[K] aa, X key, T transform, lazy V value)
{
    if (V* result = key in aa) return *result;
    K k = transform(key);
    assert(k == key);
    return aa[k] = value();
}

Except that there’s only one lookup.

For completeness, the semantics for update:

void update(K, V, X : K, T, C, U)(ref V[K] aa, X key, scope T transform, scope C create, scope U update)
if (is(typeof(create()) : V)
&&  is(typeof(transform(key)) == K)
&& (is(typeof(update(aa[K.init])) : V) || is(typeof(update(aa[K.init])) == void))
)
{
    if (V* result = key in aa)
    {
        static if (is(typeof(update(aa[K.init])) == void))
            update(*result);
        else
            *result = update(*result);
        return;
    }
    K k = transform(key);
    assert(k == key);
    aa[k] = create();
}

Except that there’s only one lookup.

Bolpat avatar Dec 05 '25 12:12 Bolpat

There is another deficiency with update - we can't read an existing value when the key exists, e.g. to increment it (whilst also handling the case of a missing key, without a double lookup). Perhaps an API like this could solve both problems, though it probably can't be @safe:

auto l = aa.lookup(key);
if (l) // if key exists
    l.value++;
else
{
    l.key = key.idup;
    l.value = someValue;
}

ntrel avatar Dec 08 '25 10:12 ntrel

we can't read an existing value when the key exists

Sorry, that's wrong and the object docs are wrong too, I'll make a fix.

Update: It was fixed in August here: https://github.com/dlang/dmd/commit/05ed167ae68#diff-997a4ab4cefafe10e1cf54cbeb98a31adb885efad94c0889b819cb54ecd375f2L3633-R3549.

ntrel avatar Dec 08 '25 12:12 ntrel