Associative array `update` and `require` should allow elaborate key creation
When an associative array has reference type keys, depending on circumstances, keys must be copied into the AA. The typical example is a T[string] where lookup can be performed with a const(char)[] but an added key must be an immutable(char)[] (aka. string).
Illustrative example inspired by dlang wc:
foreach (word; splitter(line))
{
if (auto count = word in dictionary)
*count += 1;
else
dictionary[word.idup] = 1;
}
The reason update cannot be used is because the key word used for lookup in the AA dictionary is invalid to add to the AA. It’s a char[] and will be overwritten in the next loop iteration, thus requiring double lookup, which update has been introduced to avoid.
Suggestion: Add an overload to update that has a key transformer parameter:
void update(K, V, X : K, T, C, U)(ref V[K] aa, X key, scope T transformer, scope C create, scope U update)
if (is(typeof(create()) : V)
&& is(typeof(transformer(key) == K))
&& (is(typeof(update(aa[K.init])) : V) || is(typeof(update(aa[K.init])) == void))
);
The == in the condition is intentional. The transformed key must have type K exactly.
The new code:
foreach (word; splitter(line))
{
dictionary.update(
word,
idup, // or word => word.idup
() => 1,
(ref int count) { count += 1; },
);
}
If key is not present, transform(key) must return an object that is equal to key. In debug mode, this should be asserted.
Is there some advantage over using
foreach (word; splitter(line))
{
dictionary.update(
word.idup,
() => 1,
(ref int count) { count += 1; },
);
}
like we already can? Though I'd personally rewrite this particular example like
foreach (word; splitter(line))
{
dictionary.require(word.idup, 0)++;
}
@Herringway
Is there some advantage over using [
dictionary.update(word.idup,…)]?
Yes, because word.idup allocates a new string, but if the key is already present, that doesn’t need to happen. Users shouldn’t have to pay for what they don’t use.
The C++ STL has the same issue in its std::unordered_map. With C++20, if you provide a “transparent” equality and hash, you can use any compatible object for lookup, e.g. if you have a std::unordered_map<std::string, T, std::equal_to<>, my_string_hash>, you can use a std::string_view for lookup and it will only be converted into a std::string when a key is added.
D’s AAs are basically 5 years behind C++’s unordered_map.
foreach (word; splitter(line)) { dictionary.require(word.idup, 0)++; }
I totally missed that require needs it, too:
dictionary.require(word, idup, 0)++;
With the semantics:
ref V require(K, V, X, T)(ref V[K] aa, X key, T transform, lazy V value)
{
if (V* result = key in aa) return *result;
K k = transform(key);
assert(k == key);
return aa[k] = value();
}
Except that there’s only one lookup.
For completeness, the semantics for update:
void update(K, V, X : K, T, C, U)(ref V[K] aa, X key, scope T transform, scope C create, scope U update)
if (is(typeof(create()) : V)
&& is(typeof(transform(key)) == K)
&& (is(typeof(update(aa[K.init])) : V) || is(typeof(update(aa[K.init])) == void))
)
{
if (V* result = key in aa)
{
static if (is(typeof(update(aa[K.init])) == void))
update(*result);
else
*result = update(*result);
return;
}
K k = transform(key);
assert(k == key);
aa[k] = create();
}
Except that there’s only one lookup.
There is another deficiency with update - we can't read an existing value when the key exists, e.g. to increment it (whilst also handling the case of a missing key, without a double lookup). Perhaps an API like this could solve both problems, though it probably can't be @safe:
auto l = aa.lookup(key);
if (l) // if key exists
l.value++;
else
{
l.key = key.idup;
l.value = someValue;
}
we can't read an existing value when the key exists
Sorry, that's wrong and the object docs are wrong too, I'll make a fix.
Update: It was fixed in August here: https://github.com/dlang/dmd/commit/05ed167ae68#diff-997a4ab4cefafe10e1cf54cbeb98a31adb885efad94c0889b819cb54ecd375f2L3633-R3549.