Draft: replace AA runtime hooks with templated implementation
Adds the complete templated implementation to newaa.d, keeping binary compatibility with rt/aaA.d for now.
This should help with inferring proper function attributes on the operations, as well as with using regular constructors/copy operations on key and value types.
Four of the six compiler hooks are replaced, but there are two more calls from TypeInfo_AssociativeArray's getHash() and equal() to forward.
Most of the time was spent fighting inout in template arguments :/
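To make the shape of the change concrete, here is a rough sketch of the difference between the old TypeInfo-driven hook and a templated one; the parameter lists are illustrative only and not necessarily what rt/aaA.d or newaa.d actually declare:

```d
// Illustrative signatures only, not the exact ones from the PR.

// old style (rt/aaA.d): untyped, driven by TypeInfo at runtime
extern (C) void* _aaGetY(void** paa, const TypeInfo_AssociativeArray ti,
                         const size_t valueSize, scope const void* pkey);

// new style (templated): typed on Key/Value, so attributes such as
// pure/nothrow/@safe/@nogc can be inferred per instantiation, and the regular
// constructors/copy operations of Key and Value are used directly
ref Value _aaGetY(Key, Value)(ref Value[Key] aa, auto ref const Key key);
```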
Thanks for your pull request, @rainers!
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
⚠️⚠️⚠️ Warnings ⚠️⚠️⚠️
- In preparation for migrating from Bugzilla to GitHub Issues, the issue reference syntax has changed. Please add the word "Bugzilla" to issue references. For example,
`Fix Bugzilla Issue 12345` or `Fix Bugzilla 12345`. (Reminder: the edit needs to be done in the Git commit message, not the GitHub pull request.)
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR:
dub run digger -- build "master + dmd#21066"
Thank you for working on this.
This now passes all the tests, with all associative array operations templated and the runtime implementation removed. It is probably a bit large for review, though. I have rebased and split off a couple of preliminary commits, some of them cherry-picking fixes from stable. The main changes in the last couple of commits are still large. Some notes:
- implemented hooks as templates:
  - _d_assocarrayliteralTX
  - _d_aaNew
  - _d_aaEqual
  - _d_aaIn
  - _d_aaDel
  - _aaLen
  - _aaGetY
  - _aaGetRvalueX
  - _aaApply/_aaApply2
  - TypeInfo_AssociativeArray.equals
  - TypeInfo_AssociativeArray.getHash

  I kept most names as before, but I think it would be better to use the `_d_` prefix more consistently.
- other template functions called from object.d:
  - _aaClear
  - _aaRehash
  - _aaDup
  - _aaValues
  - _aaKeys
  - _aaRange, _aaRangeEmpty, _aaRangeFrontKey, _aaRangeFrontValue, _aaRangePopFront
- _d_assocarrayliteralTXTrace removed. No other AA operation is hooked for -profile=gc, and that function is pretty unreliable anyway.
- indexing an associative array is now lowered to _aaGetRvalueX/_aaGetY + ConstructExp/opAssign in the semantic phase (see the sketch after this list):
  - implementation details for the respective AST node IndexExp (I guess some of this should be added as documentation to the lowering code):
    - in assignment/pre-increment/decrement operations, (multi-dimensional) ArrayExps on the left-hand side are marked `modifiable` before semantic analysis recurses down
    - while ArrayExps are analyzed and converted to IndexExp, `modifiable` is transferred
    - IndexExps with `modifiable == false` are converted to a call to _aaGetRvalueX plus an optional bounds check
    - IndexExps with `modifiable == true` are rewritten after both sides of the assignment are analyzed, with the same (sometimes surprising) semantics as before
    - the IndexExp is kept, but it now indexes a pointer to the value, e.g. aa[key] -> _aaGetRvalueX(aa, key)[0]
    - IndexExp.loweredFrom keeps a reference to the original expression, used for expression printing
    - multi-dimensional modifying access is handled immediately, no need to rewrite later
  - no need to reimplement the semantics in CTFE again, code removed in the interpreter
- AA templates are now in newaa, which so far only contained the AA literal generation for static vars. newaa is not a good name in the long run, though; should it just be aa? assocarray?
- slicing an array is now also ok in todt.d (used to support only strings)
- aa.remove(key) now works with alias this
- RemoveExp could be removed completely
- aa[key] = S with S.ctor and S.opAssign is no longer a double lookup
- creating an AA literal at runtime passes keys and values as arguments normally, not with special move/blit operations, so more postblit/dtor operations can happen
- AA no longer available at CTFE without object.d/newaa.d. There was some support for this, but no more. Is this a problem?
- Some lowerings are no longer shown in error messages, but expression precedence information might be lost if an operator is translated to a call.
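To make the lowering above concrete, here is a small hand-written sketch; the comments only approximate the rewrites, the real ones live in expressionsem.d and the template signatures in newaa.d:

```d
void example()
{
    int[string] aa;

    // modifiable use (left-hand side of an assignment): lowered so that
    // roughly _aaGetY(aa, "key") inserts the entry if missing and the
    // returned pointer-to-value is then written via ConstructExp/opAssign
    aa["key"] = 1;

    // r-value use: lowered to roughly _aaGetRvalueX(aa, "key")[0],
    // plus an optional bounds check
    int x = aa["key"];
    assert(x == 1);
}
```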
AA no longer available at CTFE without object.d/newaa.d. There was some support for this, but no more. Is this a problem?
I don't think so, using D without druntime is not a stable setup.
I admit I'll be a little sad to see this capability go, but I'd much rather see the root problem (druntime not usable on bare metal) addressed than pile on more hacks to support this.
It is, but I think it's worth pulling in one go, unless you want to put in the work to split this up into incremental changes. A lot of the diff comes from replacing rt/aaA.d with full core/internal/newaa.d.
Indeed. Splitting them up might be feasible, but getting every step to compile and test successfully might be quite some work. I would recommend merging stable into master before merging this, as a couple of fixes are similar but not exactly the same commits, so I'll have to resolve conflicts on them. That will also slightly reduce the changes here.
in assignment/pre-increment/decrement operations, (multi-dimensional) ArrayExps on the left-hand side are marked `modifiable` before semantic analysis recurses down
Do I understand correctly that this is to support doing:
a["b"]["c"] = d;Creating "b" and "c" without range errors on
a["b"]?
Yes. The previous logic was much more involved with markSettingAAElem, reorderSettingAAElem and more. I hope I haven't missed any detail in there.
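A minimal example of the behavior in question (plain D, nothing specific to this PR):

```d
void main()
{
    int[string][string] a;
    a["b"]["c"] = 42;   // creates the outer entry a["b"] on the fly,
                        // no RangeError for the missing key
    assert("b" in a && a["b"]["c"] == 42);
}
```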
I love how much code this moves from compiler internals to druntime!
Actually, the amount of code in druntime hasn't changed that much; it's more about moving code from the glue layer to semantic analysis. More code can be removed later, e.g. members in the AA that exist only for binary compatibility with the old implementation, or semanticTypeInfo calls that are now obsolete because newaa does not make use of TypeInfo.
AA no longer available at CTFE without object.d/newaa.d. There was some support for this, but no more. Is this a problem?
I don't think so, using D without druntime is not a stable setup.
I admit I'll be a little sad to see this capability go, but I'd much rather see the root problem (druntime not usable on bare metal) addressed than pile on more hacks to support this.
Basic operations are currently supported in CTFE, but others need object.d support, e.g. aa.keys, aa.values, require, update, etc. Using the new implementation during CTFE should be fine on bare metal, too, but it needs the proper declarations (the interpreter intercepts the calls, so almost no code in newaa is actually executed).
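For illustration, this is the kind of basic CTFE use that works without extra support (plain D, only meant to illustrate "basic operations" at compile time):

```d
enum int[string] table = ["one": 1, "two": 2];
static assert(table["two"] == 2);   // lookup in CTFE
static assert("three" !in table);   // `in` in CTFE
```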
I looked at all issues labeled AA; these are the ones that should be fixed by this PR:
- #17455 Reimplement associative arrays on runtime
- #17412 Associative array opApply is not nothrow
- #18945 .dup of associative array is not mutable
- #17275 Possibly wrong code in object_.d
- #18736 [AA] Cannot dup const AA
- #18706 Inefficient AA value setting
- #17235 [AA] Associative array with enum array keys is slow
One issue that might be considered a regression is that mutable keys are now ok to use in some expressions, because the key type is currently always modified to be const (not immutable), and if used as a function argument, almost anything converts to it implicitly. There are heated discussions in multiple reports, e.g. #17648 or #18489, but not really a consensus. Walter's attempt to force keys to be immutable failed here: https://github.com/dlang/dmd/pull/4835
#17235 [AA] Associative array with enum array keys is slow
When trying the benchmark of #17235 I noticed one possible downside with non-optimized builds: using AAs extensively results in notably slower execution than with the current optimized, prebuilt implementation in the runtime. An optimized build is a bit faster than before, though.
notably slower
Roughly how much? Debug build performance is relatively unimportant, but if lookups take more than twice as long then it's worth looking into putting hot spots back into pre-compiled druntime.
Roughly how much?
When I compare the test from the report (on immutable(char), not the slow enum version) with dmd-2.111, it indeed takes almost twice as long:
| options | dmd-2.111 | this PR |
|---|---|---|
| (none) | 1.460 s | 2.720 s |
| -O -inline -release | 1.460 s | 1.430 s |
When I run the debug build in a profiler, more than 50% of the time is spent calculating hashes, 2/3 of that in bytesHash(bool dataKnownToBeAligned)(scope const(ubyte)[] bytes, size_t seed), which should not be too difficult to precompile into the runtime.
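A rough sketch of what "precompile into the runtime" could look like: keep the byte-level hashing as a plain, separately compiled function and let the templated AA code forward to it, so even debug builds of user code run optimized hashing. The names and the mixing function below are purely illustrative, not druntime's implementation:

```d
// hypothetical, separately compiled routine (druntime's real one is bytesHash
// in core.internal.hash); FNV-1a-style mixing just for illustration
size_t bytesHashPrecompiled(scope const(ubyte)[] bytes, size_t seed)
    @safe @nogc nothrow pure
{
    size_t h = seed ^ 0x811C_9DC5;
    foreach (b; bytes)
        h = (h ^ b) * 0x0100_0193;
    return h;
}

// hypothetical templated caller: only slices the key and forwards
// (a real implementation must handle padding, arrays, indirections, ...)
size_t keyHash(K)(ref const K key) @trusted
{
    return bytesHashPrecompiled((cast(const(ubyte)*) &key)[0 .. K.sizeof], 0);
}
```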
I have increased the loop count in https://github.com/dlang/dmd/issues/17235 by a factor of 10 to improve profiling, and added some pragma(inline, true) after applying https://github.com/dlang/dmd/pull/21524. This reduces the gap for debug builds considerably:
| options | dmd-2.111.0 | ldc-1.41.0 | dmd-PR |
|---|---|---|---|
| -O -inline -release | 14.320 s | 8.100 s | 12.110 s |
| (none) | 14.550 s | 9.500 s | 18.150 s |
| -g | 14.240 s | 9.380 s | 22.220 s |
| -g -inline | 13.700 s | 9.180 s | 12.670 s |
- results for dmd-2.111 and ldc don't depend much on options, because most of the code is prebuilt into druntime.
- the difference between building without and with debug info seems to come from generating stack frames unconditionally.
Note that immutable(char)[] is one of the few types that take advantage of optimized TypeInfo prebuilt into druntime. When replacing it with immutable(byte)[] I get:
| options | dmd-2.111.0 | ldc-1.41.0 | dmd-PR |
|---|---|---|---|
| -O -inline -release | 48.140 s | 66.620 s | 13.160 s |
| (none) | 47.240 s | 62.720 s | 18.540 s |
| -g | 47.270 s | 69.190 s | 23.200 s |
| -g -inline | 48.260 s | 76.410 s | 13.480 s |
No idea why ldc performs even worse than dmd here.
Edit: precompiling bytesHash into druntime improves results further:
| options | dmd-PR |
|---|---|
| -O -inline -release | 12.240 s |
| (none) | 17.130 s |
| -g | 19.840 s |
| -g -inline | 12.770 s |
but expression precedence information might be lost if an operator is translated to a call.
This is fixed by expPrecedence, which uses the precedence of the added loweredFrom expression of the call.
Optimized slightly further by avoiding more wrapper calls and some bounds-checks:
| options | dmd-2.111.0 | dmd-PR |
|---|---|---|
| -O -inline -release | 14.490 s | 11.050 s |
| (none) | 14.500 s | 15.070 s |
| -g | 14.370 s | 16.990 s |
| -g -inline | 14.160 s | 10.960 s |
Pretty acceptable IMO :)
I take it this is no longer a draft?
I take it this is no longer a draft?
Indeed, I have removed that from the title.
whoa! how come nobody pinged me on this!
Nice work!
@rainers Do you have any TODO's left for this PR that I could help with?
@rainers Do you have any TODO's left for this PR that I could help with?
Thanks for the offer. I think this is more or less done.
After a rebase, it is now broken, though. It looks like this is caused by the new lowering of array literals. These are used as arguments to AA literal creation: see tryLowerAALiteral() in expressionsem.d (linking to it here doesn't work as it is displayed collapsed in the diff).
I had to add an explicit call to checkGC() to trigger the lowering of these array literals, but this now causes more warnings with -vgc. Any idea how to avoid this?
BTW: I was quite surprised to find the array literal lowering as a side effect in nogc.d. Why isn't it in expressionsem.d?
One issue that might be considered a regression is that mutable keys are now ok to use in some expressions, because the key type is currently always modified to be const (not immutable), and if used as a function argument, almost anything converts to it implicitly.
It seems stock dmd 2.111 accepts mutable keys, too:
struct S
{
char[] str;
}
void main()
{
int[S] aa;
S s;
s.str = "123".dup;
aa[s] = 3;
assert(s in aa);
s.str[0] = 'a';
S s2;
s2.str = "123".dup;
assert(s2 in aa); // fails
}
compiles and breaks the AA (the stored key is mutated behind the AA's back, so later lookups with the original contents fail), so this is not a regression caused by the templated version.
I had to add an explicit call to checkGC() to trigger the lowering of these array literals, but this now causes more warnings with -vgc. Any idea how to avoid this?
I will take a look.
I was quite surprised to find the array literal lowering as a side effect in nogc.d. Why isn't it in expressionsem.d?
Dennis came up with that, see this comment.
@rainers The warnings originate from here [1] [2], when the arguments are being visited as a result of checkGC on the lowered expression [3].
Regarding the solution, I am not sure which one is the cleanest. One simple fix is to disable vgc and enable debug on the scope:
// temporarily turn off -vgc diagnostics and mark the scope as debug so the
// GC checks don't report the lowered array literal call
const vgc = global.params.v.gc;
const dbg_ = sc.debug_;
global.params.v.gc = false;
sc.debug_ = true;
checkGC(loweredExp, sc);
global.params.v.gc = vgc;
sc.debug_ = dbg_;
I don't think this would break anything else, but there may be better solutions.
Perhaps extracting the lowering logic into a separate function, similar to this [1], would be more appropriate. This implementation appears to solve all except one failing test from dmd (which is unrelated to array literals; it is caused by the failure to inline mulu in the _d_newarrayT call).
Thanks for the analysis. I think extracting the array lowering and explicitly calling it looks good. I copied that from your commit.
Instead of gagging the output, I think the expected solution is to exclude the call to _d_arrayliteralTX from the @nogc check, as is done for all the other hook functions (see https://github.com/dlang/dmd/pull/21066/commits/7f6ed49ef845255001c29872baabea9868963fa2#diff-a556a8e6917dd4042f541bdb19673f96940149ec3d416b0156af4d0e4cc5e4bdL2305).
I'll take care of the mulu mess in another PR.
It seems stock dmd 2.111 accepts mutable keys, too:
I noticed that dmd 2.111 checks for mutable dynamic arrays as keys when inserting; I have restored that check now and added a test. No other potentially bad types are checked, and you can just use an implicit conversion to const to have it accepted.
Is this good to go now?
I just realized that the lowerings might not preserve the left-to-right evaluation order of sub-expressions (InExp, AA literal). I'll add some tests to verify...
It's worse than expected for current dmd: https://github.com/dlang/dmd/issues/21638. This PR preserves LTR order for all expressions but InExp and AA literals; only the former is a regression.
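For reference, a small example of the ordering in question for InExp: both operands have side effects, so the lowering to a call must evaluate the left operand before the right one (plain D, not taken from the PR's tests):

```d
import std.stdio;

int key() { write("lhs "); return 1; }
int[int] table() { write("rhs "); return [1: 2]; }

void main()
{
    auto p = key() in table();   // left-to-right: should print "lhs rhs "
    assert(p !is null && *p == 2);
    writeln();
}
```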
Fixed the InExp evaluation order, but I noticed another change:
class C { int x; }
void testlvalue()
{
int[int] aa = [0:1];
int[] da = new int[1];
C obj = new C;
int[int] retaa()
{
return aa;
}
int[] retda()
{
return da;
}
C retobj()
{
return obj;
}
retda()[0] = 1; // ok
retobj().x = 1; // ok
retaa()[0] = 1; // Error: cannot modify expression `retaa()[0]` because it is not an lvalue
}
This error goes away with this PR. Is it an expected error? Why is an AA not considered a reference type? AFAICT the only difference to dynamic arrays is that an unintentional null value doesn't cause a runtime exception/crash.
This error goes away with this PR. Is it an expected error?
Very unexpected. It should definitely be reference type.
Is it an expected error? Why is an AA not considered a reference type?
if it’s null, this will initialize a new aa and then throw it away. But I don’t know that I’d expect this to be an error. Probably should keep the same behavior.
Probably should keep the same behavior.
Restored the error message for IndexExp. This allowed reverting most of the changes to fail6795.d ( https://github.com/dlang/dmd/pull/21066/commits/641cd7d32b83841b999c979261efbfa2335a61e7#diff-5e1147b82268376a89e4a40fac3bf56a4f5a9057a6b6b02bc12895d797b7c889 ), except for taking the address of aa[idx] when aa is a constant literal; idx in aa is also not forbidden in this case.