cakeml

String literals codegen

Open sorear opened this issue 5 years ago • 0 comments

Right now it seems that we compile each string literal by allocating a heap object and then emitting a very long run of single-byte stores, recomputing the destination address before every store:

  4fa534:       be 2e 00 00 00          mov    $0x2e,%esi
  4fa539:       40 88 30                mov    %sil,(%rax)
  4fa53c:       4c 89 f0                mov    %r14,%rax
  4fa53f:       48 89 fe                mov    %rdi,%rsi
  4fa542:       48 c1 ee 09             shr    $0x9,%rsi
  4fa546:       48 01 f0                add    %rsi,%rax
  4fa549:       48 83 c0 0e             add    $0xe,%rax
  4fa54d:       be 66 00 00 00          mov    $0x66,%esi
  4fa552:       40 88 30                mov    %sil,(%rax)
  4fa555:       4c 89 f0                mov    %r14,%rax
  4fa558:       48 89 fe                mov    %rdi,%rsi
  4fa55b:       48 c1 ee 09             shr    $0x9,%rsi
  4fa55f:       48 01 f0                add    %rsi,%rax
  4fa562:       48 83 c0 0f             add    $0xf,%rax
  4fa566:       be 69 00 00 00          mov    $0x69,%esi
  4fa56b:       40 88 30                mov    %sil,(%rax)
  4fa56e:       4c 89 f0                mov    %r14,%rax
  4fa571:       48 89 fe                mov    %rdi,%rsi
  4fa574:       48 c1 ee 09             shr    $0x9,%rsi
  4fa578:       48 01 f0                add    %rsi,%rax
  4fa57b:       48 83 c0 10             add    $0x10,%rax
  4fa57f:       be 6c 00 00 00          mov    $0x6c,%esi
  4fa584:       40 88 30                mov    %sil,(%rax)
  4fa587:       4c 89 f0                mov    %r14,%rax
  4fa58a:       48 89 fe                mov    %rdi,%rsi
  4fa58d:       48 c1 ee 09             shr    $0x9,%rsi
  4fa591:       48 01 f0                add    %rsi,%rax
  4fa594:       48 83 c0 11             add    $0x11,%rax
  4fa598:       be 65 00 00 00          mov    $0x65,%esi
  4fa59d:       40 88 30                mov    %sil,(%rax)
  4fa5a0:       4c 89 f0                mov    %r14,%rax
  4fa5a3:       48 89 fe                mov    %rdi,%rsi
  4fa5a6:       48 c1 ee 09             shr    $0x9,%rsi
  4fa5aa:       48 01 f0                add    %rsi,%rax
  4fa5ad:       48 83 c0 12             add    $0x12,%rax
  4fa5b1:       be 20 00 00 00          mov    $0x20,%esi
  4fa5b6:       40 88 30                mov    %sil,(%rax)
  4fa5b9:       4c 89 f0                mov    %r14,%rax
  4fa5bc:       48 89 fe                mov    %rdi,%rsi
  4fa5bf:       48 c1 ee 09             shr    $0x9,%rsi
  4fa5c3:       48 01 f0                add    %rsi,%rax
  4fa5c6:       48 83 c0 13             add    $0x13,%rax
  4fa5ca:       be 20 00 00 00          mov    $0x20,%esi
  4fa5cf:       40 88 30                mov    %sil,(%rax)
  4fa5d2:       4c 89 f0                mov    %r14,%rax
  4fa5d5:       48 89 fe                mov    %rdi,%rsi
  4fa5d8:       48 c1 ee 09             shr    $0x9,%rsi
  4fa5dc:       48 01 f0                add    %rsi,%rax
  4fa5df:       48 83 c0 14             add    $0x14,%rax
  4fa5e3:       be 20 00 00 00          mov    $0x20,%esi
  4fa5e8:       40 88 30                mov    %sil,(%rax)
  4fa5eb:       4c 89 f0                mov    %r14,%rax
  4fa5ee:       48 89 fe                mov    %rdi,%rsi
  4fa5f1:       48 c1 ee 09             shr    $0x9,%rsi
  4fa5f5:       48 01 f0                add    %rsi,%rax
  4fa5f8:       48 83 c0 15             add    $0x15,%rax
  4fa5fc:       be 20 00 00 00          mov    $0x20,%esi
  4fa601:       40 88 30                mov    %sil,(%rax)
  4fa604:       4c 89 f0                mov    %r14,%rax
  4fa607:       48 89 fe                mov    %rdi,%rsi
  4fa60a:       48 c1 ee 09             shr    $0x9,%rsi
  4fa60e:       48 01 f0                add    %rsi,%rax
  4fa611:       48 83 c0 16             add    $0x16,%rax
  4fa615:       be 20 00 00 00          mov    $0x20,%esi
  4fa61a:       40 88 30                mov    %sil,(%rax)
  4fa61d:       4c 89 f0                mov    %r14,%rax
  4fa620:       48 89 fe                mov    %rdi,%rsi
  4fa623:       48 c1 ee 09             shr    $0x9,%rsi
  4fa627:       48 01 f0                add    %rsi,%rax
  4fa62a:       48 83 c0 17             add    $0x17,%rax

Common subexpression elimination (CSE) and better instruction selection would clean this up dramatically, but it would be better still to emit the string as read-only data and perform a single block copy.
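To make the contrast concrete, here is a minimal C sketch of the two shapes. It is an analogy only, not CakeML's actual code generator: the names `lit_bytes`, `init_bytewise` and `init_blockcopy` are hypothetical, and the ten bytes are the immediates visible in the listing above (0x2e 0x66 0x69 0x6c 0x65 followed by five 0x20).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The ten byte values stored by the listing, kept as one constant.
   A C compiler will typically place this array in .rodata. */
static const char lit_bytes[] = ".file     ";
#define LIT_LEN (sizeof lit_bytes - 1)  /* exclude the trailing NUL */

/* Current shape: one immediate store per byte of the literal. */
void init_bytewise(char *dst) {
    dst[0] = 0x2e; dst[1] = 0x66; dst[2] = 0x69; dst[3] = 0x6c;
    dst[4] = 0x65; dst[5] = 0x20; dst[6] = 0x20; dst[7] = 0x20;
    dst[8] = 0x20; dst[9] = 0x20;
}

/* Proposed shape: a single block copy from read-only data.
   `cap` is the size of the destination buffer in bytes. */
void init_blockcopy(char *dst, size_t cap) {
    assert(cap >= LIT_LEN);
    memcpy(dst, lit_bytes, LIT_LEN);
}
```

Both functions leave the same bytes in `dst`; the second one's code size does not grow with the length of the literal.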

We should also make sure that literals are being hoisted to the top level; it doesn't make sense to construct new string literals in a loop.
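As a rough illustration of what hoisting saves, the C sketch below counts how many times a literal is constructed with and without hoisting. Everything here is hypothetical (`make_literal`, `alloc_count`, and the two loop functions), and it hoists out of a loop within one function rather than to the program's top level, which is the closest C analogue.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int alloc_count = 0;  /* number of heap objects built for literals */

/* Hypothetical stand-in for constructing a string literal on the heap. */
char *make_literal(const char *src) {
    size_t n = strlen(src) + 1;
    char *p = malloc(n);
    assert(p != NULL);
    memcpy(p, src, n);
    alloc_count++;
    return p;
}

/* Unhoisted: the literal is rebuilt on every iteration. */
size_t total_len_unhoisted(int iters) {
    size_t total = 0;
    for (int i = 0; i < iters; i++) {
        char *s = make_literal("hello");
        total += strlen(s);
        free(s);
    }
    return total;
}

/* Hoisted: the literal is built once, before the loop. */
size_t total_len_hoisted(int iters) {
    char *s = make_literal("hello");
    size_t total = 0;
    for (int i = 0; i < iters; i++) {
        total += strlen(s);
    }
    free(s);
    return total;
}
```

For 100 iterations the unhoisted version constructs the literal 100 times and the hoisted version once, with the same result.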

There is currently no mechanism for this kind of general rodata. One would have to be added, most likely by generalizing the bitmap table.

Stretch goal: If we can put string literals in consecutive global slots, we can put the lengths in rodata and use a single stub call to copy all of them to the heap.
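A minimal C sketch of that single-stub idea, under simplifying assumptions: the literal bytes sit back to back in read-only data next to a length table, and one call copies them all and fills consecutive slots. The names `lit_data`, `lit_lens`, `global_slots`, `global_lens` and `init_all_literals` are hypothetical, and the sketch ignores the per-object headers a real heap representation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Read-only data: all literal bytes back to back, plus a length table. */
static const char   lit_data[] = ".file" "hello" "ok";
static const size_t lit_lens[] = { 5, 5, 2 };
#define NUM_LITS (sizeof lit_lens / sizeof lit_lens[0])

/* Consecutive "global slots", one per literal (hypothetical layout). */
const char *global_slots[NUM_LITS];
size_t      global_lens[NUM_LITS];

/* The single stub: copies every literal into `heap` and fills the slots.
   Returns the number of heap bytes used, or 0 if `cap` is too small. */
size_t init_all_literals(char *heap, size_t cap) {
    size_t total = 0;
    for (size_t i = 0; i < NUM_LITS; i++) {
        total += lit_lens[i];
    }
    assert(total == sizeof lit_data - 1);  /* tables must agree */
    if (total > cap) {
        return 0;
    }
    memcpy(heap, lit_data, total);  /* one block copy for all literals */
    size_t off = 0;
    for (size_t i = 0; i < NUM_LITS; i++) {
        global_slots[i] = heap + off;
        global_lens[i] = lit_lens[i];
        off += lit_lens[i];
    }
    return total;
}
```

The per-literal work is reduced to recording a pointer and a length; the byte copying happens once for the whole table.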

Alternate stretch goal: If a program has a bunch of string literals that are rarely used, making them lazy would save startup time. This is probably not relevant often.
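One way lazy literals could look, sketched in C under assumed names (`lazy_literal`, `lazy_get`, `lazy_copies` are all hypothetical): the heap copy is made on first use and reused afterwards, so a literal that is never touched costs nothing at startup.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *rodata;    /* source bytes in read-only data */
    size_t      len;       /* number of bytes in the literal */
    char       *heap_copy; /* NULL until the literal is first used */
} lazy_literal;

int lazy_copies = 0;  /* how many literals have actually been copied */

/* Returns the heap copy, creating it on the first call only. */
const char *lazy_get(lazy_literal *lit) {
    if (lit->heap_copy == NULL) {
        char *p = malloc(lit->len + 1);  /* +1 keeps malloc(0) out of play */
        assert(p != NULL);
        memcpy(p, lit->rodata, lit->len);
        p[lit->len] = '\0';
        lit->heap_copy = p;
        lazy_copies++;
    }
    return lit->heap_copy;
}
```

The price is a NULL check on every access, which is part of why this only pays off for literals that are rarely used.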

Alternate stretch goal: at the cost of significant changes to the data representation invariants, string literals could live outside the heap entirely.

Preliminary estimate: the bootstrapped compiler has 5722 patterns that look like string initializations (2 or more byte writes separated by a constant 7 instructions). Together they set 31624 bytes using 758976 bytes of code, which works out to 24 bytes of code per byte of string data. The new 2-arg _StrLit sits in exactly the space of the old _RefByte; the first or third stretch goal would allow most of the calls to be eliminated entirely.

sorear, Sep 17 '20 14:09