maxas
maxas copied to clipboard
"It is illegal to set a Read-After-Write dependency on a memory store op", on unmodified sass
using microbench, with .cu modified to be simply:
extern "C" __global__ void microbench(int *out, int *clocks, int *in)
{
out[0] = 7.0f;
out[2] = 5.0f;
clocks[1] = 9.0f;
}
.cpp is modified slightly, since I'm on a 5.0, so I hacked microbench.cpp to accept this. It could be this is root cause for the issue in this issue?
if (major >= 5 && minor >= 0)
{
Then I do:
set -e
nvcc -l cuda -o microbench microbench.cpp
nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass
maxas.pl -i microbench.sass microbench.cubin
./microbench
... however it fails on the maxas.pl -i
line with:
It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;
The .sass generated by maxas.pl -i
looks like:
--:-:-:-:6 MOV R1, c[0x0][0x20];
--:-:-:-:1 MOV R0, param_1[0];
--:-:-:-:1 MOV R2, param_0[0];
--:-:-:-:4 MOV R3, param_0[1];
--:-:-:-:1 IADD32I R4.CC, R0, 0x4;
--:-:-:-:2 MOV32I R0, 0x7;
20:3:1:Y:7 IADD.X R5, RZ, param_1[1];
--:1:1:Y:4 STG.E [R4], R6;
--:-:-:Y:0 NOP;
--:-:-:Y:0 NOP;
--:-:-:Y:0 NOP;
Is this:
- because I'm using 5.0 system, and maxas only supports 5.2?
- a bug in
maxas.pl -e
(since I cant see the9.0f
constant anywhere in the sass?, I'm immediately suspicious?) - because I'm using not the latest versoin of maxas, which is eg in
neon
repo? - something I'm doing wrong (other than the cc5.0 thing) ?
- something else?
Update: the sass generated by cuobjdump --dump-sass
does include the 0x9
constant, looks fairly different:
code for sm_50
Function : microbench
.headerflags @"EF_CUDA_SM50 EF_CUDA_PTX_SM(EF_CUDA_SM50)"
/* 0x001fc400fe2007f6 */
/*0008*/ MOV R1, c[0x0][0x20]; /* 0x4c98078000870001 */
/*0010*/ MOV R0, c[0x0][0x148]; /* 0x4c98078005270000 */
/*0018*/ MOV R2, c[0x0][0x140]; /* 0x4c98078005070002 */
/* 0x001fc800fe2007f4 */
/*0028*/ MOV R3, c[0x0][0x144]; /* 0x4c98078005170003 */
/*0030*/ IADD32I R4.CC, R0, 0x4; /* 0x1c10000000470004 */
/*0038*/ MOV32I R0, 0x7; /* 0x010000000077f000 */
/* 0x001fc0001e4007f0 */
/*0048*/ { MOV32I R7, 0x5; /* 0x010000000057f007 */
/*0050*/ STG.E [R2], R0; } /* 0xeedc200000070200 */
/*0058*/ { MOV32I R6, 0x9; /* 0x010000000097f006 */
/*0068*/ STG.E [R2+0x8], R7; } /* 0x0007c400fc4000f1 */
/* 0xeedc200000870207 */
/*0070*/ IADD.X R5, RZ, c[0x0][0x14c]; /* 0x4c1008000537ff05 */
/*0078*/ STG.E [R4], R6; /* 0xeedc200000070406 */
/* 0x001f8000ffe007ff */
/*0088*/ EXIT; /* 0xe30000000007000f */
/*0090*/ BRA 0x90; /* 0xe2400fffff87000f */
/*0098*/ NOP; /* 0x50b0000000070f00 */
/* 0x001f8000fc0007e0 */
/*00a8*/ NOP; /* 0x50b0000000070f00 */
/*00b0*/ NOP; /* 0x50b0000000070f00 */
/*00b8*/ NOP; /* 0x50b0000000070f00 */
...........................
I'm tentatively convinced it's something to do with the 5.0 thing?
Oh... I guess that the 5.0 code is interpreted as control codes, and being absorbed into 20:3:1:Y:7
, rather than becoming:
/*0048*/ { MOV32I R7, 0x5; /* 0x010000000057f007 */
/*0050*/ STG.E [R2], R0; } /* 0xeedc200000070200 */
/*0058*/ { MOV32I R6, 0x9;
Fair analysis? Surmountable? Only option is either to rewrite maxas to support 5.0 or get a 5.2 card?
(by the way, in case useful, I get the following warnings when running maxas:
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?}/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 239.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?<predNot>\!)?P(?<predNum>[0-6]) )?DEPBAR(?^: { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?});/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 275.
Maybe it is a perl versioning issue? I'm using:
$ perl --version
This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi
(with 58 registered patches, see perl -V for more detail)
Seems maybe an issue with the regex on line 1521 of MaxAsGrammar.pm?
Hack this file to put some prints:
print "psl check match\n";
if ($line =~ m"^\s+/\*(?<num>[0-9a-f]+)\*/\s+$InstRe\s+/\* (?<code>0x[0-9a-f]+)"o)
{
print " ... matched!\n";
Output:
psl check match
... matched!
ctrl inst:
$VAR1 = {
'code' => '5519169589765144579',
'ins' => 'MOV R3, c[0x0][0x144];',
'inst' => 'MOV R3, c[0x0][0x144];',
'num' => 40,
'op' => 'MOV',
'pred' => undef
};
ctrl 2033
ctrl line /*0030*/ IADD32I R4.CC, R0, 0x4; /* 0x1c10000000470004 */
line before processSassLine /*0030*/ IADD32I R4.CC, R0, 0x4; /* 0x1c10000000470004 */
psl check match
... matched!
ctrl inst:
$VAR1 = {
'code' => '2022116232694005764',
'ins' => 'IADD32I R4.CC, R0, 0x4;',
'inst' => 'IADD32I R4.CC, R0, 0x4;',
'num' => 48,
'op' => 'IADD32I',
'pred' => undef
};
ctrl 2034
ctrl line /*0038*/ MOV32I R0, 0x7; /* 0x010000000077f000 */
line before processSassLine /*0038*/ MOV32I R0, 0x7; /* 0x010000000077f000 */
psl check match
... matched!
ctrl inst:
$VAR1 = {
'code' => '72057594045788160',
'ins' => 'MOV32I R0, 0x7;',
'inst' => 'MOV32I R0, 0x7;',
'num' => 56,
'op' => 'MOV32I',
'pred' => undef
};
# line /* 0x001fc0001e4007f0 */
ctrl 2032
ctrl line /*0048*/ { MOV32I R7, 0x5; /* 0x010000000057f007 */
line before processSassLine /*0048*/ { MOV32I R7, 0x5; /* 0x010000000057f007 */
psl check match
no match
ctrl 242
ctrl line /*0050*/ STG.E [R2], R0; } /* 0xeedc200000070200 */
line before processSassLine /*0050*/ STG.E [R2], R0; } /* 0xeedc200000070200 */
psl check match
no match
ctrl 2032
ctrl line /*0058*/ { MOV32I R6, 0x9; /* 0x010000000097f006 */
Repeated displays of no match
suggest the regex is failing for some reason?
Oh... I reckon it doesnt like the {
and }
signs in the cuobjdump
output:
/*0048*/ { MOV32I R7, 0x5; /* 0x010000000057f007 */
/*0050*/ STG.E [R2], R0; } /* 0xeedc200000070200 */
/*0058*/ { MOV32I R6, 0x9; /* 0x010000000097f006 */
/*0068*/ STG.E [R2+0x8], R7; } /* 0x0007c400fc4000f1 */
Adding
$line =~ s/{//g;
$line =~ s/}//g;
to MaxAs.pl, just before my $inst = processSassLine($line) or next CTRL;
makes the generated .sass now contain the missing instructions:
--:-:-:-:6 MOV R1, c[0x0][0x20];
--:-:-:-:1 MOV R0, param_1[0];
--:-:-:-:1 MOV R2, param_0[0];
--:-:-:-:4 MOV R3, param_0[1];
--:-:-:-:1 IADD32I R4.CC, R0, 0x4;
--:-:-:-:2 MOV32I R0, 0x7;
--:-:-:-:0 MOV32I R7, 0x5;
--:1:-:-:2 STG.E [R2], R0;
--:-:-:-:0 MOV32I R6, 0x9;
20:3:1:Y:7 IADD.X R5, RZ, param_1[1];
--:1:1:Y:4 STG.E [R4], R6;
--:-:-:Y:0 NOP;
--:-:-:Y:0 NOP;
--:-:-:Y:0 NOP;
... but same error about STG.E as before:
It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;
I'll submit a PR for the braces though
Mmmmm, this is interesting. There should be one control code followed by three operations right? But for the short example, with the parentheses in hte output, this rule seems to be broken temporarily:
cuobjdump --dump-sass
output:
/* 0x001fc0001e4007f0 */
/*0048*/ { MOV32I R7, 0x5; /* 0x010000000057f007 */
/*0050*/ STG.E [R2], R0; } /* 0xeedc200000070200 */
/*0058*/ { MOV32I R6, 0x9; /* 0x010000000097f006 */
/*0068*/ STG.E [R2+0x8], R7; } /* 0x0007c400fc4000f1 */
/* 0xeedc200000870207 */
/*0070*/ IADD.X R5, RZ, c[0x0][0x14c]; /* 0x4c1008000537ff05 */
/*0078*/ STG.E [R4], R6; /* 0xeedc200000070406 */
/* 0x001f8000ffe007ff */
/*0088*/ EXIT; /* 0xe30000000007000f */
/*0090*/ BRA 0x90; /* 0xe2400fffff87000f */
/*0098*/ NOP; /* 0x50b0000000070f00 */
4 contiguous non-control lines, hten a control, then only 2 control-lines. Maybe the parenthesis sort of moves around the contorl codes, so that the control code after the block of 4 actually applies retroactively to the previous line???
Hmmm... except that... 0xeedc200000870207 looks not like a control code:
$ python ../printcode.py 0xeedc200000870207
stall 7 thisyield 0 write 0 read 2 watdb 32
32:2:0:0:7
stall 4 thisyield 0 write 0 read 0 watdb 0
0:0:0:0:4
stall 8 thisyield 0 write 0 read 7 watdb 54
54:7:0:0:8
... but if we use the code from the previous line instead, ie 0x0007c400fc4000f1
, that looks remarkably control-code like, and stays consistent with the 1:3 rule:
$ python ../printcode.py 0x0007c400fc4000f1
stall 1 thisyield 1 write 7 read 0 watdb 0
0:0:7:1:1
stall 2 thisyield 0 write 7 read 7 watdb 0
0:7:7:0:2
stall 1 thisyield 1 write 7 read 1 watdb 0
0:1:7:1:1
Hypothesis: in the presence of parantheses one operation and a control code can be swapped in the human-readable assembler output, in the center column, but continue in the same 1:3 order as normal, in the code output in the right-hand column.
To fix this, I was planning on using the output from nvdisasm -raw and pulling the control codes directly from the binary cubin. But in the meantime I'm just using the nvdisasm from cuda 6.5 (cuobjdump calls nvdisasm internally). You can still use cuda 7.5 you just need to overwrite the new version of that file with the old.
Fixed :-)
ubuntu@peach:~/git/maxas/microbench$ ./run-sass.sh
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?}/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 239.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?<predNot>\!)?P(?<predNum>[0-6]) )?DEPBAR(?^: { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?});/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 275.
Kernel: microbench, Instructions: 0, Register Count: 8, Bank Conflicts: 0, Reuse: 0.0% (0/1)
Using: Id:0 GeForce 940M (5.0)
b:00 w:000 t:0000 l:00 clocks:00000000 out:00000007
b:00 w:001 t:0032 l:00 clocks:00000000 out:00000000
b:00 w:002 t:0064 l:00 clocks:00000000 out:00000000
b:00 w:003 t:0096 l:00 clocks:00000000 out:00000000
average: 0.000, min 0, max: 0
(Oh, hmmm, might be a slight buggette in the loop definition )
(should be ok now though perhaps?)