popcorn-compiler
popcorn-compiler copied to clipboard
Soft floating-point emulation routines leaking into aarch64
GCC soft float library routines are leaking into musl
, which can cause unhandled level 3 translation fault
s in programs. E.g., NPB's EP:
Feb 20 09:05:11 fox6 kernel: [66009.923530] EP_A[3919]: unhandled level 3 translation fault (11) at 0x00522330, esr 0x83000007
Feb 20 09:05:11 fox6 kernel: [66009.923538] pgd = ffff810f42255000
Feb 20 09:05:11 fox6 kernel: [66009.927041] [00522330] *pgd=0000010f13e68003, *pud=0000010f1c041003, *pmd=0000010f2017a003, *pte=0000000000000000
Feb 20 09:05:11 fox6 kernel: [66009.937485]
Feb 20 09:05:11 fox6 kernel: [66009.937494] CPU: 7 PID: 3919 Comm: EP_A Tainted: G OE 4.4.55+ #17
Feb 20 09:05:11 fox6 kernel: [66009.937497] Hardware name: Penguin Computing Valkre 2000/MT60-SC4, BIOS T48 10/02/2017
Feb 20 09:05:11 fox6 kernel: [66009.937501] task: ffff810f13e51a00 ti: ffff810f1bdac000 task.ti: ffff810f1bdac000
Feb 20 09:05:11 fox6 kernel: [66009.937506] PC is at 0x522330
Feb 20 09:05:11 fox6 kernel: [66009.937509] LR is at 0x508004
Feb 20 09:05:11 fox6 kernel: [66009.937512] pc : [<0000000000522330>] lr : [<0000000000508004>] pstate: a0000000
Feb 20 09:05:11 fox6 kernel: [66009.937515] sp : 00007fffffbfc8c0
Feb 20 09:05:11 fox6 kernel: [66009.937518] x29: 00007fffffbfc8d0 x28: 0000000000000000
Feb 20 09:05:11 fox6 kernel: [66009.937522] x27: 0000000000601061 x26: 0000000000000000
Feb 20 09:05:11 fox6 kernel: [66009.937527] x25: 00007fffffbfea80 x24: 0000000000601066
Feb 20 09:05:11 fox6 kernel: [66009.937532] x23: 000000007fffffff x22: 0000000000000000
Feb 20 09:05:11 fox6 kernel: [66009.937536] x21: 000000000000000f x20: 0000000000000001
Feb 20 09:05:11 fox6 kernel: [66009.937541] x19: 00007fffffbfcb30 x18: 00007fffffbfcaf0
Feb 20 09:05:11 fox6 kernel: [66009.937545] x17: 00007fffffbfcb5f x16: 0000000000604000
Feb 20 09:05:11 fox6 kernel: [66009.937550] x15: 000000000000000f x14: 0000000000601320
Feb 20 09:05:11 fox6 kernel: [66009.937554] x13: 000000000000003a x12: 000000007fffffff
Feb 20 09:05:11 fox6 kernel: [66009.937559] x11: 000000000000000a x10: 00000000ffffff90
Feb 20 09:05:11 fox6 kernel: [66009.937563] x9 : 00007fffffbfec70 x8 : 00007fffffbfebf0
Feb 20 09:05:11 fox6 kernel: [66009.937568] x7 : 0000000000000001 x6 : 000000000060102f
Feb 20 09:05:11 fox6 kernel: [66009.937572] x5 : 000000000060102f x4 : 0000000000000001
Feb 20 09:05:11 fox6 kernel: [66009.937577] x3 : 0000000000012889 x2 : 00007fffffbfe9e0
Feb 20 09:05:11 fox6 kernel: [66009.937581] x1 : 0000000000000019 x0 : 00007fffffbfcb30
Running objdump
shows:
0000000000522330 <__extenddftf2>:
522330: 9e660000 fmov x0, d0
522334: d53b4401 mrs x1, fpcr
522338: d374f801 ubfx
As for where this gets called -- one call-chain (there may be others) is through pop_arg
in musl-1.18/src/stdio/vfprintf.c
at line 128:
108 static void pop_arg(union arg *arg, int type, va_list *ap)
109 {
110 switch (type) {
111 case PTR: arg->p = va_arg(*ap, void *);
112 break; case INT: arg->i = va_arg(*ap, int);
113 break; case UINT: arg->i = va_arg(*ap, unsigned int);
114 break; case LONG: arg->i = va_arg(*ap, long);
115 break; case ULONG: arg->i = va_arg(*ap, unsigned long);
116 break; case ULLONG: arg->i = va_arg(*ap, unsigned long long);
117 break; case SHORT: arg->i = (short)va_arg(*ap, int);
118 break; case USHORT: arg->i = (unsigned short)va_arg(*ap, int);
119 break; case CHAR: arg->i = (signed char)va_arg(*ap, int);
120 break; case UCHAR: arg->i = (unsigned char)va_arg(*ap, int);
121 break; case LLONG: arg->i = va_arg(*ap, long long);
122 break; case SIZET: arg->i = va_arg(*ap, size_t);
123 break; case IMAX: arg->i = va_arg(*ap, intmax_t);
124 break; case UMAX: arg->i = va_arg(*ap, uintmax_t);
125 break; case PDIFF: arg->i = va_arg(*ap, ptrdiff_t);
126 break; case UIPTR: arg->i = (uintptr_t)va_arg(*ap, void *);
127 break; case DBL: arg->f = va_arg(*ap, double);
128 break; case LDBL: arg->f = va_arg(*ap, long double);
129 }
130 }
This line appears to get executed whenever printf
(or related functions) gets called with the %lf
conversion specifier (based on a gdb trace I performed). I tried removing the long modifier (l
), but that didn't seem to change the behavior.
I've been investigating how to address this -- ideally, we could eliminate quad precision (128-bit floating point) types and force long double
s to be the same size as double
s (which is allowed by the C standard). That said, I have no idea how to achieve that as of right now.
This patch should fix the issue (at least temporarly):
diff --git a/tool/alignment/pyalign/Linker.py b/tool/alignment/pyalign/Linker.py
index 710d6d3..64494b9 100644
--- a/tool/alignment/pyalign/Linker.py
+++ b/tool/alignment/pyalign/Linker.py
@@ -90,6 +90,14 @@ class Linker:
for symbolName in Globals.SYMBOLS_BLACKLIST[section]:
output_buffer.append("\t*(" + symbolName + ");\n")
+ # *This solve the problem of compiler inserted function: float function such as __extenddftf2
+ # (these function lands in the default *text section at the end of other function specific fuunctions)
+ # This leads to section with different size on Xeon and ARM(bigger) binaries which in turn leads to
+ # a segfault, since the DSM check the address at origin and when not found, trigger the fault!
+ # *The solution is to make sure that both section (on ARM and Xeon) have the same size.
+ # This alignement sould do just that (at least for many cases!)
+ output_buffer.append("\t. = ALIGN(0x100000);\n")
+
# Section "closing" part
output_buffer.append("}\n")
@mohamed-karaoui That did the trick. Maybe submit that as a patch via pull request? Thanks!!
I know this patch fixes this particular problem for EP, but it's not very robust (plus having gigantic alignment directives all over the place balloons the size of the binary). @olivierpierre is there a way we can accumulate the architecture-specific & blacklisted symbol sizes in order to safely align the next section to the correct page across all architectures?
I'll have to look at this, can we attach the sources for a test case here? In the meantime, I recommend using Mohamed solution, some ideas to make it lighter:
- Potentially the amount of padding can be reduced imo, I doubt arm is adding 1MB of code at the end of the
.text
section - It seems this padding can be added only at the end of selected section (
.text.
, any other?)
@olivierpierre Any program will do, but here's a simple hello world program for fun :)
More accurately, anything compiled with Popcorn's musl
library will do, since that's where the soft float routines originate.
I believe I've found a better (?) fix that eliminates the soft float libraries altogether. Based on a comment on a StackOverflow question I asked and a post on the llvm-dev listserv, I tracked down the following code in <LLVM_SRC>/tools/clang/lib/Basic/Targets.cpp
(lines 4950-4991):
class AArch64TargetInfo : public TargetInfo {
virtual void setDescriptionString() = 0;
static const TargetInfo::GCCRegAlias GCCRegAliases[];
static const char *const GCCRegNames[];
enum FPUModeEnum {
FPUMode,
NeonMode
};
unsigned FPU;
unsigned CRC;
unsigned Crypto;
static const Builtin::Info BuiltinInfo[];
std::string ABI;
public:
AArch64TargetInfo(const llvm::Triple &Triple)
: TargetInfo(Triple), ABI("aapcs") {
if (getTriple().getOS() == llvm::Triple::NetBSD) {
WCharType = SignedInt;
// NetBSD apparently prefers consistency across ARM targets to consistency
// across 64-bit targets.
Int64Type = SignedLongLong;
IntMaxType = SignedLongLong;
} else {
WCharType = UnsignedInt;
Int64Type = SignedLong;
IntMaxType = SignedLong;
}
LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
MaxVectorAlign = 128;
MaxAtomicInlineWidth = 128;
MaxAtomicPromoteWidth = 128;
LongDoubleWidth = LongDoubleAlign = SuitableAlign = 128;
LongDoubleFormat = &llvm::APFloat::IEEEquad;
By changing the last two lines here, I was able to force the long double
type to 64 bits (the same size as a double
) on ARM64:
LongDoubleWidth = LongDoubleAlign = SuitableAlign = 64;
LongDoubleFormat = &llvm::APFloat::IEEEdouble;
This removed all reference to __extenddftf2
and (it appears) all other soft float library routines.
So far, I've tested this by:
- Successfully rebuilding the compiler
- Successfully rebuilding the Popcorn libraries (
musl
,stack transformation
, andmigration
) - Successfully building an NPB binary (CG, class A,
make-popcorn-explicit
branch) - Successfully running & migrating the binary
I believe further testing is probably needed (I'll be working with this modified version of the compiler for a while to see if any bugs pop up), but as of right now it's looking promising.
This is awesome, thanks! Once you verify that it works, you can submit a pull request or let me know if you want me to implement. Just a couple of mental notes here that I don't want to forget:
-
We should make this command-line selectable if possible, so that we can enable automatically with "-popcorn-*" but not force users to use it if they don't want. We also need to document that this is enabled with the Popcorn flags
-
This fix needs to be applied to all architectures, including PowerPC & X86 (& maybe RISCV if we go that direction)
@rlyerly I'm going to test it for another day or two, but I'll try to put together a pull request for at least the ARM changes.
Additional note: requires modifying arch/aarch64/bits/float.h
in musl
(replace every field except DECIMAL_DIG
with values from arch/arm/bits/float.h
.