llvm
llvm copied to clipboard
Bool corruption across extern and FFI calls
The LLVM bindings map the Bool
type to an i1
. This works fine (though perhaps a bit slow) within LLVM generated code but does not work with externFunction
mappings or foreign imports
in Haskell. In this example ret (valueOf False)
is compiled to xor $al
, leaving trash in upper bits of $rax
. I propose mapping Bool
to a i32
or i64
instead of an i1
.
define i1 @_fun1() {
_L1:
%0 = call i64 @malloc(i64 1)
ret i1 false
}
>>> True
define i1 @_fun2() {
_L1:
ret i1 false
}
>>> False
.section __TEXT,__text,regular,pure_instructions
.globl __fun1
.align 4, 0x90
__fun1: ## @_fun1
Ltmp1:
.cfi_startproc
## BB#0: ## %_L1
pushq %rax
Ltmp2:
.cfi_def_cfa_offset 16
movl $1, %edi
callq _malloc
xorb %al, %al
popq %rdx
ret
Ltmp3:
.cfi_endproc
Leh_func_end0:
.globl __fun2
.align 4, 0x90
__fun2: ## @_fun2
Ltmp4:
.cfi_startproc
## BB#0: ## %_L1
xorb %al, %al
ret
Ltmp5:
.cfi_endproc
Leh_func_end1:
.subsections_via_symbols
import Control.Monad (liftM2)
import Control.Monad.Trans (liftIO)
import Control.Monad.Trans.Resource
import Data.Int
import Foreign.Ptr
import LLVM.Core
import LLVM.ExecutionEngine
import LLVM.FFI.Support (disablePrettyStackTrace)
emitFunctionBroken
:: CodeGenModule (Function (IO Bool))
emitFunctionBroken = createFunction ExternalLinkage $ do
m <- externFunction "malloc"
call (m :: Function (Int64 -> IO Int64)) (valueOf 1)
ret (valueOf False)
emitFunctionWorking
:: CodeGenModule (Function (IO Bool))
emitFunctionWorking = createFunction ExternalLinkage $
ret (valueOf False)
foreign import ccall unsafe "dynamic" mkFun :: FunPtr (IO Bool) -> (IO Bool)
main :: IO ()
main = do
-- boot up LLVM
initializeNativeTarget
runResourceT $ do
m <- liftIO $ newModule
(b, w) <- liftIO $ defineModule m $
liftM2 (,) emitFunctionBroken emitFunctionWorking
(_, ee) <- allocate (createExecutionEngine m) destroyExecutionEngine
bfun <- liftIO $ fmap mkFun (getPointerToFunction ee b)
wfun <- liftIO $ fmap mkFun (getPointerToFunction ee w)
liftIO $ do
dumpValue b
print =<< bfun
dumpValue w
print =<< wfun
I'm starting to think this is a bug in GHC, since regular C/C++ code uses i1
as well for _Bool
and bool
...
Just spoke to Igloo... the Storable Bool
instance used by ffi doesn't match up with the i1
bool
used by C/C++, it should be CChar
or Word8
instead... so it's a bug in the llvm-bindings not the foreign import
code.
I think it is never correct to use a Haskell type in a C binding. It should always be CInt (~ int), CChar (~ char), Word8 (~ uint8) and so on. Unfortunately a lot of CInt's were replaced by Bool in the FFI recently. I would prefer to have a CBool type.
If you see any such corruptions can you send a patch? I agree it's the wrong thing and LLVM definitely treats its Bool (i1) type as a single bit.
The CInts that were replaced by Bool were not in the calls to functions generated by LLVM, but in the LLVM functions themselves. One can easily find them using grep: llvm-base$ fgrep -r Bool LLVM/FFI/ LLVM/FFI/Core.hsc: -> Bool -- ^ non-zero if function is varargs LLVM/FFI/Core.hsc: :: TypeRef -> IO Bool LLVM/FFI/Core.hsc: :: ModuleRef -> CString -> TypeRef -> IO Bool ...
Patches would be appreciated :thumbsup:
On Sat, 18 May 2013, Nathan Howell wrote:
If you see any such corruptions can you send a patch? I agree it's the wrong thing and LLVM definitely treats its Bool (i1) type as a single bit.
Just a note on the maintenance procedure: Someone has changed CInt to Bool between llvm-base-3.0 and llvm-base-3.2. For my taste it is the responsibility of the maintainer to critically review incoming patches. The maintainer may miss a problem, ok. But now that I bring up the issue, I want that the maintainer and the submitter of the patch become aware of it. Otherwise, who will protect the package from another patch, that again turns CInt into Bool? The submitter of the patch will certainly have written code that relies on Bool and if his code fails with a new version of llvm-base he will certainly submit the CInt->Bool patch again. I would prefer if the maintainer or the patch submitter roll-back the patch. Then I am sure that they are aware of the problem.
My remaining question is whether we should simply roll-back the patch or whether we prefer the Bool type. We could provide thin wrappers around the LLVM functions that convert between Bool and CInt. The functions would be easier to use, but we would lose the ability to distinguish different values that all mean True. If the LLVM prototypes are correct we would not lose anything, though.
We're aware of the problem. If you don't have the time or desire to roll the patch back I'll try to get it done soon.
On Sat, 18 May 2013, Nathan Howell wrote:
We're aware of the problem. If you don't have the time or desire to roll the patch back I'll try to get it done soon.
What is your opinion about the question "rollback to CInt" vs. "keep Bool, but do it right" ?
Bool
better captures the semantics of the API but it needs to be done correctly.
The erroneous uses of Bool instead of CInt in the FFI have been corrected in 2f86bd. As this has little to do with the original issue, this issue remains open.