llm.c
write LLVM optimization passes for train_gpt2
Here is a little example:
Multiplications in which one operand is a constant integer that is a power of 2 (other than 1) are replaced with a left shift; the shift amount is the base-2 logarithm of the constant, e.g. x * 8 becomes x << 3.
bool optBasicStrengthReduction(Instruction &I) {
    auto OpCode = I.getOpcode();
    if (OpCode != Instruction::Mul) return false;
    Value *Op1 = I.getOperand(0);
    Value *Op2 = I.getOperand(1);
    ConstantInt *CI = nullptr;
    // Check if op is a constant integer and is a power of 2
    auto isConstPowOf2 = [&CI](Value *op) {
        return (CI = dyn_cast<ConstantInt>(op))
            and CI->getValue().isPowerOf2()
            and not CI->isOne();
    };
    if (isConstPowOf2(Op1)) std::swap(Op1, Op2);
    if (not isConstPowOf2(Op2)) return false;
    errs() << "Triggered train_gpt2 optimization\n";
    // Shift amount calculation
    unsigned ShiftAmount = CI->getValue().logBase2();
    // Create a new shift instruction
    Instruction *ShiftInst = BinaryOperator::Create(
        Instruction::Shl,
        Op1, ConstantInt::get(CI->getType(), ShiftAmount)
    );
    ShiftInst->insertAfter(&I);
    I.replaceAllUsesWith(ShiftInst);
    return true;
}
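As a quick sanity check of the arithmetic behind this rewrite, here is a minimal standalone sketch in plain C++ (no LLVM headers). The helper names shiftAmountFor and mulByConst are hypothetical and only illustrate the idea on 32-bit unsigned values; they are not part of the pass itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns k when c == 2^k with k >= 1,
// or -1 otherwise. This mirrors the isPowerOf2() / isOne() / logBase2()
// checks used in the pass.
int shiftAmountFor(std::uint32_t c) {
    if (c < 2 || (c & (c - 1)) != 0) return -1;  // 0, 1, or not a power of 2
    int k = 0;
    while ((c >> k) != 1u) ++k;                  // integer log base 2
    return k;
}

// Multiply x by the constant c, using a left shift when the rewrite applies.
// Unsigned arithmetic wraps modulo 2^32, so both forms give the same result.
std::uint32_t mulByConst(std::uint32_t x, std::uint32_t c) {
    const int k = shiftAmountFor(c);
    return k >= 0 ? (x << k) : (x * c);
}
```

For example, mulByConst(7, 16) takes the shift path (7 << 4), while mulByConst(5, 3) falls back to the ordinary multiplication.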
We also need to call the optimization from a runOnBasicBlock function:
bool runOnBasicBlock(BasicBlock &B) {
    bool globallyModified = false;
    std::set<Instruction*> toBeErased;
    for (auto &I : B) {
        bool locallyModified =
            // here you can add all your opt passes
            optBasicStrengthReduction(I)
            || optExample2(I)
            || optExample3(I)
            || optExample4(I); // ...
        // dead code elimination: the replaced instruction has no remaining uses
        if (locallyModified) {
            toBeErased.insert(&I);
            globallyModified = true;
        }
    }
    // erase only after the loop, so the iteration over B is not invalidated
    for (auto *I : toBeErased) {
        I->eraseFromParent();
    }
    return globallyModified;
}
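The driver above records instructions in toBeErased and deletes them only after the loop, presumably because erasing an instruction while the range-based for is still walking the block would break the iteration. A minimal standalone sketch of that collect-then-erase pattern, using a std::list<int> as a hypothetical stand-in for the basic block (eraseDeferred is an illustrative name, not an LLVM API):

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical stand-in for a basic block: a linked list of "instructions"
// (plain ints here). Removes every element for which shouldErase returns
// true, in two phases, and reports whether anything changed.
template <typename Pred>
bool eraseDeferred(std::list<int> &block, Pred shouldErase) {
    std::vector<std::list<int>::iterator> toBeErased;

    // Phase 1: only record what to remove. The list is not modified here,
    // so the traversal stays valid.
    for (auto it = block.begin(); it != block.end(); ++it) {
        if (shouldErase(*it)) toBeErased.push_back(it);
    }

    // Phase 2: erase after the traversal has finished. std::list::erase
    // invalidates only the erased iterator, so the remaining ones are usable.
    for (auto it : toBeErased) {
        block.erase(it);
    }
    return !toBeErased.empty();
}
```

Calling eraseDeferred on a list with a predicate that selects even values removes exactly those values and returns true; a predicate that matches nothing leaves the list untouched and returns false.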
To apply the passes, we first need to compile train_gpt2.c to LLVM IR (bitcode) using the clang compiler:
$ clang -emit-llvm -c train_gpt2.c -o train_gpt2.bc
# apply the opt pass
$ opt -load ./build/LocalOpts.so -local-opts train_gpt2.bc -o train_gpt2_opt.bc
# build the optimized train_gpt2 executable
$ clang train_gpt2_opt.bc -o train_gpt2_opt
I was discussing this yesterday with @jonmasters. Ideally this would be a script that takes llm.c and transforms it into specialized but still legible C code for a particular architecture. It could do buffer size tuning etc like Mojo🔥
It would also be nice to have a memory/cache layout visualizer.
@blasty has some great human friendly inline assembler examples https://github.com/blasty/unwyze/blob/638e7d17e752a30a3e758f51e436f752954afbd4/exploit/src/main.c#L180
looking into it!