Radhika Ghosal

Writing a MachineFunctionPass in LLVM

I’ve been hacking on LLVM lately, and I recently needed to write a MachineFunctionPass to analyze instructions as they were lowered from IR to assembly, since I was working with LLVM’s machine-dependent representation rather than the machine-independent IR.

Unfortunately, LLVM’s splendid Writing an LLVM Pass doc (which has a great introduction to IR-level passes) didn’t fully cover how to write a MachineFunctionPass (or rather, how to get it running), at least not well enough for noobs like me to understand. This mailing list thread was invaluable for getting started, but I’ll elaborate a bit more.

The post below assumes you’ve read and understood the Writing an LLVM Pass doc. It also assumes basic familiarity with the tools LLVM offers out-of-the-box.


LLVM’s opt tool doesn’t make any machine-dependent optimizations and only runs on IR, so it makes sense that MachineFunctionPasses don’t work in opt: they only run on MachineInstrs (if this doesn’t make sense to you, check out Eli Bendersky’s Life of an instruction in LLVM post).

The cool part of opt is that you can write a pass out-of-source and choose to dynamically load it as a shared object library into opt without recompiling the entire opt tool, which takes a shit-ton of time. Unfortunately, there is no such nice modular way to write machine-dependent passes for llc. You simply need to hack LLVM’s source to get llc to run your MachineFunctionPass when you invoke it for the architecture you’re working on.

Enough talk, let’s dive in!


So, let’s say I want to write a MachineFunctionPass that dumps the MachineInstrs in each MachineFunction. Let’s call our file X86MachineInstrPrinter.cpp and put it in lib/Target/X86.
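To make the shape of such a pass concrete, here is a minimal sketch that compiles without LLVM. The Toy* types and the X86MachineInstrPrinterSketch name are hypothetical stand-ins, not LLVM’s classes; they only mimic the function, block, instruction nesting. In the real file, the class would derive from MachineFunctionPass and override runOnMachineFunction(MachineFunction &MF), and the instruction text would come from LLVM’s own printing rather than from the placeholder strings used here.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-ins so the sketch compiles without LLVM. These are NOT LLVM's
// real classes; they only mimic the function -> block -> instruction nesting.
struct ToyMachineInstr { std::string Text; };
struct ToyMachineBasicBlock {
  std::string Name;
  std::vector<ToyMachineInstr> Instrs;
};
struct ToyMachineFunction {
  std::string Name;
  std::vector<ToyMachineBasicBlock> Blocks;
};

// Hypothetical pass shape: one callback per machine function.
struct X86MachineInstrPrinterSketch {
  std::ostream &OS;
  explicit X86MachineInstrPrinterSketch(std::ostream &Out) : OS(Out) {}

  // Walks every block and every instruction, printing each one.
  // Returns false because a printer never modifies the function.
  bool runOnMachineFunction(const ToyMachineFunction &MF) {
    OS << "function " << MF.Name << "\n";
    for (const ToyMachineBasicBlock &MBB : MF.Blocks) {
      OS << "  block " << MBB.Name << "\n";
      for (const ToyMachineInstr &MI : MBB.Instrs)
        OS << "    " << MI.Text << "\n";
    }
    return false;
  }
};

// Builds a one-block function with two placeholder instructions and
// returns whatever the sketch pass prints for it.
std::string demoPrint() {
  ToyMachineBasicBlock Entry;
  Entry.Name = "entry";
  Entry.Instrs = {ToyMachineInstr{"first-instr"},
                  ToyMachineInstr{"second-instr"}};

  ToyMachineFunction MF;
  MF.Name = "main";
  MF.Blocks.push_back(Entry);

  std::ostringstream OS;
  X86MachineInstrPrinterSketch Printer(OS);
  Printer.runOnMachineFunction(MF);
  return OS.str();
}
```

The part that carries over to the real pass is the two nested loops and the false return value, which tells the pass manager that nothing was changed.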

Whenever you start navigating a codebase as intimidating as LLVM’s, you often wonder, “How the heck do you figure out how xyz works without documentation?” The clichéd answer is simply: read the source. I ended up making friends with grep -nr "[search term]" . and ctags, and life got a bit better.

You might want to go through the LLVM Target-Independent Code Generator doc (and optionally, the Machine IR (MIR) Format Reference Manual) now.

The key takeaway from the first link is that all optimizations on MachineInstrs are written as MachineFunctionPasses. If you follow the LLVM Reviews page (and you should!), try to find a review involving such an optimization. The diffs should give you an idea of the additions you need to make to get your stuff working (and shhh, find some sample code). My helper link was this, and my sample file was lib/Target/X86/X86EvexToVex.cpp.

Moving on, add the following to X86.h:

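// Factory that X86TargetMachine.cpp calls to instantiate the pass.
FunctionPass *createX86MachineInstrPrinter();
// Registers the pass with the PassRegistry.
void initializeX86MachineInstrPrinterPass(PassRegistry &);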

Then add the snippet below to lib/Target/X86/X86TargetMachine.cpp. Note that we’ll be adding our pass in the addPreRegAlloc() function because we want to print our MachineInstrs before register allocation takes place.

extern "C" void LLVMInitializeX86Target() {
    // ...
    PassRegistry &PR = *PassRegistry::getPassRegistry();
    // ...
    // Register our pass alongside the other X86 passes.
    initializeX86MachineInstrPrinterPass(PR);
}
// ...

void X86PassConfig::addPreRegAlloc() {
    if (getOptLevel() != CodeGenOpt::None) {
        // ...
    }
    // ...
    // Run the printer just before register allocation.
    addPass(createX86MachineInstrPrinter());
}
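If it isn’t obvious why the choice of hook decides when the pass runs, the toy model below might help. It is only a sketch: ToyPassConfig and ToyX86PassConfig are hypothetical classes, the stage names are made-up strings, and the real pipeline has many more stages. The one thing it borrows from LLVM is the idea that a target overrides hooks such as addPreRegAlloc(), and whatever it adds there lands at that point in a fixed skeleton.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a pass pipeline with per-target hooks. Not LLVM's real
// TargetPassConfig; it only borrows the hook names to show ordering.
class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  // Fixed skeleton: hooks are called at set points around built-in stages.
  std::vector<std::string> buildPipeline() {
    Pipeline.clear();
    addPass("instruction-selection");
    addPreRegAlloc();
    addPass("register-allocation");
    addPostRegAlloc();
    addPass("code-emission");
    return Pipeline;
  }

protected:
  void addPass(const std::string &Name) { Pipeline.push_back(Name); }
  // Default hooks add nothing; a target overrides the ones it needs.
  virtual void addPreRegAlloc() {}
  virtual void addPostRegAlloc() {}

private:
  std::vector<std::string> Pipeline;
};

// Hypothetical target config that slots a printer in before allocation.
class ToyX86PassConfig : public ToyPassConfig {
protected:
  void addPreRegAlloc() override { addPass("machine-instr-printer"); }
};

// Returns the ordered stage names for the toy X86 configuration.
std::vector<std::string> demoPipeline() {
  ToyX86PassConfig Config;
  return Config.buildPipeline();
}
```

In this model, demoPipeline() places "machine-instr-printer" right before "register-allocation". Moving the addPass call into addPostRegAlloc() would move it after allocation instead, which mirrors the choice made in the snippet above.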

Finally, add X86MachineInstrPrinter.cpp to the CMakeLists.txt in lib/Target/X86, and rebuild LLVM from your build directory. If you have a computer like mine, you might want to get a cup of coffee, even though you only need to recompile llc.

The next time you run llc, you’ll see your MachineInstrs being printed. How exciting!


Phew, that was long. Hopefully, this gave you some insight into how one goes about figuring out a large codebase like LLVM without losing one’s mind or going in too deep looking for reasoning (“To make an apple pie from scratch, you must first invent the universe.” - Carl Sagan).


Bonus observation:

You might notice that all the createXYZ() and initializeXYZPass() functions follow the same naming scheme. You might think this is just a convention; why not try changing one of them when you define them in X86MachineInstrPrinter.cpp? You’ll be greeted with a bunch of cryptic error messages.

To make sense of them, take a look at /include/llvm/PassSupport.h. And holy shit, the entire file is full of giant macros in the wild with all the function names hardcoded in…

#define INITIALIZE_PASS(passName, arg, name, cfg, analysis)                    \
  static void *initialize##passName##PassOnce(PassRegistry &Registry) {        \
    PassInfo *PI = new PassInfo(                                               \
        name, arg, &passName::ID,                                              \
        PassInfo::NormalCtor_t(callDefaultCtor<passName>), cfg, analysis);     \
    Registry.registerPass(*PI, true);                                          \
    return PI;                                                                 \
  }                                                                            \
  LLVM_DEFINE_ONCE_FLAG(Initialize##passName##PassFlag);                       \
  void llvm::initialize##passName##Pass(PassRegistry &Registry) {              \
    llvm::call_once(Initialize##passName##PassFlag,                            \
                    initialize##passName##PassOnce, std::ref(Registry));       \
  }
// etc...

shudder
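For what it’s worth, the mechanism behind those errors is the preprocessor’s ## token-pasting operator, which glues the pass name into the function name. Here is a stripped-down sketch of the idea. TOY_INITIALIZE_PASS is a hypothetical macro, far simpler than the real INITIALIZE_PASS, and it returns the pass name as a string purely so the effect is observable.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for INITIALIZE_PASS (hypothetical): pastes the pass
// name into the function name, so the definition only exists under that
// exact spelling. #passName turns the argument into a string literal.
#define TOY_INITIALIZE_PASS(passName)                                          \
  std::string initialize##passName##Pass() { return #passName; }

// Declaration, as it would sit in a header such as X86.h.
std::string initializeX86MachineInstrPrinterPass();

// Expands to a definition of initializeX86MachineInstrPrinterPass().
TOY_INITIALIZE_PASS(X86MachineInstrPrinter)

// Calls the generated function through the hand-written declaration.
std::string demoInitialize() {
  return initializeX86MachineInstrPrinterPass();
}
```

If the macro argument were spelled differently from the name used in the declaration, the declared function would never get a definition and the call in demoInitialize() would fail at link time. That is roughly the situation you end up in when the names in X86.h and X86MachineInstrPrinter.cpp drift apart.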