TLOTD #0: Finding Assembly Mnemonic Corresponding to MachineInstr in LLVM
February 13, 2017
TLOTD, a self-made acronym for Trivial Learning of the Day, is a series of smaller posts elaborating on something presumably trivial which I figured out after much pain. Hopefully someone out there will find this useful…
Look in $(BUILD_DIR)/lib/Target/X/XGenAsmMatcher.inc, where X = X86/ARM/AVR/whatever, for
MnemonicTable and MatchTable; you’ll find two arrays you can iterate
through to find the asm mnemonic corresponding to your MachineInstr opcode.
You won’t be able to include
XGenAsmMatcher.inc directly because it contains
member functions of the
XAsmParser class, so you have to include the source file that defines the class instead.
Note that this has been tested only for the AVR target, but it should be similar for the others.
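A minimal sketch of that lookup, assuming the mnemonic table is a run of length-prefixed strings and each match entry stores an offset into that table alongside an opcode. The mock tables, the opcode numbers 100 to 102, and the names MockMnemonicTable, MockMatchEntry, MockMatchTable and mnemonicForOpcode are all made up for illustration; check the generated file for your target before relying on this layout.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the generated mnemonic table: each entry is one
// length byte followed by that many characters ("adc", "add", "adiw").
static const char MockMnemonicTable[] = "\003adc\003add\004adiw";

// Hypothetical stand-in for one row of the generated match table.
struct MockMatchEntry {
  uint16_t Mnemonic; // offset of the length byte inside MockMnemonicTable
  uint16_t Opcode;   // made-up opcode number, not a real LLVM opcode
};

static const MockMatchEntry MockMatchTable[] = {
    {0, 100}, // "adc"
    {4, 101}, // "add"
    {8, 102}, // "adiw"
};

// Linear scan over the match table; returns an empty string when the opcode
// has no entry.
std::string mnemonicForOpcode(unsigned Opcode) {
  for (const MockMatchEntry &Entry : MockMatchTable) {
    if (Entry.Opcode == Opcode) {
      const char *LengthByte = MockMnemonicTable + Entry.Mnemonic;
      return std::string(LengthByte + 1,
                         static_cast<unsigned char>(*LengthByte));
    }
  }
  return std::string();
}
```

In a real pass you would presumably call something like this with the value returned by getOpcode() and the actual generated arrays in place of the mocks.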
If you’re interested in the journey, read on.
For a variety of reasons, I needed to find the asm mnemonic corresponding
to the MachineInstr opcode. For instance, in the AVR architecture, a particular MachineInstr opcode
has a direct correspondence to the AVR assembly instruction
adiw, and so on.
As usual, my first line of attack was to head to the
MachineInstr doxygen page
and look for clues in its getter functions. The closest thing I found was the
getOpcode() function, which later dashed my dreams because it only referred
to LLVM’s internally assigned opcode for the
MachineInstr, and not the
actual opcode for the AVR mnemonic.
Next, I headed over to the
MCInstrDesc class reference, since it mentioned some ‘target
descriptor’ (?) (I had no idea what that meant since I had not yet gone through the
Writing an LLVM Backend doc because
it had long prerequisite readings and I thought I could get away with having
only a high-level idea of how the backend was structured). Naturally,
I hoped the descriptor would describe something about the target.
Sadly, despite having other useful information, it still could not satisfy my needs. (“omg
how hard could it have been to have a function that just returns the mnemonic?”)
What if I grabbed the mnemonic when it gets printed by the AsmPrinter?
I could not see any straightforward way of doing so, and more importantly, I
would lose the chance of performing interesting transformations since the
AsmPrinter is the last
FunctionPass which runs in LLVM.
I then found the
AVRInstrInfo.td file which described each AVR intermediate
instruction in detail, including its assembly string! Victory was close…
However, it still hadn’t struck me how to use this, since it was just a text file and not a C++ file that I could include. I figured it was finally time to ask for help on the mailing list, so I started a thread there.
The kind responder on the thread all but confirmed my suspicion that I needed to dig into the backend target definitions. It was time to go through the behemoth of a doc on backends in LLVM.
cue hour-long silence
Phew. The sun was shining and the birds were singing again. The TableGen doc
revealed that the C++ files (not exactly, just function and array definitions in C++ syntax) generated from
AVRInstrInfo.td and the other
.td (target description)
files were sitting pretty within
$(BUILD_DIR)/lib/Target/AVR, waiting for my perusal.
I immediately got my ass there and started furiously grep-ing through
the generated .inc files. I hit the jackpot; I found a char array called
MnemonicTable containing all the AVR mnemonics, and an array of structs called
MatchTable tying MachineInstr opcodes to those mnemonics.
I just needed to do
#include "AVRGenAsmMatcher.inc" (with the appropriate
#define GET_MATCHER_IMPLEMENTATION) and I’d be done.
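For anyone who hasn’t met this pattern before, here is a rough single-file sketch of how a guard macro selects one section of a generated .inc file. The “file” contents are pasted inline so the example compiles on its own; GET_DEMO_OTHER_SECTION, DemoOtherValue, DemoMnemonicTable and countDemoMnemonics are hypothetical names, and the table contents are made up.

```cpp
#include <cstddef>

// The includer defines the guard for the section it wants...
#define GET_MATCHER_IMPLEMENTATION

// ...and then does the #include. The lines between the markers stand in for
// the included file.
// ---- begin hypothetical generated .inc contents ----
#ifdef GET_DEMO_OTHER_SECTION
#undef GET_DEMO_OTHER_SECTION
// Skipped here, because this guard was never defined.
static const int DemoOtherValue = 0;
#endif

#ifdef GET_MATCHER_IMPLEMENTATION
#undef GET_MATCHER_IMPLEMENTATION
// Compiled in, because the guard above was defined before this point.
// Each entry is a length byte followed by that many characters.
static const char DemoMnemonicTable[] = "\003add\004adiw";
#endif
// ---- end hypothetical generated .inc contents ----

// Walks the length-prefixed table and counts its entries.
unsigned countDemoMnemonics() {
  unsigned Count = 0;
  std::size_t Pos = 0;
  // sizeof includes the trailing '\0', which is not part of any entry.
  while (Pos < sizeof(DemoMnemonicTable) - 1) {
    Pos += 1 + static_cast<unsigned char>(DemoMnemonicTable[Pos]);
    ++Count;
  }
  return Count;
}
```

Only the section whose guard is defined ends up in the translation unit, so one generated file can serve several includers that each ask for a different part.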
Unfortunately, along with the arrays I needed, there were member functions of
AVRAsmParser, which I couldn’t separate in any easy way, so I could only do
#include "AsmParser/AVRAsmParser.cpp", and couldn’t reference the required file directly.
This felt kinda hacky since it isn’t a header file and isn’t meant to be included
anywhere, but I didn’t really care at that point.
tl;dr: Spent nearly 2 days to figure out 3 lines of code.
Feast your eyes upon the glorious
MnemonicTable for the X86 target…