TLOTD #0: Finding Assembly Mnemonic Corresponding to MachineInstr in LLVM
February 13, 2017
TLOTD, a self-made acronym for Trivial Learning of the Day, is a series of smaller posts elaborating on something presumably trivial which I figured out after much pain. Hopefully someone out there will find this useful…
Short answer:
Open up $(BUILD_DIR)/lib/Target/X/XGenAsmMatcher.inc, where X = X86/ARM/AVR/whatever.
Search for MnemonicTable and MatchTable0; you’ll find two arrays you can iterate through to find the asm mnemonic corresponding to your MachineInstr opcode.
You won’t be able to include XGenAsmMatcher.inc directly because it contains member functions of the XAsmParser class, so you have to include the source file that defines that class (AsmParser/XAsmParser.cpp) instead.
Note that this has been tested only for the AVR target, but it should be similar for the others.
If you’re interested in the journey, read on.
Relevant mailing list threads: [1], [2]
For a variety of reasons, I needed to find the asm mnemonic corresponding to a MachineInstr opcode. For instance, in the AVR architecture, AVR::ADCRdRr corresponds directly to the AVR assembly instruction adc, AVR::ADIWRdK corresponds to adiw, and so on.
As usual, my first line of attack was to head to the MachineInstr doxygen page and look for clues in its getter functions. The closest thing I found was the getOpcode() function, which later dashed my dreams because it only returned LLVM’s internally assigned opcode for the MachineInstr, and not the actual opcode for the AVR mnemonic.
Next, I headed over to the MCInstrDesc class reference, since it mentioned some ‘target descriptor’ (?). I had no idea what that meant, since I had not yet gone through the Writing an LLVM Backend doc: it had long prerequisite readings, and I thought I could get away with only a high-level idea of how the backend was structured. Naturally, I hoped the descriptor would describe something about the target.
Sadly, despite having other useful information, it still could not satisfy my needs. (“omg how hard could it have been to have a function called getMnemonic()?!”)
What if I grabbed the mnemonic when it gets printed by the AsmPrinter?
I could not see any straightforward way of doing so, and more importantly, I would lose the chance of performing interesting transformations, since the AsmPrinter is the last FunctionPass that runs in LLVM.
I then found the AVRInstrInfo.td file, which described each AVR intermediate instruction in detail, including its assembly string! Victory was close…
However, it still hadn’t struck me how to use this, since it was just a text file and not a C++ file that I could include. I figured it was finally time to ask for help on the mailing list, hence [2].
The kind responder on the thread all but confirmed my suspicion that I needed to dig into the backend target definitions. It was time to go through the behemoth of a doc on backends in LLVM.
cue hour-long silence
Phew. The sun was shining and the birds were singing again. The TableGen doc revealed that the C++ files generated by TableGen from AVRInstrInfo.td and the other .td (target description) files were sitting pretty within $(BUILD_DIR)/lib/Target/AVR, waiting for my perusal. (They are not exactly C++ files, just function and array definitions in C++ syntax.)
I immediately got my ass there and started furiously grep-ing through AVRGenAsmMatcher.inc and its siblings. I hit the jackpot: I found a char array called MnemonicTable containing all the AVR mnemonics, and an array of structs called MatchTable0, matching MachineInstr opcodes to MnemonicTable indices.
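To make the lookup concrete, here is a minimal, self-contained sketch of the idea. It does not include any LLVM code: the tables below are tiny hand-written stand-ins, the opcode values are made up, and mnemonicForOpcode is a hypothetical helper name. It also assumes that each mnemonic in MnemonicTable is stored as a length byte followed by its characters, and that each MatchTable0 entry holds an offset into that table; check your own generated .inc file before relying on that layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Made-up opcode values for illustration only; the real ones come from
// the TableGen-generated instruction enum of the target.
namespace AVR {
enum { ADCRdRr = 1, ADDRdRr = 2, ADIWRdK = 3 };
}

// Assumed layout: one length byte, then that many mnemonic characters.
static const char *const MnemonicTable =
    "\003adc"   // starts at offset 0
    "\003add"   // starts at offset 4
    "\004adiw"; // starts at offset 8

// Simplified stand-in for the generated match entry; the real struct
// has more fields than the two used here.
struct MatchEntry {
  uint16_t Mnemonic; // offset of the length byte in MnemonicTable
  unsigned Opcode;   // MachineInstr opcode

  std::string getMnemonic() const {
    // Skip the length byte and copy exactly that many characters.
    std::size_t Len = static_cast<unsigned char>(MnemonicTable[Mnemonic]);
    return std::string(MnemonicTable + Mnemonic + 1, Len);
  }
};

static const MatchEntry MatchTable0[] = {
    {0, AVR::ADCRdRr},
    {4, AVR::ADDRdRr},
    {8, AVR::ADIWRdK},
};

// Hypothetical helper: linear scan for the first entry with this opcode.
// Returns an empty string when the opcode has no entry.
std::string mnemonicForOpcode(unsigned Opc) {
  for (const MatchEntry &Entry : MatchTable0)
    if (Entry.Opcode == Opc)
      return Entry.getMnemonic();
  return std::string();
}
```

With these mock tables, mnemonicForOpcode(AVR::ADIWRdK) gives "adiw". A linear scan is the simplest thing that works here; if the same opcode shows up in several entries of a real table, this sketch just takes the first one.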
I just needed to do #include "AVRGenAsmMatcher.inc" with the appropriate #define-guard (#define GET_MATCHER_IMPLEMENTATION) and I’d be done.
Unfortunately, along with the arrays I needed, the file contained member functions of AVRAsmParser, which I couldn’t separate out in any easy way. So instead of referencing the generated file directly, I could only do #include "AsmParser/AVRAsmParser.cpp".
This felt kinda hacky since it isn’t a header file and isn’t meant to be included
anywhere, but I didn’t really care at that point.
tl;dr: Spent nearly 2 days to figure out 3 lines of code.
Bonus observation:
Feast your eyes upon the glorious MnemonicTable for the X86 target…