Radhika Ghosal


TLOTD #0: Finding Assembly Mnemonic Corresponding to MachineInstr in LLVM

TLOTD, a self-made acronym for Trivial Learning of the Day, is a series of smaller posts elaborating on something presumably trivial which I figured out after much pain. Hopefully someone out there will find this useful…

Short answer:

Open up $(BUILD_DIR)/lib/Target/X/XGenAsmMatcher.inc, where X = X86/ARM/AVR/whatever. Search for MnemonicTable and MatchTable0; you’ll find two arrays you can iterate through to find the asm mnemonic corresponding to your MachineInstr opcode. You won’t be able to include XGenAsmMatcher.inc directly because it also contains member functions of the XAsmParser class, so you have to include XAsmParser.cpp itself instead.

Note that this has been tested only for the AVR target, but it should be similar for the others.

// Assuming your file is in the `lib/Target/X` directory.
// XAsmParser.cpp is the file that pulls in XGenAsmMatcher.inc.
#include "AsmParser/XAsmParser.cpp"
#include <iterator>
#include <unordered_map>
#include <utility>
// ...
    // Maps a MachineInstr opcode to its asm mnemonic.
    std::unordered_map<uint16_t, StringRef> OpMnemMap;

    // ...
    const MatchEntry *Start = std::begin(MatchTable0);
    const MatchEntry *End = std::end(MatchTable0);

    for (const MatchEntry *i = Start; i != End; ++i) {
        StringRef MnemStr = i->getMnemonic();
        // outs() << MnemStr << "\n";
        // insert() keeps the first mnemonic seen for a given opcode.
        OpMnemMap.insert(std::make_pair(i->Opcode, MnemStr));
        // ...
    }
    // ...
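Once the map is filled, looking up a mnemonic is a plain map query. The following is a minimal sketch, not part of the original code: lookupMnemonic and OpMnemMapTy are hypothetical names, and std::string_view stands in for StringRef so that the snippet compiles without LLVM. In a real pass you would presumably pass MI.getOpcode() as the opcode.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <unordered_map>

// Hypothetical alias; the real map above uses llvm::StringRef as the value.
using OpMnemMapTy = std::unordered_map<uint16_t, std::string_view>;

// Returns the mnemonic recorded for Opcode, or std::nullopt when the
// opcode never appeared in the match table.
std::optional<std::string_view> lookupMnemonic(const OpMnemMapTy &Map,
                                               uint16_t Opcode) {
    auto It = Map.find(Opcode);
    if (It == Map.end())
        return std::nullopt;
    return It->second;
}
```

Returning an optional rather than indexing with operator[] avoids silently inserting empty entries for opcodes that have no row in the match table.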

If you’re interested in the journey, read on.

Relevant mailing list threads: [1], [2]

For a variety of reasons, I needed to find the asm mnemonic corresponding to a MachineInstr opcode. For instance, in the AVR backend, AVR::ADCRdRr corresponds directly to the AVR assembly instruction adc, AVR::ADIWRdK corresponds to adiw, and so on.

As usual, my first line of attack was to head to the MachineInstr doxygen page and look for clues in its getter functions. The closest thing I found was the getOpcode() function, which soon dashed my dreams: it only returns LLVM’s internally assigned opcode for the MachineInstr, and not the actual opcode for the AVR mnemonic.

Next, I headed over to the MCInstrDesc class reference, since it mentioned some ‘target descriptor’. I had no idea what that meant: I had not yet gone through the Writing an LLVM Backend doc, because it had long prerequisite readings and I thought I could get away with only a high-level idea of how the backend was structured. Naturally, I hoped the descriptor would describe something about the target.

Sadly, despite having other useful information, it still could not satisfy my needs. (“omg how hard could it have been to have a function called getMnemonic()?!”)

What if I grabbed the mnemonic when it gets printed by the AsmPrinter? I could not see any straightforward way of doing so, and more importantly, I would lose the chance of performing interesting transformations since the AsmPrinter is the last FunctionPass which runs in LLVM.

I then found the AVRInstrInfo.td file which described each AVR intermediate instruction in detail, including its assembly string! Victory was close…

However, it still hadn’t struck me how to use this, since it was just a text file and not a C++ file that I could include. I figured it was finally time to ask for help on the mailing list, hence [2].

The kind responder on the thread all but confirmed my suspicion that I needed to dig into the backend target definitions. It was time to go through the behemoth of a doc on backends in LLVM.

cue hour-long silence

Phew. The sun was shining and the birds were singing again. The TableGen doc revealed that the C++ files (not exactly, just function and array definitions in C++ syntax) generated by TableGen from AVRInstrInfo.td and the other .td (target description) files, were sitting pretty within $(BUILD_DIR)/lib/Target/AVR, waiting for my perusal.

I immediately got my ass there and started furiously grep-ing through AVRGenAsmMatcher.inc and its siblings. I hit the jackpot: I found a char array called MnemonicTable containing all the AVR mnemonics, and an array of structs called MatchTable0, which maps MachineInstr opcodes to MnemonicTable indices.
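To make the relationship between the two arrays concrete, here is a toy, self-contained model of them. It is a sketch under assumptions, not the generated code: the opcode values are made up, the struct is trimmed to two fields, std::string_view replaces StringRef, and buildOpMnemMap is a hypothetical helper. It also assumes each mnemonic is stored as a length byte followed by its characters, which is how the table appeared to be laid out.

```cpp
#include <cstdint>
#include <string_view>
#include <unordered_map>

// Stand-in for the generated string table: each mnemonic is stored as
// one length byte followed by its characters.
static const char MnemonicTable[] =
    "\003adc"
    "\003add"
    "\004adiw";

// Stand-in for the generated MatchEntry; the real struct has more fields.
struct MatchEntry {
    uint16_t Mnemonic; // offset of the length byte inside MnemonicTable
    uint16_t Opcode;   // MachineInstr opcode (values below are made up)

    std::string_view getMnemonic() const {
        return std::string_view(
            MnemonicTable + Mnemonic + 1,
            static_cast<unsigned char>(MnemonicTable[Mnemonic]));
    }
};

static const MatchEntry MatchTable0[] = {
    {0, 100}, // adc,  hypothetical opcode standing in for AVR::ADCRdRr
    {4, 101}, // add
    {8, 102}, // adiw, hypothetical opcode standing in for AVR::ADIWRdK
};

// Walks the match table once and records opcode -> mnemonic.
// emplace() keeps the first mnemonic seen for a given opcode.
std::unordered_map<uint16_t, std::string_view> buildOpMnemMap() {
    std::unordered_map<uint16_t, std::string_view> Map;
    for (const MatchEntry &E : MatchTable0)
        Map.emplace(E.Opcode, E.getMnemonic());
    return Map;
}
```

In this toy table, buildOpMnemMap().at(102) gives adiw. The real tables are much larger, but the lookup works the same way.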

I just needed to do #include "AVRGenAsmMatcher.inc" with the appropriate #define-guard (#define GET_MATCHER_IMPLEMENTATION) and I’d be done.

Unfortunately, along with the arrays I needed, there were member functions of AVRAsmParser, which I couldn’t separate in any easy way, so I could only do #include "AsmParser/AVRAsmParser.cpp", and couldn’t reference the required file directly. This felt kinda hacky since it isn’t a header file and isn’t meant to be included anywhere, but I didn’t really care at that point.

tl;dr: Spent nearly 2 days to figure out 3 lines of code.

Bonus observation:

Feast your eyes upon the glorious MnemonicTable for the X86 target…