
Magic [probably] behind Hex-Rays

Building Advanced IDA Decompiler Plugins
Peter Matula
Author at Avast Engineering
Published
May 25, 2020
Read time
10 Minutes

    IDA has become the standard among modern disassemblers used in the reverse engineering community. Hex-Rays is a popular plugin for IDA that further simplifies binary analysis by decompiling native code into C-like pseudocode. However, Hex-Rays’s strength goes beyond its decompilation quality: it is the seamless integration with the interactive disassembler that makes it an invaluable reversing tool.

    This article demonstrates how to use the extensive, but often not self-evident, functionality provided by IDA SDK in order to put together a plugin with Hex-Rays-like capabilities. In fact, many of the possibilities discussed here are probably not used outside of Hex-Rays itself.

    When you are “grepping” GitHub hoping to find a real-world usage example of an IDA SDK function.

    The scope of this article is limited to the GUI-related features of IDA SDK, not the decompilation itself. The article assumes the reader has a good knowledge of IDA plugin writing and focuses on the advanced features it aims to introduce. Basic functionality that is comprehensible from IDA SDK’s headers or examples may be omitted.

    Both IDA and IDA SDK used in this article are of version 7.5. The article discusses the fundamental principles to accomplish the given tasks. The complete working example can be found in an associated GitHub repository. The repository may be improved or updated to work with further IDA SDK versions. Examples in this article will not!

    What can Hex-Rays do?

    Hex-Rays is a native IDA plugin written in C++ by the guys behind IDA itself. As such, it perfectly uses IDA SDK to gain the following capabilities:

    • Syntax-highlighted C-like pseudocode in an IDA-native subview.
    • Displayed content associated with the related disassembly addresses.
    • Cursor-context-sensitive actions.
    • Utilization of IDA’s navigation mechanisms.
    • Synchronization with the IDA disassembly subview.
    • Modification of IDA disassembly.

    Demo

    Step by step, we will write a plugin implementing all the interactions listed above.

    Only the high-level principles are discussed here; see the associated GitHub repository for the complete source code.

    1. Decompiler

    Implementing a decompiler itself is outside the scope of this article. Therefore, we are going to mock it. All we need is an interface capable of decompiling an address into a function:

    The demo plugin in the repository is able to decompile only main() and ack() functions from the provided ack.x86.gcc.O0.g.elf binary. Their decompiled code is hardcoded in the decompiler module. This is sufficient for our demonstration purposes.
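
    A minimal sketch of such a mocked interface, written without the IDA SDK so it compiles on its own. The names MockDecompiler and decompile, the simplified Function, and the ea_t alias are assumptions made for illustration, not the repository's actual code:

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using ea_t = std::uint64_t;  // stand-in for IDA's ea_t (assumption, keeps the sketch SDK-free)

// Simplified stand-in for the Function class described in Section 4.
struct Function {
    std::string name;
    ea_t start;  // first address of the function
    ea_t end;    // one past the last address

    bool contains(ea_t ea) const { return start <= ea && ea < end; }
};

// Mock decompiler: "decompiles" by looking the address up in a hardcoded table.
class MockDecompiler {
public:
    explicit MockDecompiler(std::vector<Function> known)
        : functions_(std::move(known)) {}

    // Returns the function containing `ea`, or nullptr if it cannot be decompiled.
    const Function* decompile(ea_t ea) const {
        for (const Function& f : functions_) {
            if (f.contains(ea)) return &f;
        }
        return nullptr;
    }

private:
    std::vector<Function> functions_;
};
```

    Returning nullptr for unknown addresses lets the caller keep the current view unchanged when the address lies outside the functions the mock knows about.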

    2. YX coordinates

    Before we look into function representation, we need to introduce two important building blocks. The first one being YX coordinates:

    These objects will represent a position in an IDA custom viewer. Because we are going to display source code, we index lines (Y) from 1 and columns (X) from 0.
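
    One way such a coordinate type could look is sketched below. Only the indexing convention comes from the text; the member names and the choice of operators are assumptions:

```cpp
#include <cstddef>
#include <tuple>

// A position in the custom viewer: line (Y) and column (X).
struct YX {
    std::size_t y = 1;  // lines are indexed from 1
    std::size_t x = 0;  // columns are indexed from 0

    YX() = default;
    YX(std::size_t line, std::size_t col) : y(line), x(col) {}

    // Order by line first, then by column, so YX can key an ordered map.
    friend bool operator<(const YX& a, const YX& b) {
        return std::tie(a.y, a.x) < std::tie(b.y, b.x);
    }
    friend bool operator==(const YX& a, const YX& b) {
        return a.y == b.y && a.x == b.x;
    }
    friend bool operator!=(const YX& a, const YX& b) { return !(a == b); }
};
```

    The line-major ordering matters later, because tokens are stored in a container sorted by their starting coordinates.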

    3. Tokens

    Another piece essential to build functions is Token:

    It represents one lexical unit in the decompiled source code. Each such unit has a kind (e.g. ID_FNC), value (e.g. main), and an associated disassembly address (e.g. 0x8048577). See this wiki page for the detailed explanation. There is also a method returning an IDA color tag associated with the Token.
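
    A rough sketch of such a token follows. ID_FNC and the example values come from the text; the remaining kinds and the Color enum are hypothetical placeholders. A real plugin would return one of the SDK's color tag constants instead of this stand-in enum:

```cpp
#include <cstdint>
#include <string>

using ea_t = std::uint64_t;  // stand-in for IDA's ea_t (assumption)

struct Token {
    // ID_FNC comes from the article; the other kinds are illustrative.
    enum class Kind { ID_FNC, ID_VAR, KEYWORD, LITERAL, OPERATOR, WHITE_SPACE };

    // Hypothetical color classes standing in for IDA SDK color tags.
    enum class Color { Default, Function, Variable, Keyword, Number };

    Kind kind;
    std::string value;  // e.g. "main"
    ea_t ea;            // associated disassembly address, e.g. 0x8048577

    // Map the token kind to the color it should be rendered with.
    Color color() const {
        switch (kind) {
            case Kind::ID_FNC:  return Color::Function;
            case Kind::ID_VAR:  return Color::Variable;
            case Kind::KEYWORD: return Color::Keyword;
            case Kind::LITERAL: return Color::Number;
            default:            return Color::Default;
        }
    }
};
```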

    4. Functions

    Finally, the representation of a decompiled function:

    A function is a list of tokens with a name, start address, and end address. Tokens are indexed by their [starting] YX coordinates. YX coordinates are indexed by their disassembly addresses. Apart from the simple getters, there are a number of methods dealing with coordinates, addresses, and lines. These will come in quite handy in a moment.
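
    The two indexes could be sketched with ordered maps as shown below. This is an SDK-free approximation: the method names (add, token_at, ea_at, yx_of) are assumptions, YX is reduced to a std::pair, and Token to the two fields the lookups need:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using ea_t = std::uint64_t;                      // stand-in for IDA's ea_t
using YX = std::pair<std::size_t, std::size_t>;  // (line, column); ordered by line first

struct Token { std::string value; ea_t ea; };

class Function {
public:
    Function(std::string name, ea_t start, ea_t end)
        : name_(std::move(name)), start_(start), end_(end) {}

    // Register a token starting at `yx`. emplace() does not overwrite, so the
    // first position registered for an address is the one that is kept.
    void add(YX yx, Token t) {
        ea2yx_.emplace(t.ea, yx);
        tokens_.emplace(yx, std::move(t));
    }

    // Last token on the same line that starts at or before `yx`, or nullptr.
    const Token* token_at(YX yx) const {
        auto it = tokens_.upper_bound(yx);
        if (it == tokens_.begin()) return nullptr;
        --it;
        return it->first.first == yx.first ? &it->second : nullptr;
    }

    // Address for a viewer position; falls back to the function start.
    ea_t ea_at(YX yx) const {
        const Token* t = token_at(yx);
        return t ? t->ea : start_;
    }

    // First registered viewer position for `ea`, or nullptr if unknown.
    const YX* yx_of(ea_t ea) const {
        auto it = ea2yx_.find(ea);
        return it == ea2yx_.end() ? nullptr : &it->second;
    }

    const std::string& name() const { return name_; }
    bool contains(ea_t ea) const { return start_ <= ea && ea < end_; }

private:
    std::string name_;
    ea_t start_, end_;
    std::map<YX, Token> tokens_;  // tokens by starting position
    std::map<ea_t, YX> ea2yx_;    // positions by address
};
```

    Because whitespace is stored as tokens too, looking up the last token that starts at or before a position is enough to find the token under the cursor.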

    5. Places

    Places (i.e. objects derived from the abstract place_t) denote locations of data displayed in viewers. IDA SDK defines several classes that can be used to display certain data: simpleline_place_t for string lines, idaplace_t for disassembly locations, hexplace_t for hex dump lines, etc.

    If none of these is suitable for the user’s application, the plugin author can implement a custom derivation of the class. In our case, we create a demo_place_t class suited for representing locations and displaying tokens of decompiled functions:

    YX is the location component, Function is the data component. Using YX, we can easily extract the information needed to implement the place_t interface from the associated Function. demo_place_t is both a YX-aware and an EA-aware location, something that none of the existing SDK places is.

    User-defined places need to be registered. We use the PCF_EA_CAPABLE flag to indicate that our place is EA-aware. This will enable some handy out-of-the-box features later. Also, since SDK 7.5, all new plugins should use the PCF_MAKEPLACE_ALLOCATES flag:

    Unfortunately, place_t objects are by default considered to be just line cursors (i.e. Y-sensitive). Special cases that are also X-sensitive need to implement the custom_viewer_adjust_place_t callback and fine-tune the X location. Otherwise, moving the cursor horizontally in a line will not change the X coordinate, nor the corresponding address:
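
    The core of such an adjustment is snapping the raw cursor column to the start of the token under it. The sketch below shows only that logic, detached from the SDK callback. The function name adjust_to_token and the map of token lengths are hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <utility>

using YX = std::pair<std::size_t, std::size_t>;  // (line, column)

// `starts` maps each token's starting position to its length in columns.
// Returns the start of the token under (line, x), or the line start
// {line, 0} when the cursor is outside every token on that line.
YX adjust_to_token(const std::map<YX, std::size_t>& starts,
                   std::size_t line, std::size_t x) {
    auto it = starts.upper_bound({line, x});
    if (it != starts.begin()) {
        --it;  // last token starting at or before (line, x)
        const YX& start = it->first;
        if (start.first == line && x < start.second + it->second) {
            return start;
        }
    }
    return {line, 0};
}
```

    With the place snapped to a token start, every horizontal cursor move that crosses a token boundary yields a different place, and therefore possibly a different address.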

    In cases like these, the SDK also recommends implementing a callback named custom_viewer_get_place_xcoord_t. It determines whether two places are on the same line, and should prevent unnecessary viewer refreshes:
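
    The decision such a callback makes can be sketched in isolation. The return convention used here (-1 when the item is not on the given line, otherwise the item's column) is our reading of the SDK header and should be checked against kernwin.hpp. The function name place_xcoord and the plain YX struct are assumptions:

```cpp
#include <cstddef>

struct YX {
    std::size_t y;  // line
    std::size_t x;  // column
};

// Assumed convention (verify against kernwin.hpp):
//   -1   `item` is not located on `line`
//   >= 0 the column of `item` within `line`
int place_xcoord(const YX& line, const YX& item) {
    if (line.y != item.y) return -1;
    return static_cast<int>(item.x);
}
```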


     

    6. Code viewer

    Now we have everything we need to create a viewer displaying decompiled functions in the plugin’s Context::run() method:

    ui_handlers is a set of custom viewer handlers:


     

    We have already discussed cv_adjust_place and cv_get_place_xcoord. The other handlers will be examined in a moment.

    With the code presented so far, we get a plugin capable of displaying colored decompiled functions, reacting to both Y and X movements, and aware of the disassembly addresses associated with the shown lexical units.

    7. Place conversion

    The SDK allows us to implement another powerful mechanism on custom places: location conversions. These are used for view synchronization, and if a custom place is PCF_EA_CAPABLE (which ours is), then also for address navigation:


     

    The converter’s only job is to set the new place in the destination entry. If that place lies in a different, newly decompiled function, this is not enough. We also need to switch the viewer’s content to this function. We do so in the location changed handler:


     

    Changing the content is as easy as setting a new range for the viewer.

    With these additions, we can now synchronize our view with the IDA disassembly view.

    Also, because our view is EA-capable, IDA automatically enables some more features like goto-address navigation (G) or navigation toolbar synchronization.

    8. Context-sensitive actions

    We demonstrate these in two different but similar scenarios. First, we want to decompile and display a new function if we double-click on its call statement. The catch is that we want to do it only for functions and not for other lexical elements:

    Second, we want to have different right-click-popup options depending on the kind of token under the cursor:


     

    In both cases, we can ask IDA what the current viewer place is. Using its YX, we get the token at these coordinates in the associated function. It is then easy to use the token to make context-sensitive actions.
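
    Once the token under the cursor is known, both scenarios reduce to a dispatch on its kind. A hedged sketch of that dispatch follows, without the SDK action and popup plumbing. The function names and every action label are hypothetical:

```cpp
#include <string>
#include <vector>

enum class Kind { ID_FNC, ID_VAR, LITERAL, OTHER };

// Double click: only function identifiers trigger a new decompilation.
bool should_follow_on_double_click(Kind k) {
    return k == Kind::ID_FNC;
}

// Right click: the popup entries depend on the kind of token under the cursor.
std::vector<std::string> popup_actions(Kind k) {
    std::vector<std::string> actions = {"Copy"};  // offered for every token
    switch (k) {
        case Kind::ID_FNC:
            actions.push_back("Rename function");
            actions.push_back("Jump to definition");
            break;
        case Kind::ID_VAR:
            actions.push_back("Rename variable");
            break;
        default:
            break;  // no kind-specific entries
    }
    return actions;
}
```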

    9. Navigation

    Because we have been using the appropriate SDK mechanisms the whole time, we get another great feature right out of the box: navigation.

    10. Synchronization highlighting

    Remember that cool green corresponding-line highlighting in the example in Section 7? Well, that isn’t an out-of-the-box functionality. We implemented it using the new view synchronization features brought by IDA SDK 7.5.

    It is now possible to query a custom viewer’s synchronization group and to intervene in the viewer’s line rendering:

    Note: This highlighting implementation works a bit differently from the one in the Hex-Rays plugin. It collects and highlights all the addresses represented by the current line in our demo viewer. Hex-Rays, on the other hand, seems to highlight whole contiguous blocks delimited by the surrounding lines.
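
    The collection step described in the note can be sketched as a range scan over tokens sorted by position. The function name addresses_on_line and the simplified token map (position to address) are assumptions, and the sketch leaves out the SDK rendering hook entirely:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using ea_t = std::uint64_t;                      // stand-in for IDA's ea_t
using YX = std::pair<std::size_t, std::size_t>;  // (line, column)

// Collect every address referenced by a token on `line`.
// `token_eas` maps each token's starting position to its address.
std::set<ea_t> addresses_on_line(const std::map<YX, ea_t>& token_eas,
                                 std::size_t line) {
    std::set<ea_t> eas;
    // Tokens are sorted by (line, column), so one line is a contiguous range.
    for (auto it = token_eas.lower_bound({line, 0});
         it != token_eas.end() && it->first.first == line; ++it) {
        eas.insert(it->second);
    }
    return eas;
}
```

    The resulting set is what would then be matched against the addresses of the lines rendered in the synchronized disassembly view.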

    How to use this article?

    Well, however you wish! We have shown how to use advanced IDA SDK features related to places and movements to create a dummy decompilation plugin. Hopefully, the examples in this article, or the full example in our GitHub repository, will help you make your plugins more interactive.

    We will use the mechanisms shown here in our upcoming RetDec IDA plugin v1.0 to create a free and open decompilation plugin on par with Hex-Rays. Check out the project if you want to see a complex real-world usage example, or if you just need a free decompiler for IDA or Radare2.
