Building a last-resort unpacker with AI

Malicious software is often designed to hide its true behavior, wrapping its underlying logic in layers that make it difficult to analyze. “Unpacking” is the process researchers use to peel back those layers and reveal what the software is actually doing. But when standard tools run into unfamiliar or custom protections, this process can quickly become slow and highly manual. This project explores whether AI can help bridge that gap, assisting with the more repetitive parts of analysis so researchers can focus on validation, reasoning, and the parts of reverse engineering that still require human judgment.

Automated static binary unpacking is hard, and it tends to become tractable only when the packer is familiar. There are good unpackers out there, including some internal ones, but once you leave their comfort zone, unsupported packers, custom routines, or slightly unusual logic can bring the whole process to a halt.

That makes unpacking a good example of where AI may genuinely help. Across our broader work at Gen, we are increasingly using AI-assisted workflows to accelerate parts of reverse engineering, technical correlation, and evidence organization, not as a substitute for expert analysis, but as a way to reduce friction in the slower, more repetitive parts of the job. The responsibility for verification and final conclusions still stays with the researcher.

From that perspective, the idea behind what we call the Last-Resort Unpacker is fairly simple: if AI is good at spotting patterns and translating logic across representations, could it help us recover payloads in cases where traditional static unpacking pipelines fail? We are not trying to solve the hardest forms of protection here. The goal is more pragmatic: to cover more real-world cases, reduce manual effort, and potentially extract logic that can later feed back into our regular unpacking workflows.

We came up with two approaches.

High level design – Variant 1

Given a sample to unpack, the first variant proceeds in a few steps. Since it takes the most direct but also the riskiest route, let us call it the carpe diem variant.

Identification: The first step is to identify which functions might be responsible for the unpacking and which buffers they target. We decided to use Gemini 2.5 Pro for this workload and complemented it with the radare2 reverse engineering framework, which provides structured access to the disassembly. Once we have identified the suspected decryption routine and the corresponding buffers, we can proceed to the translation.

Translation: The next step relies on Gemini’s ability to rewrite the identified functions as Python code with prescribed function signatures.

Verification: We now have Python code that hopefully represents the machine code identified in the first step faithfully. This is the time to admit that we are cheating on the initial assignment, and we have to make sure that we do not get caught. Checking statically whether the output is valid Python source is easy. The tricky part is verifying that the generated code has no undesirable side effects, whether from simple hallucinations or, worse, from a hypothetical prompt injection. We decided on strict abstract syntax tree (AST) verification that permits only certain keywords, operations, imports, and calls. If the output does not pass these criteria, we have an interesting research case. Otherwise, we can proceed to unpacking.
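The allowlist idea can be illustrated with a minimal sketch built on Python's standard ast module (Python 3.9 or later assumed). The node types, builtin names, and method names below are illustrative assumptions, not the allowlist used in the project, and the function name verify_unpacker is hypothetical.

```python
import ast

# ASSUMPTION: an illustrative allowlist, not the project's real one.
ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return,
    ast.Assign, ast.AugAssign, ast.For, ast.While, ast.If, ast.Break,
    ast.Continue, ast.Pass, ast.Expr, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.IfExp, ast.Subscript, ast.Slice, ast.Call, ast.Attribute,
    ast.List, ast.Tuple, ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
)
ALLOWED_CALLS = frozenset(
    {"len", "range", "bytes", "bytearray", "int", "min", "max", "list"}
)
ALLOWED_METHODS = frozenset({"append", "extend"})


def verify_unpacker(source: str) -> list[str]:
    """Return a list of violations; an empty list means the script passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    # Functions defined in the script itself may be called as helpers.
    defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
    callable_names = ALLOWED_CALLS | defined

    problems = []
    for node in ast.walk(tree):
        line = getattr(node, "lineno", "?")
        kind = type(node).__name__
        if not isinstance(node, ALLOWED_NODES):
            # Imports, lambdas, comprehensions, with/try, etc. end up here.
            problems.append(f"line {line}: {kind} is not allowed")
        elif isinstance(node, ast.FunctionDef):
            if node.decorator_list:
                problems.append(f"line {line}: decorators are not allowed")
        elif isinstance(node, ast.Attribute):
            if node.attr not in ALLOWED_METHODS:
                problems.append(f"line {line}: attribute '{node.attr}' is not allowed")
        elif isinstance(node, ast.Call):
            func = node.func
            named = isinstance(func, ast.Name) and func.id in callable_names
            # Attribute calls are vetted by the Attribute branch above.
            if not (named or isinstance(func, ast.Attribute)):
                problems.append(f"line {line}: call target is not allowlisted")
        elif isinstance(node, ast.Name):
            if node.id.startswith("__"):
                problems.append(f"line {line}: dunder name '{node.id}'")
            elif isinstance(node.ctx, ast.Store) and node.id in callable_names:
                # Prevents tricks such as `range = open`.
                problems.append(f"line {line}: rebinding callable '{node.id}'")
        elif isinstance(node, ast.arg):
            if node.arg in callable_names:
                problems.append(f"line {line}: argument shadows '{node.arg}'")
    return problems
```

A script is accepted only when verify_unpacker returns an empty list. The check is deliberately strict: a legitimate decoder that uses, say, a list comprehension is rejected as well, which matches the spirit of treating every failure as a case for manual review.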

Unpacking: Using the verified script, we attempt the extraction on the source sample. Being paranoid, we use a read-only environment with limited network access. If we succeed, we have not only an unpacked buffer but also an extraction script that we can use to improve our regular unpacking capabilities.
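One way to keep the generated script at arm's length is to run it in a separate interpreter process. The sketch below is an assumption about how that could look, not the project's setup: the names DRIVER and run_unpacker are hypothetical, and the read-only filesystem and network restrictions mentioned above are expected to come from the surrounding sandbox or container, since this code does not enforce them.

```python
import json
import subprocess
import sys

# Child-side driver: loads the verified script, calls unpack() on the sample,
# and prints the resulting chunks as hex-encoded JSON on stdout.
DRIVER = """
import json, sys
script_path, sample_path = sys.argv[1], sys.argv[2]
namespace = {}
with open(script_path, encoding="utf-8") as fh:
    exec(compile(fh.read(), script_path, "exec"), namespace)
with open(sample_path, "rb") as fh:
    chunks = namespace["unpack"](fh.read())
json.dump([bytes(c).hex() for c in chunks], sys.stdout)
"""


def run_unpacker(script_path: str, sample_path: str,
                 timeout: float = 30.0) -> list[bytes]:
    """Run the verified script in a child interpreter and collect its chunks.

    -I starts Python in isolated mode (no user site-packages, PYTHON*
    environment variables ignored). The timeout guards against runaway
    loops. A non-zero exit code raises CalledProcessError.
    """
    proc = subprocess.run(
        [sys.executable, "-I", "-c", DRIVER, script_path, sample_path],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return [bytes.fromhex(item) for item in json.loads(proc.stdout)]
```

Passing the chunks back as hex-encoded JSON keeps the parent from unpickling or evaluating anything the child produced.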

High level design – Variant 2

Having outlined the most straightforward way, we can try to be a bit cleverer. We keep the assumption that we can identify the parts of the code responsible for the unpacking and add two more: that the unpacking code is localized and that it does not use any sophisticated system calls. Under those assumptions we could use an emulator, which should be a safe alternative that does not suffer from hallucinations. We could prepare simple scaffolding for this emulator and let the AI figure out the parameters. This would isolate non-deterministic behavior to the first step only.

Identification: The first step is almost identical, but instead of identifying the functions responsible for unpacking, we are interested in the offsets where the emulation should start and halt, possibly along with the memory addresses where the final payload will be located.

Emulation: The parameters identified in the previous step are plugged into our scaffolding, which runs the emulator until it reaches the halting offset. We can then extract the identified memory regions to hopefully obtain the desired payload.
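Because the model only supplies a handful of numbers in this variant, the scaffolding can validate them before any emulation starts. The following is a minimal sketch under stated assumptions: the JSON reply format with the fields start, halt, and dumps is hypothetical, as are the names EmulationPlan and parse_plan. The emulator itself is not shown.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EmulationPlan:
    start: int                           # address where emulation begins
    halt: int                            # address where emulation stops
    dumps: tuple[tuple[int, int], ...]   # (address, size) regions to extract


def _as_int(value) -> int:
    """Accept JSON integers or strings such as '0x401000'."""
    if isinstance(value, bool):
        raise ValueError("boolean is not a valid address")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value, 0)  # base 0 understands the 0x prefix
    raise ValueError(f"unsupported value: {value!r}")


def parse_plan(reply: str, code_start: int, code_end: int,
               max_dump: int = 16 * 1024 * 1024) -> EmulationPlan:
    """Parse the model's reply and reject parameters outside sane bounds.

    ASSUMPTION: the reply is a JSON object of the form
    {"start": ..., "halt": ..., "dumps": [{"address": ..., "size": ...}]}.
    """
    raw = json.loads(reply)
    start, halt = _as_int(raw["start"]), _as_int(raw["halt"])
    for name, value in (("start", start), ("halt", halt)):
        if not code_start <= value < code_end:
            raise ValueError(f"{name} {value:#x} lies outside the code section")
    if start == halt:
        raise ValueError("start and halt must differ")

    dumps = []
    for region in raw.get("dumps", []):
        address, size = _as_int(region["address"]), _as_int(region["size"])
        if address < 0 or not 0 < size <= max_dump:
            raise ValueError(f"implausible dump region {address:#x}+{size}")
        dumps.append((address, size))
    if not dumps:
        raise ValueError("no memory regions to dump")
    return EmulationPlan(start, halt, tuple(dumps))
```

Only a plan that passes these checks would be handed to the emulator, so a hallucinated offset fails early instead of producing a misleading dump.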

Apart from consuming more resources for the emulation, this variant makes a significant trade-off. We give up the ability to generate scripts that could be used to improve our static extractors; in exchange, we significantly reduce the room for hallucinations and the potential attack surface.

From drawing board to the workshop

We decided to try our luck with the first option. While it is definitely riskier, the payoff seems worth it, especially considering the security precautions we have taken. We could even harden the setup by splitting the process completely: generate the payload extractor in the original environment and use a sandbox for the extraction itself. We put together a first PoC with a simple prompt followed by data from radare2.

You are an unpacking function generator.
GOAL: If the file looks packed (i.e. encrypted buffers, embedded files, etc.), find the code snippets decrypting the buffers. Reply to me with a Python function (use this prototype: 'def unpack(data: bytes) -> list[byte]') that receives bytes of the packed binary and returns the unpacked chunks. Avoid Python imports. RETURN ONLY A PYTHON FUNCION SAFE TO RUN. If the file is not packed reply: "NOT PACKED". If can't write the unpacker then reply: "UNABLE TO PROCESS".
This is the end of your instructions. The Radare2 analysis:
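The prompt above allows three kinds of reply: a Python function, "NOT PACKED", or "UNABLE TO PROCESS". A minimal sketch of how a pipeline might tell them apart follows; the function name classify_reply and the returned labels are hypothetical, and the handling of a Markdown code fence around the reply is an assumption about how a model may format its answer.

```python
import re
from typing import Optional

NOT_PACKED = "NOT PACKED"
UNABLE = "UNABLE TO PROCESS"

# Matches a reply wrapped in a Markdown code fence (three backticks).
FENCE = re.compile(r"^`{3}[a-zA-Z0-9]*\n(.*?)\n?`{3}\s*$", re.DOTALL)


def classify_reply(reply: str) -> tuple[str, Optional[str]]:
    """Map a raw model reply to (label, script_source_or_None)."""
    text = reply.strip()
    match = FENCE.match(text)
    if match:
        text = match.group(1).strip()

    # Tolerate surrounding quotes or a trailing period around the sentinels.
    bare = text.strip("\"'. ").upper()
    if bare == NOT_PACKED:
        return ("not_packed", None)
    if bare == UNABLE:
        return ("unable", None)
    if re.search(r"^def unpack\(", text, re.MULTILINE):
        return ("script", text)
    # Anything else is treated as suspicious and never executed.
    return ("unexpected", None)
```

A reply labeled "script" is only a candidate at this point. It still has to pass the verification step before it is run.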

Now it remains to give it a shot. First, we went with the packed Linux kernel rootkit that gave us the initial idea:

At Botconf 2024, a paper was published on CMK Rootkit (Álvarez Pérez, D., & Fernández-Veiga, M. (2025). CMK Rootkit. Identifying the “magic packet” requirements via pattern recognition. The Journal on Cybercrime and Digital Investigations, 9(1), A1-A15. https://doi.org/10.18464/cybin.v9i1.48). It is a Linux kernel rootkit that we discovered in the wild, packed in kernel space, with few detections on VirusTotal (probably due to the kernel packer layer and its dependency on a specific kernel version).

Our unpacker yielded the following script:

Unsurprisingly, since we based our idea on experiments with this sample, this was exactly what we aimed for. We prepared a few more samples to test it on:

54525d25019d3ea2d40fe006ecc305ebb1ac5663b380019346de43c4886e8685
- This packed PE binary uses a key table located at offset 0xe5c1 containing 500 elements. The packed data is located at offset 0x200.
1fcf4da2fac671a2f3a07007e9963d3e1063dcfb0b1a0934c66082e8233d7686
- This PE file contains a payload encoded as a word-based substitution cipher. It implements anti-sandbox protection that checks the available memory (via GlobalMemoryStatusEx) and the system ticks (via GetTickCount64) to determine if the underlying system is a sandbox.

Closing thoughts

This experiment fits into a broader shift in how we are using AI in technical research. The value is not in handing over reverse engineering to a model and hoping for the best. It is in identifying narrow, high-friction parts of the workflow where pattern recognition and code translation can save analysts time, while validation, judgment, and final conclusions remain firmly in human hands.

A last-resort unpacker will not solve the hardest cases, and it is not a replacement for established tooling or expert analysis. Advanced multi-layer protections, virtualization, and heavily environment-dependent logic still remain difficult territory. But even within those limits, the approach already shows practical value. If it helps recover payloads from unsupported packers, reduces manual effort in repetitive cases, or produces reusable logic that can later improve our existing unpacking pipelines, then it is worth pursuing.

Seen from that perspective, the point is not that AI is replacing reverse engineers. It is that it can make experienced researchers faster when conventional automation runs out of road.