本文由 简悦 SimpRead 转码, 原文地址 https://code.google.com/archive/p/interactive-text-hooker/wikis/DevGuide.wiki
This article describe the process of custom hook devloping and H-code format. Reader should be familiar with x86 architecture, assembly and debuggers.
When perform attach operation, ITH will inject a DLL into target process. The DLL then insert hooks within the target process address space. Since x86 is Von Neumann architecture. Program code resides in the same address space as program data. So if coded carefully, read/write certain part of memory can change the program’s execution flow. After a hook is inserted at a certain address, whenver the program execute the instruction at that address, it will first execute hook code. ITH insert hook as follow:
-
Disassemble instruction length at the target address, make sure it’s greater than 5.
-
Copy the instructions to somewhere else for later recovery.
-
Allocate memory and fill it with hook code.
-
Modify instrution at the target process to CALL $(E8 $), where $ stands for the hook code entry address. The call instruction is 5 bytes long, if we copied more than 5 bytes, fill the rest with INT 3(CC).
-
Copy original instruction after hook code, do necessary modifications(any position relative instruction, jmp,call, etc)
and insert a JMP $(E9) at the end. Here $ stands for the next instruction at the target address.Assume 1-10 is address of 10 instruction. If insert hook at 5 and hook code at 15-20. The execution flow will be 1-4, jmp to 15, 15-20, 5, jmp back to 6, 6-10.
This technique let you intercept program execution, and do anything you want at that point. ITH applies this technique to text processing related routes to intercept text. At a certain point ITH can
-
Read all register value
-
Access most memory except the stack above current ESP.
-
Address data indicating where to insert hook.
-
Text data(whether it reside in register or at some memory location)
-
Text data processing information (Endian, Unicode, String…)
-
Split data if necessary (when text with different meaning processed at one place).
AGTH H-code is an efficient method to encode this information.
Although it’s not complete in theory, it can handle most of common cases. When text processing is too complicated to represent by H-code, an extern hook is needed. That is, we provide our own code to gather information for ITH.
H-code format
/H{A|B|W|S|Q}[N][data_offset[*data_indirect]][:split_param[*split_indirect]]@addr[:module[:name]]
/H
: Prefix for H-code.A|B|W|S|Q
: Data form. 说明N
: Disable(zero) first split parameter. By default, the first split parameter is taken from[ESP]
.data_offset
: Specify where to find data. If data_offset is positive, data is taken from [ESP+data_offset]. If data_offset is negative, it refers to registers as follow.
EAX | ECX | EDX | EBX | ESP | EBP | ESI | EDI |
---|---|---|---|---|---|---|---|
-4 | -8 | -C | -10 | -14 | -18 | -1C | -20 |
data_indirect
: If value of data_offset is a pointer, use this to take one level of indirection reference. Assume 32-bit pointer taken according to data_offset is off_data. So data is taken from [off_data+data_indirect].split_param, split_indirect
: Same to data_offset, but value is for split parameter.addr
: Absolute address or offset.module
: Target module name. Case insensitive.name
: Export function name. Case sensitive.
Some examples:
/HA-4@41F400
-> SHIFT-JIS character in EAX, 8? at AH.
/HS4@CE1F0:koisora_main.exe
-> If koisora_main is mapped at 400000, insert address is 400000+CE1F0=10E1F0. [ESP+4]
contains a SHIFT-JIS string pointer.
/HWN-10*-8:-10*10@46AE28
-> Unicode character at [EBX-8]
, split parameter at [EBX+10]
, zero default split parameter.
Most of current H-codes use absolute address(@address). It’s resolved to the correspond address in target process virtual address space. Base-offset(@offset:base) is a more flexible form. You specify a module name as base address, and an offset to add to that base. This form is necessary if you are hooking in DLL or the main module is using Address Space Layout Randomization. You can also specify a function name in the export table of target module to narrow down the address.
SHIFT-JIS
Most game engines use SHIFT-JIS encoding. SHIFT-JIS characters begins from 8140. Usually 2 bytes long. If you see a 2 byte data 8??? or 9???, it’s probably a SHIFT-JIS character. If 8? is at high address, use A, otherwise use B to represent SHIFT-JIS character
Unicode
There are also many engines using Unicode encoding. Japanese character (Hiragana and Katakata)in Unicode starts from 3000. But Kanji characters spread in a great range. Unicode characters are always 2 bytes long. Look for 2 bytes data with the form 30?? is good. Use W to represent Unicode character.
String
If you find a big block of text and is null terminated, you can use string. S for SHIFT-JIS and Q for Unicode. Note that value is a pointer of the first character. So there is already a level of indirection.
There is no fixed format for split parameter. Split parameter is used to split text stream from one hook into different threads. It’s only necessary to be different if you want to split the text.
Now I take some examples to demonstrate the whole process to build a hook.
- Font caching issue
Example BGI engine. 素晴らしき日々
Default hook gives a thread from TextOutA. Set breakpoint at TextOutA.
Called from 41F1F2. We navigate to the entry of current function – 41F1A0. Set a breakpoint there and let the process fly. We can observe that process breaks at 41F1A0 as many times as TextOutA. So we continue step out. 41F1A0 is called from 41F45A. Entry of this function is 41F400. This time process breaks at 41F400 more than TextOutA, as well as 41F1A0. At 41F400, EAX contains data we want. So the code is /HA-4@41F400. Then we discover that it mess up with some punctuation. Keep the breakpoint at 41F400 and clears all other. We can discover that ESP changes several times during one sentence, especially when the character is punctuation(for example 8141:、). So we use ESP as split parameter. H-code is /HA-4:-14@41F400. Simple test shows that this code works pretty well.