Bot Architecture Part 1: Threading

This is the first installment in a multi-part series about the architecture and design of XenoBot. An introduction to the series and topic is here and the next post is here.

The core code of XenoBot exists withing the memory space of the game client. This design is extremely powerful, because it allows the code to access the game's memory, functions, and content as if they are a part of the bot itself. Put simply, instead of needing to do stuff like this:

HANDLE proc = OpenProcess(PROCESS_ALL_ACCESS, 0, gamepid);

uint32_t value;
ReadProcessMemory(proc, adr, &value, sizeof(value), NULL);
WriteProcessMemory(proc, adr, &value, sizeof(value), NULL);

I can do this:

auto value = (uint32_t*)address;

This, plus the ability to hook and call the game's internal functions, can only be described as absolute power.

This isn't without issue, however. One of the major challenges was keeping XenoBot's activity thread-synced with the main thread of the game. Without synchronization, any access to memory runs the risk of polluting the game data or fetching corrupted data, since the game could be working with the memory at the same time.

With some reverse engineering, I found that the main loop of the game looks something like this:


Luckily, all of the game's work was done on a single thread somewhere in this loop, meaning that the bot code could be perfectly synchronized by hooking into this loop.

The event processing mechanism consisted of a very basic Window's API event-loop, eg:

while (PeekMessage(&msg, hwnd,  0, 0, PM_REMOVE)) 
    // process messages

Knowing this, I used an Import Address Table hook on PeekMessage to effectively create a callback at the event processing stage in the game's main loop. In this callback, I would allow the bot to do it's work whenever the original PeekMessage function returned false. This effectively lead to the main loop looking like this:


This worked magnificently for the core functionality of the bot, but it didn't support the threading model of the scripting engine. Within the scripting engine, each Lua script was given it's own thread on which to operate. These threads could receive events from the core as well as access memory and call functions within the game. Because I controlled the core and the scripting engine, sending events was no problem. However, there wasn't an efficient way to queue up function calls or memory reads from within each script thread without keeping a very large queue buffer (dozens of scripts typically run at once) of pending actions. Not only is such a solution expensive in itself, but it also incurs the cost and complexity of somehow wrapping each action into a blob that can be queued, processed, and have it's result queried on multiple threads.

As an engineer, one of the few things I love more than conquering complexity is completely avoiding it. I realized I could design the script threads to use a regular locking-mechanism, but place such a mechanism inside-out within my PeekMessage hook:

BOOL PeekMessageHk(LPMSG msg, HWND wnd, UINT fltmin, UINT fltmax, UINT rem)
    auto ret = PeekMessageOrig(msg, wnd, fltmin, fltmax, rem);
    if (!ret)
    return ret;

I used a locking-mechanism with first-come, first-serve entrancy. The result is that, when a script requires access to something which requires synchronization, it begins to wait on scriptEngine->lock. Then, as each frame comes around, the hook can vacate this lock and yield to any scripts which were waiting prior to do their work. The hook can then enter the lock again--before any new actions--and allow the game to finish the frame.

This model is very fast on both sides. The entrancy ordering means that the script threads aren't able to bully the main loop into the background, but are still able to get as much running time as needed. By also making the lock reentrant, any nested or recursive script actions can be carried out in a single-frame, but loops and other intensive flows can be interleaved into multiple frames.

Just when I thought I had conquered this problem, another nuance arose. The game had an option to limit the framerate of the client. Because many of my users were running dozens of clients at once, they'd save processing cycles by setting the framerate limit to 10. What this looked like, in terms of the game loop, was a sleep inserted after the Draw Frame portion of the code. The sleep would be something like this:

auto sleepFor = (1000 / fpsLimit) - timeThisFrameTook;
if (sleepFor > 0)

With only 10 chances to interoperate with the main thread every second, scripts were left with out-of-date state information, slow response-times, and overall lagginess. The solution was to hook the Sleep function, break each slumber into 10-millisecond chunks, and allow scripts to work between the naps:

VOID SleepHk(DWORD ms)
    auto total = ms;
    while (total > 0)
        total -= 10;

This was the final nail in the coffin for thread synchronization. It did have a marginal effect on the overall framerate due to not accounting for the processing time taken by the locking, but the extra time was so negligible that it wasn't worth writing the code to account for it.

That's all for this installment. Stick around for the next one, and thanks for reading!