Why We Migrated from LuaJIT to Lua 5.2

By Chase Carlson

Hey there. Welcome back to the Realmblog. Today we’ll discuss how and why we moved the Wayrealm scripting engine from LuaJIT to regular Lua 5.2.4. We’ll also talk about how we got around the limited exception interoperability between Lua and C++.

Why LuaJIT Is Off Limits

We decided to migrate our game scripting engine from the LuaJIT runtime to the regular ANSI C Lua runtime. LuaJIT’s merits stem from its use of DynASM, its dynamic assembler, to generate machine code on the fly while Lua code runs. That ability is what puts the JIT in LuaJIT, and it lets a scripting language approach the performance of native C/C++ code. This is why we love LuaJIT! However, it is also the reason we can’t use it.

The scripting engine for Wayrealm runs untrusted user code from user-generated games, meaning that our Lua VM must be airtight. There is no room for error, lest we expose our servers and players’ machines to an RCE (remote code execution) vulnerability. Because LuaJIT can generate arbitrary machine code at runtime, we can’t guarantee that level of security while using it. The LuaJIT developers themselves claim that “the only promising approach is to sandbox Lua code at the process level and not the VM level.”

Additionally, if we hope to get a web build up and running in the future, LuaJIT’s runtime code generation would make that impossible. WebAssembly and Emscripten cannot support it for the same reason we can’t: security.

Why Lua 5.2.4?

For a long time, Lua 5.1.5 (the final version of Lua 5.1) was the version we had in mind to migrate to, because LuaJIT is based on it, because of its stability, and because of its lack of bloat. However, there was one non-negotiable feature that we needed: a fully resumable VM. Wayrealm has to be able to yield a script in certain cases, like when the user calls wait(secs). The rest of the program has to keep running in the meantime, so we can’t block with an OS-level sleep: we have to yield from Lua and resume the thread later, once secs seconds have elapsed. Lua 5.1 has no fully resumable VM (except through third-party patches by the creator of LuaJIT, Mike Pall, such as Resumable VM Patch and Coco - True C Coroutines). Lua 5.2 added one to the language itself.
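To make the wait(secs) flow concrete, here is a minimal sketch of the host-side bookkeeping only, not our actual engine code. The names WaitScheduler, SleepingThread, sleep, and collect_due are hypothetical, and thread ids stand in for real coroutine handles. In a real binding, the wait function would presumably yield through the Lua 5.2 C API after registering the thread, and the host would resume each due thread with lua_resume.

```cpp
#include <queue>
#include <vector>

// Hypothetical record for a script thread that called wait(secs).
// A real engine would store a coroutine handle instead of a plain id.
struct SleepingThread {
    double wake_time;  // host clock time at which the thread may resume
    int thread_id;
};

// Orders the priority queue so the earliest wake time is on top.
struct WakesLater {
    bool operator()(const SleepingThread &a, const SleepingThread &b) const {
        return a.wake_time > b.wake_time;
    }
};

class WaitScheduler {
public:
    // Called by the (hypothetical) wait binding just before it yields.
    void sleep(int thread_id, double now, double secs) {
        queue_.push(SleepingThread{now + secs, thread_id});
    }

    // Called once per frame; returns the threads that are due to resume,
    // earliest wake time first. Nothing blocks while scripts are asleep.
    std::vector<int> collect_due(double now) {
        std::vector<int> due;
        while (!queue_.empty() && queue_.top().wake_time <= now) {
            due.push_back(queue_.top().thread_id);
            queue_.pop();
        }
        return due;
    }

private:
    std::priority_queue<SleepingThread, std::vector<SleepingThread>, WakesLater> queue_;
};
```

The point of the design is that the main loop never sleeps: each frame it asks for the due threads and resumes them, which only works if the VM can be suspended and resumed at the point where wait was called.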

Exception Handling

Lua is written in C, which means it uses longjmp for error handling: therein lies the rub. A longjmp does not unwind the stack the way C++ requires, so the destructors of objects in the skipped frames never run, which causes memory leaks. There is a simple solution to this: if compiled as C++, Lua will use C++ exceptions instead of longjmp. Now the stack is properly unwound, calling destructors as it goes. This solves our problem with exceptions, right?

No.

Our C++ codebase makes heavy use of exceptions, often in functions that get called by the Lua API! When you call a C++ function from Lua, execution jumps to your native code, then right back into Lua. This means that if the C++ function called from Lua throws an exception (or anything it calls does), the exception will bubble up through the Lua VM, skipping right through any pcalls that may exist, and crashing the entire script, or even the program itself.

I solved this by modifying our fork of Lua so that we can register a wrapper function. When one is set, luaD_precall calls the wrapper instead of calling the lua_CFunction directly. The wrapper runs the lua_CFunction inside a try/catch block, catches any C++ exception, and rethrows it as a regular Lua error. The core of the patch is this modification to luaD_precall in ldo.c:

    if (G(L)->cfunction_wrapper)
      n = G(L)->cfunction_wrapper(L, f); /* pass the lua_CFunction to the user-defined wrapper function */
    else
      n = (*f)(L); /* do the actual call, no wrapper provided */
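
For illustration, a wrapper on the C++ side might look roughly like the sketch below. This is not our production code. To keep it compilable without the Lua headers, MockLuaState and MockCFunction stand in for lua_State and lua_CFunction, LuaErrorSignal stands in for whatever a C++ build of Lua throws internally, and raise_lua_error stands in for pushing a message and raising a Lua error. All of those names, plus risky_native and returns_two, are hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Stand-ins (assumptions) for lua_State and lua_CFunction.
struct MockLuaState {
    std::string error_message;  // where the "Lua error" message ends up
};
typedef int (*MockCFunction)(MockLuaState *L);

// Stand-in for the exception a C++ build of Lua uses for its own errors.
struct LuaErrorSignal {};

// Stand-in for pushing an error message and raising a Lua error.
[[noreturn]] void raise_lua_error(MockLuaState *L, const std::string &msg) {
    L->error_message = msg;
    throw LuaErrorSignal{};
}

// The wrapper: run the native function and turn any C++ exception into
// a Lua error that a protected call can handle.
int cfunction_wrapper(MockLuaState *L, MockCFunction f) {
    std::string msg;
    try {
        return f(L);
    } catch (const LuaErrorSignal &) {
        throw;  // already a Lua error, let it pass through untouched
    } catch (const std::exception &e) {
        msg = e.what();
    } catch (...) {
        msg = "unknown C++ exception";
    }
    // Raised outside the handlers so the original exception is finished first.
    raise_lua_error(L, msg);
}

// Example native functions a script might call.
int risky_native(MockLuaState *) {
    throw std::runtime_error("inventory lookup failed");
}

int returns_two(MockLuaState *) {
    return 2;  // number of results, as a lua_CFunction would report
}
```

With this shape, a call such as cfunction_wrapper(&state, risky_native) no longer lets std::runtime_error escape; the failure surfaces as a Lua-style error carrying the exception's message, while functions that return normally pass their result count straight through.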