GSoC 2015: The Wrath of The Cross Compiler
I have just experienced what can only be described as “Learning how cross-compiling works in the most excruciating way possible”.
Some background: PySoy’s source code is mostly written in Genie, a language which is compiled down into C, then compiled into machine code. Genie’s binding system works like this: in the bindings file, you define the C header file the function/class/variable is being imported from, then you define what the variable name will be in Genie itself. When it comes time for the compiler to turn the Genie code into C code, it looks for any variables that were defined in the bindings, and replaces the token with the C equivalent which is also defined in the bindings.
These details are important because right now I’m trying to implement ‘interception’ of all the calls made to OpenGL so they can be timestamped and recorded.
My plan was basically this: prefix the names of all the binding functions with “raw”, then have a Genie file that would have all of the normal function names implemented, which would then call the “raw” functions. When a component of PySoy calls “glAttachShader” for example, the call stack would look like this:
glAttachShader called -> glAttachShader wrapper (genie) -> raw_glAttachShader (binding)
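The chain above can be sketched in C, which is roughly the level the Genie code ends up at after compilation. This is a minimal sketch under assumptions, not PySoy's actual code: `raw_glAttachShader` is a stub standing in for the real binding so the example runs without a GL context, the wrapper is given the distinct name `wrap_glAttachShader` to keep the sketch self-contained, and `CallRecord`, `call_log` and `record_call` are hypothetical names invented for illustration.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the real GLES entry point behind the
   "raw" binding, so the sketch runs without a GL context. */
unsigned raw_call_count = 0;
void raw_glAttachShader(unsigned program, unsigned shader) {
    (void)program;
    (void)shader;
    raw_call_count++;
}

/* One intercepted call: which function ran, and when. */
typedef struct {
    const char *name;
    clock_t stamp; /* processor time from clock(), used here as the timestamp */
} CallRecord;

#define MAX_RECORDS 64
CallRecord call_log[MAX_RECORDS];
size_t call_log_len = 0;

/* Append a timestamped entry; silently drop it if the log is full. */
void record_call(const char *name) {
    if (call_log_len < MAX_RECORDS) {
        call_log[call_log_len].name = name;
        call_log[call_log_len].stamp = clock();
        call_log_len++;
    }
}

/* The wrapper layer: record first, then forward to the raw binding. */
void wrap_glAttachShader(unsigned program, unsigned shader) {
    record_call("glAttachShader");
    raw_glAttachShader(program, shader);
}
```

Calling `wrap_glAttachShader(1, 2)` appends one entry to `call_log` and then reaches the raw function exactly once, which is the whole point of the interception layer.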
Upon running an example using the updated bindings/wrapper, I received an unpleasant segfault. I proceeded to debug this issue for the next three days to no avail; it was only today that I had a revelation.
Time skip to today: I figured the issue had to do with the bindings misbehaving - perhaps having to do with the symbol naming convention being used in the bindings file. At this point, my bindings file and wrapper looked a little like this:
For some reason, everything ran perfectly. If I had the code call wrap_glAttachShader, everything worked fine. If I had the code call raw_glAttachShader directly, everything worked fine. The problem was most definitely not related to naming conventions.
The only problem is that my wrapper function now had “wrap_” at the beginning of it, and the whole purpose of the wrapper in the first place was so I wouldn’t have to go and find every OpenGL function reference and add “wrap_” to the beginning of it! If I wanted the wrapper function to be just “glAttachShader”, I would have to remove the first entry in the bindings above.
To make things even more irritating, if I removed the first bindings entry so there’s no symbol collision with glAttachShader, the segfaults would come back:
At this point I was convinced that the problem had to do with the great programming gods of beyond not being pleased that the call stack goes through Genie an extra time before hitting the bindings. I went through the entire PySoy library and replaced all the glActiveTexture calls with raw_glActiveTexture, hoping that bypassing the wrapper would stop the segfault.
With my hair frazzled and the midterms deadline fast approaching, I fired up gdb and decided that I was going to go full shellcoder and figure out this segfault once and for all the hard way.
DO YOU SEE THAT. THIRD TO LAST LINE. I had just bypassed the wrapper completely across the entire engine, why was the segfault happening inside my wrapper file?!
And then it hit me like a bag of bricks.
When Genie compiles source code to C, it doesn’t change -any- of the symbol names. My wrapper file contains a function called glAttachShader, which was being directly translated into a C file as a function definition for glAttachShader. When the linker was combining the C file with the GLES library, it used the Genie definition of glAttachShader instead of the GLES one, meaning that any code in the GLES library that calls glAttachShader was calling my wrapper function instead of the real glAttachShader function. It’s beyond me why the linker didn’t panic and throw an error.
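As for why nothing complained: on typical ELF systems, a definition in the program's own objects and a definition of the same name in a shared library do not count as a duplicate-symbol error. The program's definition simply takes precedence. The toy model below illustrates that "first match wins" behaviour. It is only a sketch of the idea, not how a real linker is implemented, and `Symbol`, `resolve`, `gles_attach`, `wrapper_attach` and `link_order` are hypothetical names made up for the example.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*attach_fn)(void);

/* Two definitions that both want the C symbol name "glAttachShader".
   Each one just reports where it came from. */
const char *gles_attach(void)    { return "GLES library"; }
const char *wrapper_attach(void) { return "Genie wrapper"; }

typedef struct {
    const char *name; /* symbol name as seen at link time */
    attach_fn fn;     /* the definition that provides it  */
} Symbol;

/* Resolve a name by scanning definitions in lookup order.
   The first match wins; later duplicates are never reached,
   and no error is raised for them. */
attach_fn resolve(const Symbol *table, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return table[i].fn;
        }
    }
    return NULL;
}

/* Assumed lookup order: the program's own objects first,
   then the shared libraries it links against. */
const Symbol link_order[] = {
    { "glAttachShader", wrapper_attach }, /* from the compiled wrapper file */
    { "glAttachShader", gles_attach },    /* from the GLES library          */
};
```

In this model, `resolve(link_order, 2, "glAttachShader")` hands back the wrapper's definition for every caller that asks for the name, and the GLES entry is silently shadowed.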
In order to fix this issue, I just had to tell the Genie compiler to compile my glAttachShader function using a different name in the C file.
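In Vala and Genie, the C name of a symbol is normally set with the `CCode` attribute's `cname` argument, though the exact declaration used in PySoy is not shown here, so treat that as an assumption. The effect on the generated C can be sketched as follows: the wrapper gets its own symbol, and the name `glAttachShader` belongs to the real function again. The stub `glAttachShader`, the counters, and `gles_internal_caller` are hypothetical stand-ins so the example runs without a GL library.

```c
unsigned real_calls = 0;
unsigned wrapper_calls = 0;

/* Stand-in for the real GLES function. It keeps the name glAttachShader. */
void glAttachShader(unsigned program, unsigned shader) {
    (void)program;
    (void)shader;
    real_calls++;
}

/* What the Genie wrapper might compile to once it has its own C name.
   Genie callers would still write glAttachShader; only the emitted
   C symbol differs. */
void wrap_glAttachShader(unsigned program, unsigned shader) {
    wrapper_calls++; /* timestamping and recording would go here */
    glAttachShader(program, shader);
}

/* Hypothetical code inside the GLES library that calls the function
   by its C name. It now reaches the real definition, not the wrapper. */
void gles_internal_caller(void) {
    glAttachShader(7u, 9u);
}
```

With the names separated, a call through `wrap_glAttachShader` is counted once by the wrapper and once by the real function, while `gles_internal_caller` never touches the wrapper at all.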
And thus everything worked perfectly.