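A friend and I have been playing with code golf lately, and while running tests under MinGW we hit something bizarre. I'm writing it down here for the record.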

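The popular code-golf online judges all compile with TCC, so to mimic that behavior and save characters we chose the C90 standard (which does not require functions to be declared before use) and turned on the -fno-builtin compiler option (to silence the annoying compiler warnings).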

After running a few tests, the black magic showed up:

❓

At first we both assumed MinGW itself was misbehaving, so we dragged the program into GDB and went at it:

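According to the Windows x64 calling convention, a floating-point value passed as printf's second argument should be in $xmm1. In the debugger, though, the value had already been written there correctly.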

[Image: xmm1 register]

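We stepped into the printf implementation in GDB and skimmed through it, but nothing looked odd. After poking at all sorts of registers we were still completely stumped.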

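Then my friend thought to look at the compiler flags. With -fno-builtin removed, there were a lot more warnings, but MinGW suddenly produced the correct result: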

❗

So here is the question: what does enabling or disabling builtins have to do with what printf prints?

Time for everyone's favorite: a compiler-theory pop quiz

The GCC documentation explains the no-builtin option as follows:

-fno-builtin
-fno-builtin-function

   Don't recognize built-in functions that do not begin with
   __builtin_ as prefix.  GCC normally generates special code to
   handle certain built-in functions more efficiently; for
   instance, calls to "alloca" may become single instructions
   which adjust the stack directly, and calls to "memcpy" may
   become inline copy loops.  The resulting code is often both
   smaller and faster, but since the function calls no longer
   appear as such, you cannot set a breakpoint on those calls, nor
   can you change the behavior of the functions by linking with a
   different library.  In addition, when a function is recognized
   as a built-in function, GCC may use information about that
   function to warn about problems with calls to that function, or
   to generate more efficient code, even if the resulting code
   still contains calls to that function.  For example, warnings
   are given with -Wformat for bad calls to "printf" when "printf"
   is built in and "strlen" is known not to modify global memory.

Reading the Windows x64 calling convention carefully once more, we noticed this line of fine print in the Varargs section:

For floating-point values only, both the integer register and the floating-point register must contain the value, in case the callee expects the value in the integer registers.
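To make "contain the value" concrete: the integer register has to carry the raw 64-bit pattern of the double, not a converted integer. The sketch below is only an illustration and is not from the original investigation. The helper names `double_bits` and `bits_as_double` are hypothetical, and it assumes `double` is IEEE-754 binary64, as it is on x64.

```c
#include <stdint.h>
#include <string.h>

/* Raw 64-bit pattern of a double: what the integer register (e.g. rdx)
   would have to hold when the double is passed to a variadic callee.
   Assumption: double is IEEE-754 binary64. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to reinterpret bytes */
    return bits;
}

/* Reverse view: how a callee would interpret a 64-bit register value
   if it read its double argument from the integer register. */
double bits_as_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, `double_bits(1.5)` is `0x3FF8000000000000`, while `bits_as_double(0)` is `0.0`, which is what a callee sees if the integer register was never filled in and happens to be zero.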

With that in mind, comparing the assembly generated by the two builds makes the reason easy to see:

[Image: asm diff]

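Clearly, with no declaration in scope and the no-builtin option turned on, the compiler has no way to learn printf's signature. It falls back to the C90 implicit declaration int printf() and passes the arguments as if the function were int printf(const char*, double), so it does not follow the varargs calling convention and never sets the general-purpose register $rdx. The printf in the Windows C runtime, as the callee, therefore always reads all zeros for its second argument (whatever $rdx held before the call). Once the option is removed, even without a written declaration the compiler's built-in function matching supplies the full printf signature (int printf(const char*, ...)), so it generates the correct call sequence that sets $rdx, and the output comes out right.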
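If the goal is to keep -fno-builtin and still skip the #include, one option that follows from this explanation is to write the variadic prototype by hand, so the compiler knows the callee takes `...` regardless of builtins. This is a sketch we did not run in the original setup. `format_double` and `print_demo` are hypothetical helper names added only to make the example checkable.

```c
/* Hand-written prototypes instead of #include <stdio.h>.
   The "..." is what tells the compiler to use the varargs convention. */
int printf(const char *, ...);
int sprintf(char *, const char *, ...);

/* Hypothetical helper: formats x with "%f" into buf and returns the
   number of characters written. buf must be large enough for the result. */
int format_double(char *buf, double x)
{
    return sprintf(buf, "%f", x);
}

/* Hypothetical helper: the same kind of call as in the golfed program,
   but now made through a proper variadic prototype. */
void print_demo(void)
{
    printf("%f\n", 1.5);
}
```

Calling `print_demo()` from `main` should print `1.500000`. The prototype costs a few characters, which may or may not be acceptable in a golf setting.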

C, isn't it wonderful? xDDD

Compiler Explorer reproduction link: https://godbolt.org/z/eh9WK9eqG