Work on KVM/ARM unit tests - August

September 1, 2014 - Stanislav Nechutný - KVM/ARM, Linux, Programming

After spending a whole week trying to fix a bug that wasn't even a bug, I started working on VFP tests. Unfortunately, I based my work on this statement:

The flags are set if the appropriate condition has arisen, and cleared if not.

Source: ARM Assembly Language Programming - Appendix B - The Floating Point Instruction Set

and didn't check it against ARM's manual for the Cortex-A15. This was a really stupid mistake, because the documentation for the Cortex-A15's VFP describes the exception flags as cumulative, so they do NOT reset to 0 on their own. How I created a "fix" for this "bug" is described in the collapsed text titled "How to make a bug in QEMU?".
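To make "cumulative" concrete, here is a minimal sketch in plain C. It does not touch the real FPSCR; demo_fpscr, demo_div and the other demo_* names are hypothetical stand-ins I made up for illustration. Only the bit position of DZC (bit 1) is taken from the ARM architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 1 of FPSCR is the cumulative division-by-zero flag (DZC). */
#define DEMO_FPSCR_DZC (1u << 1)

/* Hypothetical software stand-in for the FPSCR register. */
static uint32_t demo_fpscr;

/* Divide and record a division by zero the way a cumulative flag does:
 * the bit is ORed in and is not cleared by a later, clean operation. */
static double demo_div(double a, double b)
{
    if (b == 0.0 && a != 0.0)
        demo_fpscr |= DEMO_FPSCR_DZC;
    return a / b;
}

/* Only an explicit write clears the flag. */
static void demo_clear_flags(void)
{
    demo_fpscr = 0;
}

/* Returns 1 when the model behaves as described in the text. */
static int demo_sticky_flag_check(void)
{
    demo_clear_flags();
    (void)demo_div(1.0, 0.0);   /* raises DZC */
    int set_after_first = (demo_fpscr & DEMO_FPSCR_DZC) != 0;
    (void)demo_div(1.0, 1.0);   /* clean division, DZC stays set */
    int still_set = (demo_fpscr & DEMO_FPSCR_DZC) != 0;
    demo_clear_flags();         /* software has to clear it */
    int cleared = (demo_fpscr & DEMO_FPSCR_DZC) == 0;
    return set_after_first && still_set && cleared;
}
```

Calling demo_sticky_flag_check() walks through exactly the 1.0/0.0 followed by 1.0/1.0 sequence that confused me.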

I decided to create a library to make testing instructions and their exceptions easier. The result is lib/arm/vfp.h. It contains named constants for easy access to the bits of the FPSCR, FPSID, FPEXC, MVFR0 and MVFR1 registers. Just load the content of the status register, AND the value with a constant (e.g. FPSCR_DZC) and compare the result with the expected value.

The macros DOUBLE_UNION and FLOAT_UNION for creating unions, which are useful for building special float/double values such as NaN, Inf, -0 etc., were also moved to this library. The created union has an element .d/.f of type double/float and a second element .input of type unsigned (long) long, which is used for loading special values such as DOUBLE_MINUS_INF. These special values are also defined in the .h file with the names DOUBLE_* and FLOAT_*.
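The union trick can be sketched roughly like this. This is not the content of lib/arm/vfp.h; the demo_* names are hypothetical and only mirror the .d/.input layout described above. The bit patterns are the standard IEEE 754 binary64 encodings, and the sketch assumes that double and uint64_t share the same byte order.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical equivalent of what a DOUBLE_UNION-style macro might create. */
union demo_double_union {
    double   d;      /* the value seen as a double */
    uint64_t input;  /* the same 64 bits seen as an integer */
};

/* IEEE 754 binary64 bit patterns for some special values. */
#define DEMO_DOUBLE_MINUS_INF  0xFFF0000000000000ULL
#define DEMO_DOUBLE_MINUS_ZERO 0x8000000000000000ULL
#define DEMO_DOUBLE_QNAN       0x7FF8000000000000ULL

/* Load raw bits through .input and read them back through .d. */
static double demo_from_bits(uint64_t bits)
{
    union demo_double_union u;
    u.input = bits;
    return u.d;
}

/* -Inf is smaller than every finite double. */
static int demo_is_minus_inf(double x)
{
    return x < -DBL_MAX;
}

/* NaN is the only value that is not equal to itself. */
static int demo_is_nan(double x)
{
    return x != x;
}

/* -0 compares equal to +0, so compare the raw bits instead. */
static int demo_is_minus_zero(double x)
{
    union demo_double_union u;
    u.d = x;
    return u.input == DEMO_DOUBLE_MINUS_ZERO;
}
```

For example, demo_from_bits(DEMO_DOUBLE_MINUS_INF) gives a value that can be fed straight into a test as an operand.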

The most useful macro in this file is, in my opinion, TEST_VFP_EXCEPTION(_ins, _pass, _result, _num1, _num2, _exceptions). Let me introduce it to you. As its name indicates, it is meant for testing exceptions raised by VFP instructions.

The first argument _ins is a string with the instruction(s). Multiple instructions should be separated with "\n"; the last instruction doesn't need it. _pass is a variable that will contain the result of the exception test: 0 if the test failed, 1 if it succeeded. The variable is reset to 0 before the given instructions are executed and is increased via add, so it can also be used in the tested instructions as %[pass], increased there and then compared with your expected value. The next argument _result can be a double/float and is used for returning the result of the VFP instructions. From the instructions it is accessible as %[result] and it is write-only. Any initial value is discarded.

That's because of some weird bug in the gcc 4.8.2 cross compiler for ARM from the Fedora repository. At first I tried to use "+?w?t?r" as the constraint, but in some cases it produces errors like Error: VFP/Neon double precision register expected -- `fnegd s16,d18'. Not for all the tests, only for some. I tried to use just "+w" as the constraint, but without any positive effect. This bug was already reported, so I chose a write-only constraint: "=?w?t?r". The question marks are there so that a double, float, or long can be used without modifying the macro. That's also the reason why I chose a macro instead of a function. In C there is no function overloading, so creating 3^3 functions isn't a good solution. A variadic function isn't a good idea either, because it converts floats to doubles, so I decided to use a macro. After writing the first test for instructions working with single-precision registers I got the same errors. Some googling and debugging resulted in removing "?r" from the constraints to get float/double working.
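The float-to-double conversion in variadic functions is easy to see in a small sketch. The demo_* names are hypothetical and have nothing to do with the real test macros; the point is only that a macro sees the operand with its real type, while a variadic function receives a promoted double.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* A macro works on the operand with its original type. */
#define DEMO_OPERAND_SIZE(x) sizeof(x)

/* A variadic function never receives a float: the default argument
 * promotions turn it into a double, so va_arg has to ask for double. */
static size_t demo_variadic_size(int count, ...)
{
    va_list ap;
    size_t size = 0;

    va_start(ap, count);
    if (count > 0) {
        double promoted = va_arg(ap, double);
        size = sizeof(promoted);
    }
    va_end(ap);
    return size;
}

/* Returns 1 if the macro kept the float type and the function lost it. */
static int demo_promotion_check(void)
{
    float f = 1.5f;

    return DEMO_OPERAND_SIZE(f) == sizeof(float)
        && demo_variadic_size(1, f) == sizeof(double);
}
```

With a macro, the single-precision operand can still end up in a single-precision register, which is what the tests need.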

The next two arguments _num1 and _num2 are passed to the assembler code as values named %[num1] and %[num2], also with the constraint "?w?t", so they can be of type double or float. They are meant to be used as operands for the instructions. The last argument _exceptions is a bit mask of expected exceptions. The FPSCR_* constants are defined for building it, so example usage is (FPSCR_IOC|FPSCR_DZC), or FPSCR_NO_EXCEPTION if no exceptions are expected.

The macro first clears FPSCR's cumulative exception bits, sets _pass to 0 and executes the given instruction(s). Then it loads the FPSCR register into r0, ANDs it with the FPSCR_CUMULATIVE constant to clear the other bits, and compares the result with the expected exceptions. A conditional add instruction then increases the value in the _pass variable. In the given instructions it is possible to use %[pass] for custom step-by-step tests.
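The flow of the macro can be modelled in portable C. This is only a sketch of the logic, not the macro itself: the real one is inline assembly working on the hardware FPSCR, while here demo_fpscr is a plain variable and the "instructions" are hypothetical callbacks. The bit positions of the cumulative flags are the architectural ones; all DEMO_*/demo_* names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative exception bits of FPSCR. */
#define DEMO_FPSCR_IOC (1u << 0)  /* invalid operation */
#define DEMO_FPSCR_DZC (1u << 1)  /* division by zero */
#define DEMO_FPSCR_OFC (1u << 2)  /* overflow */
#define DEMO_FPSCR_UFC (1u << 3)  /* underflow */
#define DEMO_FPSCR_IXC (1u << 4)  /* inexact */
#define DEMO_FPSCR_IDC (1u << 7)  /* input denormal */
#define DEMO_FPSCR_CUMULATIVE (DEMO_FPSCR_IOC | DEMO_FPSCR_DZC | \
                               DEMO_FPSCR_OFC | DEMO_FPSCR_UFC | \
                               DEMO_FPSCR_IXC | DEMO_FPSCR_IDC)
#define DEMO_FPSCR_NO_EXCEPTION 0u

/* Hypothetical stand-in for the FPSCR register. */
static uint32_t demo_fpscr;

/* Hypothetical "instructions": one raises DZC, one raises nothing. */
static void demo_op_div_by_zero(void) { demo_fpscr |= DEMO_FPSCR_DZC; }
static void demo_op_clean(void) { }

/* Same steps as described in the text; returns the final value of pass. */
static int demo_test_exception(void (*op)(void), uint32_t expected)
{
    int pass = 0;                          /* _pass starts at 0 */
    uint32_t raised;

    demo_fpscr &= ~DEMO_FPSCR_CUMULATIVE;  /* clear the cumulative bits */
    op();                                  /* run the instruction(s) */
    raised = demo_fpscr & DEMO_FPSCR_CUMULATIVE;  /* drop the other bits */
    if (raised == expected)
        pass += 1;                         /* the conditional add */
    return pass;
}
```

A test then boils down to something like demo_test_exception(demo_op_div_by_zero, DEMO_FPSCR_DZC) and checking that the returned value is 1.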

The next problem was with GCC's optimization. After I created a single-precision variant of the double-precision tests, compilation started failing with lots of "VFP/Neon double precision register expected" errors in tests that had previously worked. "faddd %[result], %[num1], %[num2]" was converted into 'faddd s14,d16,d16', which is really weird. %[result] is a double, and num1 and num2 are different variables with different values, but gcc passes them as the same register. The solution is to use the P modifier for registers in instructions, so instructions expecting a double-precision register operand get e.g. %P[num1]. That fixes the problem. Another solution is to change GCC's argument -O2 to -O1 (a lower optimization level).
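A minimal sketch of the P modifier in use, under a few assumptions: demo_add_double is a hypothetical helper, not part of the test library; I use the UAL spelling vadd.f64 instead of faddd; and the preprocessor guard is my guess at a reasonable way to detect a 32-bit ARM build with hardware floating point. On any other target the function falls back to plain C so the snippet still compiles.

```c
#include <assert.h>

/* Adds two doubles, using a VFP instruction when built for 32-bit ARM
 * with hardware floating point, and plain C everywhere else. */
static double demo_add_double(double num1, double num2)
{
    double result;

#if defined(__arm__) && defined(__VFP_FP__) && !defined(__SOFTFP__)
    /* %P prints the operand as a double-precision register (dN),
     * so the assembler doesn't get a single-precision name (sN). */
    __asm__ volatile("vadd.f64 %P[result], %P[num1], %P[num2]"
                     : [result] "=w"(result)
                     : [num1] "w"(num1), [num2] "w"(num2));
#else
    /* Fallback for non-ARM builds: same result, no inline assembly. */
    result = num1 + num2;
#endif
    return result;
}
```

The result operand uses a write-only "=w" constraint here, in line with the workaround described earlier.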

For testing the VFP status flags I created the macro TEST_VFP_STATUS_FLAGS, which is the same as TEST_VFP_EXCEPTION with only a small difference: the FPSCR register is ANDed with FPSCR_STATUS_FLAGS, and the _exceptions argument is replaced by _flags. This macro is used for testing fcmp[ds].

BONUS: How to make a bug in QEMU?

If you haven't done so yet, please first read the beginning of this post, which explains why this is quite bad.

Last month I found a bug in division. After more investigation and debugging via fprintf I discovered that the bug is in not clearing the FPSCR flags. So first you call, for example, fdiv with the calculation 1.0/0.0 and get the status flags set to 0 and the division-by-zero flag set. A second call with 1.0/1.0 doesn't get a zeroed FPSCR, but one with the bits set by the previous call. This behavior doesn't correspond to the real VFP, or so I thought. My mistake was that I tested only the second call on a real VFP.

The first hackfix was just inserting STATUS(float_exception_flags) = 0; on the first line of QEMU's function float(32|64|128)_div (file fpu/softfloat.c), but that's not a correct solution, because div is also used for other calculations and I expected this bug to be present in other instructions such as fmul, fsqrt etc.

So the next step for this fix was to find the code that calls this function and reset the status flags before the call. Of course this can't be done for all VFP instructions, because for example fmrx reads these flags, so we can't clear them before reading.

A simple grep for "float64_div" was unsuccessful. QEMU's code has a lot of macros for assembling function names. So the next step was to use C's backtrace function. Add -rdynamic to the linker flags and fully rebuild QEMU. Don't forget the -j16 argument for make on an 8-core machine, or it takes too much time. Unfortunately this solution doesn't work. The output was just:

Obtained 2 stack frames.
~/git/qemu/aarch64-softmmu/qemu-system-aarch64(float64_div+0x5e) [0x7f6ed5807f6e]
[0x7f6eca072dea]
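For reference, the backtrace attempt looks roughly like the sketch below. It assumes glibc's <execinfo.h>; demo_print_trace and DEMO_MAX_FRAMES are hypothetical names, and this is not the exact code I put into QEMU. Without -rdynamic the symbol names are usually missing from the output.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_MAX_FRAMES 16

/* Prints the current call stack and returns the number of frames found. */
static int demo_print_trace(void)
{
    void *frames[DEMO_MAX_FRAMES];
    int count = backtrace(frames, DEMO_MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, count);

    printf("Obtained %d stack frames.\n", count);
    if (symbols != NULL) {
        for (int i = 0; i < count; i++)
            printf("%s\n", symbols[i]);
        free(symbols);  /* one free() releases the whole array */
    }
    return count;
}
```

Calling demo_print_trace() from the function you are interested in prints one line per frame, which is where the two lines above came from.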

Not very helpful. So let's try the depcg.sh script from a school project for the Operating Systems course. It uses objdump's disassembler output and generates the function call flow. The surprising output was:

$./depcg.sh -r float64_div ../../../qemu/aarch64-softmmu/qemu-system-aarch64
recip_estimate -> float64_div
recip_sqrt_estimate -> float64_div

The function isn't called via callq. Looks like a challenge. Next up was GDB. Set a breakpoint on fpu/softfloat.c:float_div and run with the test flat file. (gdb) bt and voilà: "in code_gen_buffer ()". code_gen_buffer is a void pointer pointing to a function that takes 2 void pointers as arguments and returns another pointer :-), which is why it can't be found with the previous methods.

After a whole day of searching I finally found the right solution. In QEMU's file target-arm/translate.c, in the function disas_vfp_insn, insert into the if statement for data processing:

set_float_exception_flags(0, &env->vfp.standard_fp_status);
set_float_exception_flags(0, &env->vfp.fp_status);

We want to reset the status flags only before data processing instructions (dyadic and monadic arithmetic instructions or comparisons), because it doesn't make sense to reset the flags before, for example, the fmrx instruction. Why are there two lines? Because QEMU uses two fp_status variables. Why? Only the author and maybe God know. The flags are read from the ORed value, so we must reset both.
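Why both have to be reset can be shown with a tiny model. The struct layout and the demo_* names below are hypothetical and much simpler than QEMU's real data structures; the only thing taken from the text is that the visible flags are the OR of two separate status variables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag value, not QEMU's real encoding. */
#define DEMO_FLAG_DIVBYZERO 0x01u

struct demo_float_status {
    uint8_t exception_flags;
};

/* Two independent status variables, as in the lines above. */
struct demo_vfp {
    struct demo_float_status fp_status;
    struct demo_float_status standard_fp_status;
};

/* The guest sees the OR of both sets of flags. */
static unsigned demo_read_flags(const struct demo_vfp *vfp)
{
    return vfp->fp_status.exception_flags
         | vfp->standard_fp_status.exception_flags;
}

static void demo_clear_both(struct demo_vfp *vfp)
{
    vfp->fp_status.exception_flags = 0;
    vfp->standard_fp_status.exception_flags = 0;
}

/* Returns 1 if clearing one variable is not enough but clearing both is. */
static int demo_two_status_check(void)
{
    struct demo_vfp vfp = { { DEMO_FLAG_DIVBYZERO }, { DEMO_FLAG_DIVBYZERO } };
    int still_visible;
    int gone;

    vfp.fp_status.exception_flags = 0;  /* reset only one of them */
    still_visible = demo_read_flags(&vfp) != 0;
    demo_clear_both(&vfp);              /* reset both */
    gone = demo_read_flags(&vfp) == 0;
    return still_visible && gone;
}
```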
