ARM unaligned data access and floating point in Linux

I was recently getting Data Aborts on an ARM11 program that makes intensive use of unaligned data accesses. The issue was caused by unaligned floating point accesses, which the Linux kernel does not handle. Some background on the problem follows.

ARM unaligned data access hardware support

ARM 32-bit instructions must always be aligned on a word boundary. Data accesses do not have this restriction. Prior to the ARMv6 architecture, unaligned load and store memory accesses were treated as aligned by truncating the data address. Starting with ARMv6, unaligned word and halfword loads and stores are supported in hardware: the processor transparently issues one or more memory accesses to read the required bytes, at the cost of a potentially longer access time.

Unaligned data access is controlled through the following bits of the CR1 register of the CP15 coprocessor:

  • U bit. Unaligned data access support enabled. This bit must be set in order to enable unaligned data access support. When this bit is cleared, we must either provide an unaligned data access handler (like the one provided by the Linux kernel) or compile our software with unaligned data access disabled by using the corresponding compiler option.
  • A bit. Alignment fault enabled. When this bit is set, all unaligned data accesses cause a Data Abort exception, irrespective of the value of the U bit. When neither the A bit nor the U bit is set, legacy ARMv5 mode is enabled, where an unaligned data access is treated as aligned and the data address is truncated.

The default configuration on ARM11 and ARM Cortex-A processors is U=1 and A=0, which allows unaligned halfword and word data accesses and enforces a strict word alignment check otherwise. Note that an unaligned multiple-word access (e.g. long long) or coprocessor data access always signals a Data Abort with an Alignment Fault status code, even when the A bit is not set. Doubleword accesses must always be four-byte aligned.
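The rules above can be summarised as a small decision function. The following is only a sketch of the behaviour as described in this post, not a model of any particular core: the names classify_access, access_kind and outcome are hypothetical, the bit positions (A in bit 1, U in bit 22 of the CP15 c1 control register) are taken from the ARMv6 register layout rather than from the text above, and doubleword accesses are simply treated as requiring four-byte alignment.

```c
#include <stdint.h>

/* Bit positions in the CP15 c1 control register on ARMv6 (assumed here). */
#define CR_A (1u << 1)   /* alignment fault checking enabled */
#define CR_U (1u << 22)  /* unaligned access support enabled */

/* Hypothetical names, used only for this illustration. */
enum access_kind { ACCESS_HALF, ACCESS_WORD, ACCESS_DWORD, ACCESS_COPROC };
enum outcome { OUT_ALIGNED, OUT_HW_UNALIGNED, OUT_TRUNCATED, OUT_DATA_ABORT };

/* Simplified model: what happens to a data access of the given kind
 * at address addr, for a given control register value cr. */
enum outcome classify_access(uint32_t cr, enum access_kind kind, uintptr_t addr)
{
    /* Halfwords need 2-byte alignment; everything else is treated
     * as needing 4-byte alignment, as stated in the text. */
    uintptr_t required = (kind == ACCESS_HALF) ? 2 : 4;

    if (addr % required == 0)
        return OUT_ALIGNED;

    /* A=1: every unaligned access faults, whatever the U bit says. */
    if (cr & CR_A)
        return OUT_DATA_ABORT;

    /* A=0, U=0: legacy mode, the address is silently truncated. */
    if (!(cr & CR_U))
        return OUT_TRUNCATED;

    /* A=0, U=1: only halfword and word accesses are handled in hardware. */
    if (kind == ACCESS_HALF || kind == ACCESS_WORD)
        return OUT_HW_UNALIGNED;

    /* Multiple-word and coprocessor accesses still raise a Data Abort. */
    return OUT_DATA_ABORT;
}
```

With the default configuration (U=1, A=0), a word access at an odd address is classified as handled by the hardware, while a coprocessor access at the same address is classified as a Data Abort, which is the case that matters for floating point.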

Our current compiler, gcc 4.6.3, produces code with unaligned loads by default, and it is not possible to disable unaligned access. Other compilers can produce code with unaligned data access disabled (e.g. CodeSourcery, with the option -mno-unaligned-access).

ARM unaligned data access and the Linux kernel

CONFIG_ALIGNMENT_TRAP is a kernel configuration option that makes the kernel emulate unaligned load/store instructions in software. Recent Linux kernels enable this setting by default. In fact, it is not even possible to disable this option with menuconfig (in order to make this setting visible with menuconfig, its description needs to be updated in arch/arm/Kconfig). On ARMv6 and later, this configuration option does not affect the initialization value of the CR1 register. It only controls the software emulation of double word unaligned accesses, while single word accesses are taken care of directly by the hardware (given our default A/U bit settings). If we disable CONFIG_ALIGNMENT_TRAP, double word unaligned accesses result in a bus error and a program crash.

In the default case, with CONFIG_ALIGNMENT_TRAP enabled, a double word unaligned access is fixed up by the kernel. This behavior is configurable through the /proc/cpu/alignment virtual file (the kernel needs to be compiled with CONFIG_DEBUG_KERNEL in order to make it visible). By default, the different data types are handled as follows:

  • int (32-bit). Unaligned data access is handled directly by the hardware with no kernel involvement (/proc/cpu/alignment is not affected).
  • long long (64-bit). ARM processor cores do not support 64-bit unaligned accesses, so this is handled by the Linux kernel (/proc/cpu/alignment shows a DWord increment). The kernel traps the exception and emulates the access.
  • float (IEEE single precision, 32-bit). ARM processor cores do not support unaligned accesses from VFP hardware instructions. See below.
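To check which of these cases a program is hitting, the counters in /proc/cpu/alignment can be read before and after running it. The following is a minimal sketch, assuming the file consists of "Name: value" lines like the User and Skipped entries quoted later in this post; the function names alignment_counter and read_alignment_counter are hypothetical helpers, not part of any kernel or libc interface.

```c
#include <stdio.h>
#include <string.h>

/* Looks for a line starting with "name:" in text and stores the number
 * that follows in *out. Returns 0 on success, -1 if the counter is not
 * found or cannot be parsed. */
int alignment_counter(const char *text, const char *name, unsigned long *out)
{
    size_t len = strlen(name);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, name, len) == 0 && line[len] == ':')
            return sscanf(line + len + 1, "%lu", out) == 1 ? 0 : -1;

        /* Move to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Reads /proc/cpu/alignment and extracts one counter from it.
 * Returns -1 if the file is not available (e.g. not an ARM kernel). */
int read_alignment_counter(const char *name, unsigned long *out)
{
    char text[512];
    size_t n;
    FILE *fp = fopen("/proc/cpu/alignment", "r");

    if (fp == NULL)
        return -1;
    n = fread(text, 1, sizeof text - 1, fp);
    fclose(fp);
    text[n] = '\0';
    return alignment_counter(text, name, out);
}
```

Comparing the DWord and Skipped counters before and after a run gives a quick indication of whether the kernel fixed up the accesses or gave up on them.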

Unaligned floating point accesses

gcc produces code that uses the floating point hardware when -mfloat-abi is set to softfp or hard, the difference being that the former generates function calls where FP arguments are passed in integer registers (same as the soft ABI). An unaligned hardware floating point access results in an exception that the Linux kernel does not fix up, and therefore our program crashes. This kind of access can be reproduced with the following code:

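#include <stdio.h>

int main(int argc, char* argv[])
{
    /* The buffer itself is aligned, so &buffer[1] is guaranteed to be unaligned. */
    char __attribute__ ((aligned(32))) buffer[8] = { 0 };
    float* fval_p;

    /* Deliberately point a float* at an odd address. */
    fval_p = (float*)&buffer[1];
    /* Unaligned VFP store: this is the access that faults. */
    *fval_p = 0.1234f;

    printf("\nfloat at &buf[1] is %f\n", *fval_p);

    return 0;
}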

This produces a Bus error, with /proc/cpu/alignment showing:

User:            1
..
Skipped:       1
..

This means that the kernel was unable to fix the Data Abort exception that took place. This problem can be worked around by compiling our software with floating point emulation (-mfloat-abi=soft). Emulation can be performed by the Linux kernel, but it is normally done more efficiently by the standard C library. The drawback is slower code, which can have a performance impact on software that relies heavily on floating point calculations, like scientific applications or graphics processing software. The definitive solution to this kind of abort, and the one we should always aim for, is to fix our software so that it always accesses floating point data in 4-byte aligned memory.
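One common way to do this when the data really does live at an unaligned address (for instance inside a packed buffer) is to copy the bytes into a properly aligned float with memcpy instead of dereferencing a float pointer. The following is a minimal sketch of that idea, not the fix applied to the original program; the helper names load_float_unaligned and store_float_unaligned are hypothetical.

```c
#include <string.h>

/* Reads a float from an address that may not be 4-byte aligned.
 * The bytes are copied into a local float, which the compiler places
 * in aligned storage, so no floating point load is issued on src. */
float load_float_unaligned(const void *src)
{
    float value;

    memcpy(&value, src, sizeof value);
    return value;
}

/* Writes a float to an address that may not be 4-byte aligned. */
void store_float_unaligned(void *dst, float value)
{
    memcpy(dst, &value, sizeof value);
}
```

In the example above, the assignment through fval_p would become store_float_unaligned(&buffer[1], 0.1234f), and the value passed to printf would be obtained with load_float_unaligned(&buffer[1]).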
