Using Docker to cross-compile embedded software

Are you tired of managing the installation of different cross-development toolchains on the same machine, fixing your compiler when it stops working after a host OS upgrade, or dealing with the same toolchain having to be installed across heterogeneous environments?

Docker fixes some of these issues by providing a lightweight virtualization layer that isolates the cross-development toolchains from the host OS, lets different tools coexist more easily on the same machine, and makes them easier to manage and deploy.

We have been facing these problems while developing software for an Infineon XMC4800 microcontroller on a Linux host, and have improved our process by using a Docker cross-compilation container with the following features:

  • Docker container based on Ubuntu 18.04 LTS
  • GNU ARM toolchain
  • Infineon XMC libraries for XMC4800
  • Segger JLink tool for target flashing and debugging
  • Container compiles code from the host invocation directory
  • Use of ccache to speed up subsequent compilations

This is our resulting Dockerfile:

# Root image built from LTS ubuntu in Docker Hub.
FROM ubuntu:18.04

MAINTAINER Juan Solano "jsm@jsolano.com"

# Update this variable to force a refresh of all base images and make
# sure subsequent commands do not use old cache versions.
ENV REFRESHED_AT 2018-11-26

ARG USERNAME="docker"
ARG USERGROUP="dckrgroup"
ARG DEBIAN_FRONTEND=noninteractive
# These can be overridden with a command line option when the image is
# built, e.g. --build-arg UID=$(id -u) --build-arg GID=$(id -g).
ARG UID=1000
ARG GID=1000
ARG GCC_ARM_TOOLCHAIN_VER="gcc-arm-none-eabi-7-2018-q2-update"
ARG GCC_ARM_TOOLCHAIN_URL="https://developer.arm.com/-/media/Files/downloads/gnu-rm/7-2018q2/"$GCC_ARM_TOOLCHAIN_VER-linux.tar.bz2
ARG XMC_LIB_VER="XMC_Peripheral_Library_v2.1.18"
ARG XMC_LIB_URL="http://dave.infineon.com/Libraries/XMCLib/"$XMC_LIB_VER.zip
ARG JLINK_VER="JLink_Linux_V634g_x86_64"

# Set up the compiler path and other container environment variables.
ENV PATH $PATH:/home/$USERNAME/opt/$GCC_ARM_TOOLCHAIN_VER/bin
ENV GCC_ARM_TOOLCHAIN_VER $GCC_ARM_TOOLCHAIN_VER
ENV GCC_COLORS="error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01:quote=01"
ENV USB_SCRIPT="usbdev_allow.sh"
ENV TZ=Europe/Berlin

RUN apt-get update -q \
    && apt-get install --no-install-recommends -y apt-utils \
    && apt-get install --no-install-recommends -y vim make sudo \
       tzdata libncurses5 ca-certificates unzip bzip2 libtool ccache \
       usbutils libusb-1.0-0-dev libusb-dev \
    && rm -rf /var/lib/apt/lists/*

# Set timezone and standard user.
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone \
    && groupadd --gid $GID $USERGROUP \
    && useradd -m -u $UID -g $GID -o -s /bin/bash $USERNAME \
    && echo "root:root" | chpasswd \
    && echo "$USERNAME:$USERNAME" | chpasswd \
    && usermod -a -G 20 $USERNAME \
    && adduser $USERNAME sudo \
    && echo '%sudo ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers

# Set up a build tools directory.
RUN mkdir -p /home/$USERNAME/opt
WORKDIR /home/$USERNAME/opt
RUN chown $USERNAME /home/$USERNAME/opt \
    && cd /home/$USERNAME/opt

# Install JLink as root, before changing to standard user.
COPY $JLINK_VER.deb /home/$USERNAME/opt
RUN dpkg -i $JLINK_VER.deb \
    && rm $JLINK_VER.deb
COPY $USB_SCRIPT /home/$USERNAME/opt
RUN chmod +x /home/$USERNAME/opt/$USB_SCRIPT

# Further operations as standard user.
USER $USERNAME

# Install the XMC library.
COPY $XMC_LIB_VER.zip /home/$USERNAME/opt
RUN unzip $XMC_LIB_VER.zip \
    && rm $XMC_LIB_VER.zip

# Install the ARM cross-compilation toolchain.
COPY $GCC_ARM_TOOLCHAIN_VER-linux.tar.bz2 /home/$USERNAME/opt
RUN bunzip2 $GCC_ARM_TOOLCHAIN_VER-linux.tar.bz2 \
    && tar xvf $GCC_ARM_TOOLCHAIN_VER-linux.tar \
    && rm $GCC_ARM_TOOLCHAIN_VER-linux.tar

# Masquerade the cross compiler behind ccache so compilations are cached.
RUN cd /usr/lib/ccache \
    && sudo ln -s ../../bin/ccache arm-none-eabi-gcc
ENV PATH /usr/lib/ccache:$PATH

# Create a directory for our project and set up a shared work directory.
RUN mkdir -p /home/$USERNAME/project
WORKDIR /home/$USERNAME/project
VOLUME /home/$USERNAME/project
RUN cd /home/$USERNAME/project \
    && mkdir -p $HOME/.ccache \
    && echo "cache_dir = $HOME/project/.ccache" >> \
       $HOME/.ccache/ccache.conf
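
With the tool archives (the JLink .deb, the XMC library .zip and the GCC toolchain tarball) placed next to the Dockerfile, the image we use below, tagged docker-arm-xmc, can be built with something like:

docker build -t docker-arm-xmc --build-arg UID=$(id -u) --build-arg GID=$(id -g) .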

Initially we added wget commands to the Dockerfile so that the tools were downloaded directly before use, but we later decided to keep a local copy of the tools to speed up the Docker image creation. After creating the Docker image, compiling is just a matter of going to the directory where our source code lives and executing our make alias, which can be defined, for example, as:

alias xmcmake='docker run --rm -it --device=/dev/bus/usb --volume=$(pwd):/home/docker/project docker-arm-xmc make'

This starts a container based on the previously created docker-arm-xmc image, gives it access to the JLink USB device from inside the container, and executes the make command. Once make finishes, the container exits and we can see our compiled binaries as well as a directory with the .ccache artifacts, which will be reused the next time make is invoked.
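
A similar invocation, replacing make with a shell, drops us into an interactive session inside the container, for instance to run the JLink tools for flashing and debugging:

docker run --rm -it --device=/dev/bus/usb --volume=$(pwd):/home/docker/project docker-arm-xmc /bin/bash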

In subsequent posts, I will delve into additional development steps that can be realized with the help of this container. I hope you find this useful.

Real-time software with MicroPython

MicroPython has evolved recently to become a great platform for quick prototyping of microcontroller software. The expressiveness of the Python language and its rich library ecosystem make it extremely useful and allow the testing of embedded software in a matter of minutes.

However, when a real-time solution is needed, care has to be taken, as there are inherent limitations imposed by the Python virtual machine. MicroPython is designed with the limited memory and speed of microcontrollers in mind and tries to work around these constraints.

There are ways to maximize the speed at which MicroPython runs. Our tests are simple: they do not involve complex memory allocations, garbage collection or asynchronous programming. We are interested only in exposing the limitations of the MicroPython virtual machine and unavoidable architectural effects such as cache loading when jumping to code stored in flash. These quick tests give us a rough idea of the performance that can be reached when accessing the GPIOs on some of the MicroPython reference boards. The figures are obtained on a system with no additional CPU load, with the mass storage device unmounted, and trying the different code emitters.
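
For reference, the emitter is selected per function with a decorator; here is a minimal sketch (not the exact benchmark code used below):

import micropython
from pyb import Pin

p_out = Pin('X3', mode=Pin.OUT_PP)

@micropython.native          # or @micropython.viper to try the viper emitter
def toggle(n):
    # Toggle the output pin n times; timing is observed with an oscilloscope.
    for _ in range(n):
        p_out(1)
        p_out(0)

toggle(1000)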

GPIO toggling

A line of MicroPython code takes a few microseconds to execute on a PyBoard @ 168 MHz. Toggling a GPIO pin should of course be faster than most instructions. Tests have been performed using the following code:


from pyb import Pin

p_out = Pin('X3', mode=Pin.OUT_PP)
while True:
    p_out(1)
    p_out(0)
    p_out(1)
    p_out(0)

An oscilloscope can easily show the time it takes to execute the GPIO set/clear functions as well as the time taken by the loop branch. The results are quite stable, with no appreciable jitter:

code emitter    loop branch (μs)    GPIO set/clear (μs)
bytecode        5.2                 2.7
native          3.4                 2.3
viper           0.35                2.2

These tests have also been performed on a Nucleo F429ZI board @180MHz where we observe slightly worse values (GPIO set instructions take around 10% longer).

GPIO external interrupts

We measure the time it takes for the PyBoard to set an output GPIO pin in response to an interrupt triggered on an input GPIO. The source of interrupts is a pulse generator that triggers an interrupt every millisecond. The following code has been used to measure the interrupt latency with the help of an oscilloscope:

from pyb import Pin, ExtInt

def gpio_cb(e):
    p_out(1)
    p_out(0)

p_out = Pin('X3', mode=Pin.OUT_PP)
ext = ExtInt(Pin('X4'), ExtInt.IRQ_RISING, Pin.PULL_DOWN, gpio_cb)

The interrupt latency results on a PyBoard v1.0 @168MHz are stable and exhibit low jitter, with these results obtained after running for one hour:

code emitter    interrupt latency (μs)    interrupt jitter (μs)
bytecode        7.3                       < 5.0
native          5.7                       < 4.0
viper           3.3                       < 4.0

The signal was captured with an oscilloscope running in persistent mode. The constant execution time of the GPIO pin toggling instructions gives us confidence that the observed variability is due only to the interrupt jitter.

Tests have also been performed on a Nucleo F429ZI board @180MHz, where we observe slightly worse performance (GPIO interrupt latency is similar with bytecode but around 20% worse with the native and viper emitters). However, the Nucleo board shows a much more stable latency value, with jitter bounded at 2 μs.

We have written some code that produces a histogram of the measured latency variability.
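
As an illustration only (a minimal sketch, not our exact code), latency variability can also be estimated from MicroPython itself by timestamping each interrupt and binning the deviation of the measured period from the 1 ms pulse source assumed on pin X4:

import micropython
from array import array
from pyb import Pin, ExtInt
import utime

micropython.alloc_emergency_exception_buf(100)

N = 1000
stamps = array('I', [0] * N)   # pre-allocated, so the IRQ handler does not allocate
idx = 0

def irq_cb(line):
    global idx
    if idx < N:
        stamps[idx] = utime.ticks_us()
        idx += 1

ext = ExtInt(Pin('X4'), ExtInt.IRQ_RISING, Pin.PULL_DOWN, irq_cb)

while idx < N:                 # wait until all samples are collected
    pass

# The deviation of each period from the nominal 1000 us approximates the jitter.
hist = {}
for i in range(1, N):
    d = utime.ticks_diff(stamps[i], stamps[i - 1]) - 1000
    hist[d] = hist.get(d, 0) + 1
for d in sorted(hist):
    print(d, hist[d])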

Shall we use MicroPython in our real-time projects?

MicroPython is mature, elegant, and offers great productivity advantages over other embedded software environments. It is quite capable of running real-time code as long as we take care around its limitations; after all, real-time is not about being fast but about being deterministic. Provided the memory and speed resources are sufficient and we can meet our real-time deadlines, it is a great platform that we intend to use more and more in the future.

Industrial Beaglebone Black anyone?

Is it really possible to use the Beaglebone Black in industrial embedded projects? Or is it just a maker/hobbyist platform?

I have been searching for an industrial version of the Beaglebone Black in order to leverage the extensive know-how and resources available for this open source hardware platform and make it work reliably in industrial environments. A professional version would allow, for example, better platform longevity planning, higher reliability, customization, or industrial temperature ranges. Last year there were some efforts, but they did not get anywhere.

The Beaglecore project on Kickstarter looks like a great attempt at remedying this situation. It is a system on module (SOM) fully compatible with the existing Beaglebone Black, with features aimed at industrial computing and long-term availability. Help them achieve their goals!

Developing a PPS GPIO generator driver on a Beaglebone Black

While working on PTP (Precision Time Protocol, IEEE 1588) with some Beaglebone Black boards, I needed a way of comparing the time on different boards with high resolution (10 ns). I found that there is currently no PPS generator driver that uses GPIO pins; the only existing option uses the parallel port.

This was a good excuse to write a kernel driver. I started from the parallel port PPS generator implemented in the kernel and modified it to use a GPIO pin instead. You can use this module on other boards provided you modify your device tree file accordingly. The module has been tested with kernel 3.15.3.

The source is at: https://github.com/jsln/pps-gen-gpio

ARM unaligned data access and floating point in Linux

I was recently getting Data Aborts in an ARM11 program that makes intensive use of unaligned data accesses. The issue was caused by unaligned floating point accesses, which were not handled by the Linux kernel. Some background on the problem follows.

ARM unaligned data access hardware support

ARM 32-bit instructions must always be aligned on a word boundary. Data accesses do not have this restriction. Prior to the ARMv6 architecture, unaligned load and store memory accesses were treated as aligned by truncating the data address. Starting with ARMv6, unaligned word and halfword load and store accesses are supported by transparently issuing one or more memory accesses to read the required bytes, albeit incurring a potentially greater access time.

Unaligned data access is controlled through the following bits of the CR1 register of the CP15 coprocessor:

  • U bit. Unaligned data access support. This bit must be set to enable unaligned data access support. If it is clear, we must either provide an unaligned data access handler (like the one provided by the Linux kernel) or compile our software with unaligned data access disabled, using the corresponding compiler option.
  • A bit. Alignment fault enabled. When this bit is set, all unaligned data accesses cause a Data Abort exception, irrespective of the value of the U bit. When A and U bits are not set, legacy ARMv5 mode is enabled, where an unaligned data access is treated as aligned and the data address is truncated.

The default configuration on ARM11 and ARM Cortex-A processors is U=1 and A=0, which allows unaligned halfword and word data accesses while keeping a strict alignment check for everything else. Note that an unaligned multiple-word access (e.g. long long) or coprocessor data access always signals a Data Abort with an Alignment Fault status code, even when the A bit is not set. Doubleword accesses must always be four-byte aligned.
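
As an illustration, the U and A bits can be read from privileged code (user space cannot access CP15 directly); a minimal sketch, e.g. for a small kernel module:

/* Read the CP15 control register (CR1) and extract the alignment-related bits. */
static void read_alignment_bits(unsigned int *a_bit, unsigned int *u_bit)
{
    unsigned int cr1;

    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r" (cr1));
    *a_bit = (cr1 >> 1) & 1;    /* A bit (bit 1): alignment fault checking */
    *u_bit = (cr1 >> 22) & 1;   /* U bit (bit 22): unaligned access support */
}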

Our current compiler, gcc 4.6.3, produces code with unaligned loads by default and offers no way to disable unaligned access. Other compilers can produce code with unaligned data access disabled (e.g. CodeSourcery, with the option -mno-unaligned-access).

ARM unaligned data access and the Linux kernel

CONFIG_ALIGNMENT_TRAP is a kernel configuration option that makes unaligned load/store instructions be emulated in software. Recent Linux kernels enable this setting by default; in fact, it is not even possible to disable it with menuconfig (to make the setting visible there, its description needs to be updated in arch/arm/Kconfig). On ARMv6 and later, this option does not affect the initialization value of the CR1 register. It covers the software emulation of double word unaligned accesses, while single word accesses are handled directly by the hardware (given the default A/U bit settings). If we disable CONFIG_ALIGNMENT_TRAP, double word unaligned accesses result in a bus error and a program crash.

In the default case, with CONFIG_ALIGNMENT_TRAP enabled, a double word unaligned access is fixed up by the kernel. This behavior is configurable through the /proc/cpu/alignment virtual file (the kernel needs to be compiled with CONFIG_DEBUG_KERNEL to make it visible); an example of inspecting and changing it is shown after the list below. The default handling of the different types is:

  • int (32-bit). Unaligned data access is handled directly by the hardware with no kernel involvement (/proc/cpu/alignment is not affected).
  • long long (64-bit). ARM cores do not support 64-bit unaligned accesses, so these are handled by the Linux kernel (/proc/cpu/alignment shows a DWord increment). The kernel traps the exception and emulates the access.
  • float (IEEE single precision, 32-bit). ARM cores do not support unaligned accesses by VFP instructions. See below.
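
For reference, assuming a kernel built with CONFIG_DEBUG_KERNEL, the counters and the handling mode can be inspected and changed from a shell (the mode bits are described in Documentation/arm/mem_alignment.txt):

cat /proc/cpu/alignment          # per-type counters plus the current mode
echo 3 > /proc/cpu/alignment     # bit 0 = warn, bit 1 = fix up: warn and fix up user accesses
echo 4 > /proc/cpu/alignment     # bit 2 = send SIGBUS to the process instead of fixing up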

Unaligned floating point accesses

gcc produces code that uses the floating point hardware when -mfloat-abi is set to softfp or hard, the difference being that the former generates function calls where FP arguments are passed in integer registers (as with the soft ABI). An unaligned hardware floating point access results in an exception that the Linux kernel does not trap, so our program crashes. An example of this kind of access is shown in the following code:

#include <stdio.h>

int main(int argc, char* argv[])
{
    char __attribute__ ((aligned(32))) buffer[8] = { 0 };
    float* fval_p;

    fval_p = (float*)&buffer[1];
    *fval_p = 0.1234;

    printf("\nfloat at &buf[1] is %f\n", *fval_p);

    return 0;
}

This produces a Bus error, with /proc/cpu/alignment showing:

User:            1
..
Skipped:       1
..

This means that the kernel was unable to fix the Data Abort exception that took place. The problem can be avoided by compiling our software with floating point emulation (-mfloat-abi=soft), which can be performed by the Linux kernel but is normally done more efficiently in user space. This has the drawback of slower code, which can hurt software that relies heavily on floating point calculations, like scientific applications or graphics processing software. The definitive solution to this kind of abort, and the one we should always aim for, is fixing our software so that floating point data is always accessed on 4-byte aligned memory.
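
One common way to avoid the unaligned VFP access (a sketch, assuming the data may arrive unaligned, e.g. from a packed buffer) is to copy the bytes into a properly aligned variable before using them:

#include <string.h>

/* Safely read a float from a possibly unaligned address. */
static float load_float(const void *p)
{
    float f;

    memcpy(&f, p, sizeof f);   /* the compiler emits byte-wise or aligned accesses */
    return f;
}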

Embedded Linux and ARM

Linux usage is growing enormously in embedded systems, thanks to its stability, its open source nature, the availability of drivers for a huge number of hardware peripherals, and its support for many networking protocols and filesystems. However, Linux has drawbacks in safety systems, where the code needs to be certified, and in hard real-time systems, where deadlines are critical.

Nowadays, some Linux installs in embedded systems have been deployed following a top-down approach, where not much care has been taken to remove unused software. This can have security implications, and also results in code bloat and maintenance problems later in the product lifecycle. I recommend following a bottom-up approach, where we control precisely the software installed in our systems; this pays off in the long run with easier maintainability and better security.

Why is ARM the dominant architecture in embedded systems? ARM follows a licensing model, with licensees competing with each other on SoCs that include an ARM core and a number of extensions. This model, together with the efficiency and elegance of their designs, has made ARM number one, especially in power-conscious designs like mobile phones.

It is becoming very easy to port Linux to new hardware devices on x86, MIPS and ARM platforms. The following is a list of popular ARM development platforms whose cores contain an MMU and can therefore run standard Linux:

Embedded Linux support from board vendors

I have recently completed a project where I used a PC104 SBC (single-board computer) from a hardware vendor that sold our client a Linux development kit in addition to the hardware; the development kit included a busybox-based distribution with quite an old kernel (2.6.21) and a driver supporting some board-specific features. So far, so good. Very often the time required to build a distribution from scratch costs more than the price of a commercial Linux distribution, so this approach is often the sensible one.

However, when we started developing the system, we realized that the provided driver kept most of its functionality in a binary blob for which the vendor would not provide any source code or much support in the form of updates. We wanted to use a more recent kernel, as 2.6.21 lacked support for a GPS module we had attached to the board. According to the vendor, getting the binary driver compiled for a different kernel version was out of the question, as they did not have the expertise in-house. They even hinted that some of the driver source was missing, since their last Linux expert (i.e. their only Linux developer) had left the company years ago and they had not put everything in place to be able to build the driver before he left. It is amazing that companies that build excellent hardware can show such a slack attitude towards supporting their product with the most popular embedded OS today; it is even more amazing that they charge for such a development kit!

When starting to work on new hardware, we are confronted with the different options available for getting Linux on our boards; these are the main ones:

  • Roll your own distribution. There are a number of open source projects that can be leveraged to build your own distribution, like crosstool-ng, OpenEmbedded or Buildroot. These projects build a distribution based on your hardware requirements. However, this is not an easy task, and you may hit issues that require a lot of work on your part, depending on how far the board and peripherals you use depart from a standard working configuration already supported by the project.
  • Use an open source distribution like Angström. These distributions are aimed at a specific set of boards. They are a good choice if your hardware is similar to some of the hardware supported by the distribution, otherwise they might require significant porting work.
  • Buy a Linux distribution from an independent vendor or consultant. There are different degrees of customization you can expect from vendors; some examples are Montavista, Wind River, or consultancies like DENX or Free Electrons.
  • Get Linux from the SBC vendor. This may be the shortest route to having a properly configured Linux distribution on your system, although you should pay attention to how committed the vendor is to supporting you with the issues you may find.

Whichever route you choose, planning ahead and weighing the different options is essential to the success of your project.