Real-time software with MicroPython

MicroPython has evolved recently to become a great platform for quick prototyping of microcontroller software. The expressiveness of the Python language and its rich library ecosystem make it extremely useful and allow the testing of embedded software in a matter of minutes.

However, when a real-time solution is needed, care has to be taken as there are inherent limitations imposed by the Python virtual machine. MicroPython tries to work around the lack of memory and the low speed of microcontrollers and is designed with these constrains in mind.

There are ways to maximize the speed at which MicroPython runs. Our tests are simple as they do not involve complex memory allocations, garbage collection or asynchronous programming. We are interested in exhibiting just the limitations of the MicroPython virtual machine and unavoidable architectural issues like cache loading when jumping to code stored in flash. These quick tests provide us with a rough idea of the performance that can be reached when accessing the GPIOs on some of the MicroPython reference boards. These figures are obtained in a system with no additional CPU load, without mounting the mass storage device and trying different code emitters.

GPIO toggling

A line of MicroPython code takes a few μsecs to execute on a PyBoard @168MHz. Toggling a GPIO pin should of course be faster than most instructions. Tests have been performed using the following code:

from pyb import Pin

p_out = Pin('X3', mode=Pin.OUT_PP)
while True:

An oscilloscope can easily show the time it takes to execute the GPIO set/clear functions as well as the time taken by the loop branch. The results are quite stable, with no appreciable jitter:

code emitter loop branch (μs) GPIO set/clear (μs)
bytecode 5.2 2.7
native 3.4 2.3
viper 0.35 2.2

These tests have also been performed on a Nucleo F429ZI board @180MHz where we observe slightly worse values (GPIO set instructions take around 10% longer).

GPIO external interrupts

We measure the time it takes for the PyBoard to set an output GPIO pin in response to an interrupt triggered on an input GPIO. The source of interrupts is a pulse generator that triggers an interrupt every millisecond. The following code has been used to measure the interrupt latency with the help of an oscilloscope:

from pyb import Pin

def gpio_cb(e):

p_out = Pin('X3', mode=Pin.OUT_PP)
ExtInt(Pin('X4'), ExtInt.IRQ_RISING, Pin.PULL_DOWN, gpio_cb)

The interrupt latency results on a PyBoard v1.0 @168MHz are stable and exhibit low jitter, with these results obtained after running for one hour:

code emitter interrupt latency (μs) interrupt jitter (μs)
bytecode 7.3 < 5.0
native 5.7 < 4.0
viper 3.3 < 4.0

The signal was captured with an oscilloscope running in persistent mode. The constant execution time of the GPIO pin toggling instructions gives us confidence that the observed variability is due only to the interrupt jitter.

Tests have also been performed in a Nucleo F429ZI board @180MHz, where we observe a slightly worse performance (GPIO interrupt latency is similar with bytecode but around 20% worse with native and viper emitters). However, the Nucleo board shows a much more stable latency value, with jitter bounded at 2μs.

We have written some code that produces an histogram of the measured latency variability.

Shall we use MicroPython in our real-time projects?

MicroPython is mature, elegant, and offers great productivity advantages over other embedded software environments. MicroPython is quite capable of running real-time code as long as we put some care around its limitations; after all we have to remember that real-time is not about being fast but about being deterministic. Providing the memory/speed resources are sufficient and we can meet our real-time deadlines, it is a great platform which we intend to use more and more in the future.

Industrial Beaglebone Black anyone?

Is it really possible to use the Beaglebone Black in industrial embedded projects? or it is just a maker/hobbyist platform?

I have been searching for an industrial version of the Beaglebone Black in order to leverage the great know-how and resources available on this great open source hardware platform to make it work reliably on industrial environments. A professional version would allow e.g. better platform longevity planning, reliability, customization or industrial temperature ranges. Last year there were some efforts that did not get anywhere.

The Beaglecore project in kickstarter seems like a great attempt at remedying this situation. It is a system on module (SOM) fully compatible with the existing Beaglebone Black, with features aimed at industrial computing and long-term availability. Help them achieve their goals!

Developing a PPS GPIO generator driver on a Beaglebone Black

While working on PTP (Precision Time Protocol, IEEE 1588) with some Beaglebone Black boards, I needed a way of comparing the time on different boards with high resolution (10ns). I saw that there is no currently a PPS generator that uses GPIO pins. There is only an option to use a parallel port.

This was a good excuse to write a kernel driver. I started by modifying the parallel port PPS generator implemented in the kernel to use a GPIO pin instead. You can use this module on other boards providing you modify your device tree file accordingly. This module has been tested with kernel 3.15.3.

The source is at:

ARM unaligned data access and floating point in Linux

I was recently getting Data Aborts on an ARM11 program that makes intensive use on unaligned data accesses. The issue was caused by unaligned floating point accesses, which were not handled by the Linux kernel. Some background on the problem follows.

ARM unaligned data access hardware support

ARM 32-bit instructions must always be word boundary aligned. Data accesses do not have this restriction. Prior to ARMv6 architecture, unaligned load and store memory accesses were treated as aligned by truncating the data address. Starting with ARMv6, unaligned word and halfword load and store data access is supported by issuing one or more memory accesses to read the required bytes transparently, albeit incurring in a potentially greater access time.

Unaligned data access is controlled through the following bits of the CR1 register of the CP15 coprocessor:

  • U bit. Unaligned data access support enabled. This bit must be set in order to enable unaligned data access support. Disabling this bit means we must either provide an unaligned data access handler (like the one provided by the Linux kernel) or our software must be compiled with unaligned data access disabled by using the corresponding compiler option.
  • A bit. Alignment fault enabled. When this bit is set, all unaligned data accesses cause a Data Abort exception, irrespective of the value of the U bit. When A and U bits are not set, legacy ARMv5 mode is enabled, where an unaligned data access is treated as aligned and the data address is truncated.

The default configuration on ARM11 and ARM Cortex-A processors is U=1 and A=0, allowing unaligned half/word data access, otherwise having a strict word alignment check. Note that an unaligned multiple word access (e.g. long long) or coprocessor data access always signal Data Abort with Alignment Fault Status Code, even when the A bit is not set. Doubleword accesses must always be four-byte aligned.

Our current compiler, gcc 4.6.3, produces code with unaligned loads by default, being not possible to disable unaligned access. Other compilers are able to produce code with unaligned data access disabled (e.g. CodeSourcery, with option –mno-unaligned-access).

ARM unaligned data access and the Linux kernel

CONFIG_ALIGNMENT_TRAP is a kernel configuration option that makes non-aligned load/store instructions be emulated in software. Recent Linux kernels enable this setting by default. In fact, it is not even possible to disable this option with menuconfig (in order to make this setting visible with menuconfig, its description needs to be updated in arch/arm/Kconfig). On ARMv6 and later, this configuration option does not affect the initialization value of the CR1 register. This setting affects the software emulation for double word unaligned access while single word accesses are taken care of by the hardware directly (given our default A/U bit settings). If we disable CONFIG_ALIGNMENT_TRAP, double word unaligned accesses result on a bus error and program crash.

In the default case, with CONFIG_ALIGNMENT_TRAP enabled, a double word unaligned access results on an unaligned access fix by the kernel. This behavior is configurable through the /proc/cpu/alignment virtual file (the kernel needs to be compiled with CONFIG_DEBUG_KERNEL in order to make it visible). The default case handling of different types is:

  •  int (32-bit). Unaligned data access is handled directly by the hardware with no kernel involvement (/proc/cpu/alignment is not affected).
  • long long (64-bit). ARM processor cores do not support 64-bit unaligned accesses, so this is handled by the Linux kernel (/proc/cpu/alignment shows a DWord increment). The kernel traps an exception and the access is simulated.
  • float (IEEE single precission, 32-bit). ARM processor cores do not support unaligned accesses to VFP hardware instructions. See below.

Unaligned floating point accesses

gcc produces hardware-enabled floating point software when setting –mfloat-abi to softfp or hard, the difference being that the former generates function calls where FP arguments are passed in integer registers (same as soft ABI). An unaligned hardware floating point access results on an exception that the Linux kernel does not trap, therefore our program segfauls. An example of this kind of access can be shown with the following code:

#include <stdio.h>

int main(int argc, char* argv[])
    char __attribute__ ((aligned(32))) buffer[8] = { 0 };
    float* fval_p;

    fval_p = (float*)&buffer[1];
    *fval_p = 0.1234;

    printf("\nfloat at &buf[1] is %f\n", *fval_p);

    return 0;

This produces a Bus error, with /proc/cpu/alignment showing:

User:            1
Skipped:       1

This means that the kernel was unable to fix the Data Abort exception that took place. This problem can be fixed by compiling our software with floating point emulation (-mfloat-abi=soft), which can be performed by the Linux kernel but is normally more efficiently done by the standard C library. This has the drawback of slower code, which can have a performance impact on software that relies heavily on floating point calculations, like scientific applications or graphics processing software. The definitive solution to this kind of abort and the one we should always aim at involves fixing our software to always access floating point data on 4-byte aligned memory.

Embedded Linux and ARM

Linux usage is growing enormously in embedded systems, thanks to its stability, being open source, the availability of drivers for a huge amount of hardware peripherals and its support for many networking protocols and filesystems. However, Linux exhibits some drawbacks in safety systems, where the code needs to be certified, or hard real-time systems where deadlines are critical.

Nowadays, some Linux installs in embedded systems have been deployed following a top-down approach, where no much care has been taken to remove unused software. This may have security implications, resulting also on code bloating and maintenance problems down the line of a software product lifecycle. I recommend following a bottom-up approach, where we control precisely the software installed in our systems. This helps in the long run with easier maintainability, and better security.

Why is ARM the dominant architecture on embedded systems? ARM follows a fabless model, with licensees competing with each other on SoCs that include an ARM core and a number of extensions. This model, together with the efficiency and elegancy of their design has made them number one, especially in power-conscious designs like mobile phones.

It is becoming very easy to port Linux to new hardware devices on X86, MIPS and ARM platforms. This is a list of popular ARM development platforms with ARM cores containing an MMU and therefore can be leveraged with standard Linux:

Embedded Linux support from board vendors

I have recently completed a project where I used a PC104 SBC (single-board computer) from a hardware vendor that sold our client a Linux development kit in addition to the hardware; the development kit included a busybox-based distribution with quite an old kernel (2.6.21) and a driver supporting some board-specific features. So far, so good. Very often the time required to build a distribution from scratch is more valuable than the price to pay for a commercial Linux distribution, so this approach is often the sensible one.

However, when we started developing the system, we realized that the provided driver included most of its functionality in a binary blob for which the vendor would not provide any source code or much support in the form of updates. We wanted to use a more recent kernel as 2.6.21 was lacking support for a GPS module we had attached to the board. According to the vendor, getting a binary driver compiled for a different kernel version was out of the question, as they did not have the expertise in-house. They even hinted at lacking some of the driver source, as their latest Linux expert (i.e. their only Linux developer) left the company years ago and they did not bother to set up everything in place to be able to build their driver before the Linux guy left. It is amazing that companies that build excellent hardware may exhibit such a slack attitude towards supporting their product with the most popular embedded OS today; it is even more amazing that they charge for such a development kit!

When starting to work on new hardware, we are confronted with the different options available for getting Linux on our boards; these are the main ones:

  • Roll your own distribution. There are a number of open source projects that can be leveraged to build your own distribution, like crosstool-ng, OpenEmbedded or Buildroot. These projects build a distribution based on your hardware requirements. However, this is not an easy task and you may find issues that require a lot of work from your part, depending on how the board and peripheral you use departs from a standard working configuration used by the distribution.
  • Use an open source distribution like Angström. These distributions are aimed at a specific set of boards. They are a good choice if your hardware is similar to some of the hardware supported by the distribution, otherwise they might require significant porting work.
  • Buy a Linux distribution from an independent vendor or consultant. There are different degrees of customization you can expect from vendors; some examples are Montavista, Wind River, or consultancies like DENX or Free Electrons.
  • Get Linux from the SBC vendor. This may be the shortest route to having a properly configured Linux distribution on your system; although you should pay attention at how committed the vendor is at supporting the issues you may find.

Whatever the route you chose, planning ahead and taking into account the different options is essential to the success of your project.

Valladolid Code Retreat, May 2011

I have just attended a Code Retreat event in Valladolid, facilitated by Enrique Comba Riepenhausen. This was an excellent opportunity to meet fellow geeks interested in continuous learning, that were willing to spend a whole Saturday doing pair programming, exchanging ideas about processes and agile practices, and above all, having fun!

Learning is about getting out of your comfort zone, which was definitely my case, as I had to follow practices like TDD or pair programming, and I even paired with guys writing groovy, although the most popular languages were java and python.

Thanks a lot to Enrique, the sponsors and the organizers!

I can only encourage you to look for Code Retreats in your area, or why not organize one yourself? Really worth it!