Paint it black / paint the stack

Published on May 24, 2018
By Terry / Senior System Design Engineer

When developing applications for an embedded system, and by embedded I mean the bare-metal, Cortex-M4 kind of embedded, it is often hard to judge how much memory the application uses at runtime. In these systems the Stack and Heap often share a region of memory, and as memory of either type is claimed they ‘grow’ towards each other. As long as they do not overlap everything works as intended, but when they do overlap the behaviour is undefined, often resulting in a hard crash of the application. Without the protection of an operating system there is no safeguard, and there is a real chance the two will overlap.

A word of warning before using the code ‘as is’: each device has its own memory layout. You need to know where the stack and heap are located and how large they are, and modify the code to match. This is usually defined in the linker file.

Usually the size of the Stack (and, indirectly, the memory remaining for the Heap) can be set in the linker file. Increasing one reduces the other, so choosing a proper Stack size helps avoid an overlap. But how do you figure out how much is needed?

A technique called ‘stack painting’ can help. Here is how it works. Right after the system starts, as the first thing in main(), the entire Stack area is ‘painted’ (filled) with a specific pattern, a value like 0xC5C5C5C5. Then the application continues its normal operation. After some time (as in: once all functionality of the application has been used at least once), the ‘painted’ area is inspected. Walking over the painted Stack from its far end back towards its start, the first address where the pattern is no longer present indicates how deep the Stack has been at some point in time. From this address and the starting address of the Stack the used size can be calculated, which gives a good indication of how much the application really needs. A word of warning: it is not foolproof. There are corner cases in which a Stack overflow is not detected with this technique, for example when a large local buffer is reserved but never fully written, so the painted words it skips over stay intact.
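The paint-and-scan steps described above can be sketched in a few lines of C. This is a minimal sketch under assumptions, not the code for any particular device: the function names stack_paint() and stack_used_bytes() are made up for illustration, the region bounds are passed in as pointers instead of being read from linker symbols, and it assumes a descending stack (as on Cortex-M), so the untouched paint sits at the low-address end.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT_PATTERN 0xC5C5C5C5u

/* Fill every word in [low, high) with the paint pattern.
 * Assumption: on a real target 'high' must stay below the current
 * stack pointer, otherwise the live frame of the caller is overwritten. */
void stack_paint(uint32_t *low, uint32_t *high)
{
    for (uint32_t *p = low; p < high; ++p) {
        *p = STACK_PAINT_PATTERN;
    }
}

/* Return the high-water mark in bytes.
 * Scans upward from the lowest address until the first word that no
 * longer holds the pattern; everything from there up to 'high' is
 * counted as used. */
size_t stack_used_bytes(const uint32_t *low, const uint32_t *high)
{
    const uint32_t *p = low;
    while (p < high && *p == STACK_PAINT_PATTERN) {
        ++p;
    }
    return (size_t)(high - p) * sizeof(uint32_t);
}
```

On a target, low and high would typically come from symbols exported by the linker file (their names differ per toolchain), and stack_used_bytes() would be called once the application has exercised all of its functionality. Because the functions only take pointers, they can also be tried out on an ordinary array on a desktop machine first.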

With the Stack size determined and properly set, we are left with the Heap. There are two common pitfalls to watch out for here: memory leaks (claiming memory and not properly releasing it, resulting in an ever-growing amount of claimed memory) and memory fragmentation (claiming and releasing memory of various sizes, leaving fragments of memory which cannot be reused/reclaimed). I am assuming that as a developer you handle memory gracefully and release whatever you claim, meaning there are no memory leaks. What I am more interested in is fragmentation and how to detect it. A full operating system can hide much of the problem, for instance through virtual memory and more elaborate allocators. Without an operating system the fragments remain until the end of the application's lifetime, meaning there will be less and less usable memory until the application runs out.

A technique to check whether the heap fragments (or leaks) is to request 0 bytes of memory from the low-level function called _sbrk(). This returns the current end of the heap, that is, the first address past the heap memory handed out so far. Given that we know the start of the heap, this is an easy means to find out how much has been used. If this number remains constant over time, the heap is not growing and there is no sign of fragmentation (or a leak). If it keeps creeping up, one of the two is going on. Both cases will lead to a crashing application eventually, so if the location changes there is some work to do.
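The check described above boils down to sampling the end of the heap and comparing it with the previous sample. The following is a minimal sketch of that bookkeeping; the type heap_watch_t and the heap_watch_*() functions are hypothetical names introduced for illustration, and the heap end is passed in as a plain number so the logic does not depend on any particular C library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for watching the end of the heap over time. */
typedef struct {
    uintptr_t heap_start;  /* first heap address, taken from the linker file */
    uintptr_t last_break;  /* end of the heap at the previous sample */
} heap_watch_t;

/* Remember where the heap starts and where it currently ends. */
void heap_watch_init(heap_watch_t *w, uintptr_t heap_start,
                     uintptr_t current_break)
{
    w->heap_start = heap_start;
    w->last_break = current_break;
}

/* Record a new sample. Returns the number of bytes the heap end moved up
 * since the previous sample, or 0 if it stayed put or moved down. */
size_t heap_watch_sample(heap_watch_t *w, uintptr_t current_break)
{
    size_t growth = 0;
    if (current_break > w->last_break) {
        growth = (size_t)(current_break - w->last_break);
    }
    w->last_break = current_break;
    return growth;
}

/* Bytes between the start of the heap and the most recent heap end. */
size_t heap_watch_used(const heap_watch_t *w)
{
    return (size_t)(w->last_break - w->heap_start);
}
```

On a target, a sample would be taken with something like heap_watch_sample(&w, (uintptr_t)_sbrk(0)), where the exact prototype of _sbrk() depends on the C library and on the stub in your project. A non-zero return value at a moment when the application should be in a steady state is the signal to start looking for a leak or for fragmentation.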

Both techniques can help in figuring out how much memory your application is using, and whether it is leaking/fragmenting or not. They form a practical alternative to the much more convenient desktop profilers, which are simply not available for embedded use. A downside of these techniques is that they take some computing time, which means they are intended to be used sparingly. One suggestion would be to log the used stack size to a logfile when the device enters a sleep state. Another is to check for fragmentation when there is more calculation time available on the device, at set intervals, maybe even raising the clock speed to make the process a lot quicker. An upside of these techniques is that they are rather portable and work in Release builds as well.

Code examples (Atmel Cortex-M4)

Further reading: [...], Part 1: Heap Memory Introduction