“A debugger is a computer program which may or may not be assisted by some hardware and is used to test and debug other Programs.”
In the previous tutorial we presented a generalized overview of how debuggers work in Embedded Systems. While that tutorial was meant to be general purpose, this tutorial is specific to ARM Cortex-M. It is recommended to go through that tutorial first to get a better understanding of Embedded System debugging. In this tutorial we will go one step down into the internals of the ARM Cortex-M on-chip debug hardware.
The debug and trace features of the Cortex-M processors are based on the ARM debug technology called CoreSight. This architecture covers a wide area, including debug interface protocols, the on-chip bus for debug accesses, control of debug components, security features, the trace data interface, and much more. Not all features of CoreSight are available on every processor of the ARM Cortex-M family. Some of the features are compulsory while others are left to silicon vendors as optional features. Of the whole ARM Cortex-M family, Cortex-M4 and M7 provide an almost complete set of CoreSight debug features.
1. Debug Components
The debug hardware on ARM Cortex-M can be classified into the following major parts – Figure-1.
- The Debug Interface
- Intrusive Debug Hardware – (Debug)
- Non-intrusive Debug Hardware – (Trace)
Let’s discuss them one by one.
1.1 The Debug Interface
The whole mystery of Embedded Systems debugging starts with the interface that actually connects the target hardware to your host PC. It is like the entrance door to the party going on inside the SoC debug hardware. The on-chip debug hardware sends various debug related information to the host PC via this interface. This information is later collected by a program/IDE (called the debugger front-end), which presents it in a human readable format.
1.1.1 The Debug Interface Connector
The debug technology of the ARM Cortex-M family supports two types of interface connection: SWD and JTAG. The availability of one or both depends on the specific ARM Cortex-M core and on the silicon vendor. IEEE 1149.1, known as JTAG, is a serial protocol initially developed for verifying the connectivity between integrated circuits (called Boundary Scan) on highly dense PCBs. The JTAG serial protocol was later extended to interact with the debug hardware on many MCUs, as it was the only such interface available on embedded ICs at that time.
The only problem with JTAG is that it requires at least 4 pins: TMS, TCK, TDO and TDI. For some MCUs, such as those in 28-pin packages, dedicating 4 pins (plus optional extra pins, with debug connectors ranging from 10 to 20 pins) is too many. To tackle this issue, ARM developed its own protocol for debugging and tracing. The protocol is called Serial Wire Debug (SWD) and requires only two pins: SWCLK and SWDIO. A third pin called SWO can be added to SWD to emit trace information such as printf output. Collectively they are called Serial Wire Viewer (SWV).
To keep legacy support and to cover the richer CoreSight debug features, two wires are not always sufficient. ARM therefore kept both options, SWD and JTAG, in CoreSight and left the implementation to the silicon vendors. For basic debug features like halting, single stepping, breakpoints, watchpoints and software event tracing, SWD is the more suitable choice. More advanced features like instruction tracing need additional pins, which the larger JTAG-style debug connectors have room for.
Some silicon vendors support both implementations in their SoCs. In this case the pins of Serial Wire Debug (SWD) and JTAG are multiplexed, i.e. TCK with SWCLK and TMS with SWDIO. Only one protocol is active at a time. Which one? The one that is selected by applying a special bit pattern on the TMS/SWDIO pin. The following figure demonstrates the interface connection.
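The "special bit pattern" mentioned above can be sketched in C. This is a minimal host-side sketch, not a probe driver: it only builds the SWDIO level for each clock cycle of the JTAG-to-SWD switch (a line reset of at least 50 clocks with SWDIO high, the 16-bit value 0xE79E sent LSB first, then another line reset). The function name `swj_build_jtag_to_swd` and the choice of 56 reset clocks are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit select value used for JTAG-to-SWD switching; it is sent LSB first. */
#define JTAG_TO_SWD_SELECT 0xE79Eu
/* At least 50 clocks with SWDIO high are required; 56 is an arbitrary choice. */
#define LINE_RESET_BITS    56u

/* Fills bits[] with the SWDIO level (0 or 1) for each SWCLK cycle.
   Returns the number of clock cycles written. Hypothetical helper. */
size_t swj_build_jtag_to_swd(uint8_t *bits, size_t capacity)
{
    size_t n = 0;

    assert(bits != NULL);
    assert(capacity >= 2u * LINE_RESET_BITS + 16u);

    for (unsigned i = 0; i < LINE_RESET_BITS; i++) {
        bits[n++] = 1;                                   /* first line reset */
    }
    for (unsigned i = 0; i < 16u; i++) {
        bits[n++] = (uint8_t)((JTAG_TO_SWD_SELECT >> i) & 1u);  /* LSB first */
    }
    for (unsigned i = 0; i < LINE_RESET_BITS; i++) {
        bits[n++] = 1;                                   /* second line reset */
    }
    return n;
}
```

A real probe would clock these levels out on the TMS/SWDIO pin and then typically read the DP identification register to confirm that the port is now in SWD mode.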
Various vendors have implemented their own protocols on top of SWD: an adapter accepts the custom protocol on one side and emits SWD signals on the other. Examples include TI ICDI, ST-Link etc.
1.1.2 Inner View of Debug Interface
Now that we have explained the external connection to the SoC debug hardware, let’s zoom into the intermediate debug interface unit that connects the on-chip debug and trace hardware to the external physical connector explained earlier.
The purpose of the debug interface is to convert the outside physical connection protocol (SWD/JTAG) into a third protocol that the on-chip debug logic understands, because the left side of the Debug Interface (SWD/JTAG) and the right side (debug HW) don’t speak the same language. This is done for portability reasons, i.e. to convert standard interface signals into signals that most common peripherals understand.
The Debug Interface acts like a translator and consists of two stages. The first stage, which actually accepts/speaks the outside physical connection protocol (SWD/JTAG), is called SWJ-DP (SW → Serial Wire, J → JTAG, DP → Debug Port). The SWJ-DP can switch between the JTAG protocol and the SWD protocol. The output of this unit is a generic bus protocol for the internal debug bus (connected on the other side of the SWJ-DP unit), used for accessing various Access Ports (explained shortly).
Now that we have a successful connection to the external interface (SWD/JTAG), the next step is to connect it to the various on-chip slave devices for debugging. The external interface may want to speak to any on-chip block such as debug hardware, trace hardware etc. Each of these blocks has a port on the debug bus, has an address, and implements one of the standard CoreSight Access Ports given in Table-1. These ports are called Access Ports (AP). The SWJ-DP can access (via address) up to 256 Access Ports, i.e. it can debug up to 256 devices. The SWJ-DP and the APs combined are called the Debug Access Port (DAP = Debug Port + Access Ports).
|Access Port|Description|
|---|---|
|AXI-AP|The AXI-AP implements the ADIv5 Memory Access Port (MEM-AP) architecture to directly connect to an AXI memory system. You can connect it to other memory systems using a suitable bridging component.|
|AHB-AP|The AHB-AP provides an AHB-Lite master for access to a system AHB bus. This is compliant with the MEM-AP in ADIv5.1 and can perform 8 to 32-bit accesses.|
|APB-AP|The APB-AP provides an APB master in AMBA® v3.0 for access to the Debug APB bus. This is compliant with the MEM-AP architecture with a fixed transfer size of 32 bits.|
|JTAG-AP|The JTAG-AP provides JTAG access to on-chip components, operating as a JTAG master port to drive JTAG chains throughout the ASIC. This is an implementation of the JTAG-AP in ADIv5.1.|
Table-1: ARM CoreSight Access Ports 
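To make the "access via address" idea concrete, here is a small sketch of how a debugger might compose the value of the DP SELECT register before talking to an Access Port. It assumes the ADIv5 layout, where the AP number (APSEL) sits in bits [31:24] and the AP register bank (APBANKSEL) in bits [7:4]; the 8-bit APSEL field is where the limit of 256 APs comes from. The helper names `dp_select_value` and `ap_bank_index` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write to DP SELECT so that subsequent AP accesses reach the
   register at byte offset ap_reg_addr of Access Port number apsel.
   Assumes ADIv5: APSEL in bits [31:24], APBANKSEL in bits [7:4]. */
uint32_t dp_select_value(uint8_t apsel, uint8_t ap_reg_addr)
{
    assert((ap_reg_addr & 0x3u) == 0);   /* AP registers are word aligned */
    return ((uint32_t)apsel << 24) | (uint32_t)(ap_reg_addr & 0xF0u);
}

/* The two address bits A[3:2] carried in each SWD/JTAG request select
   one of the four registers inside the chosen bank. */
uint8_t ap_bank_index(uint8_t ap_reg_addr)
{
    return (uint8_t)((ap_reg_addr >> 2) & 0x3u);
}
```

For example, reading the identification register of AP 0 (offset 0xFC in a MEM-AP) would use a SELECT value of 0x000000F0 and register index 3 within that bank.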
Where are we so far?
We took a connection from the host PC via SWD/JTAG and connected it to the SWJ-DP, which in turn connected us to one of the on-chip resources via an Access Port. The rest of the debug functionality now depends on the unit (debug unit) we are connected to. Here we will explain the two units we mentioned earlier, i.e. the Debug hardware and the Trace hardware.
1.2 Intrusive Debug Hardware – (Debug)
Let’s move on to the next CoreSight debug component, the Intrusive Debug Hardware. There is nothing very fancy here. The hardware is called intrusive because it requires the processor to be stopped in order to analyze or debug code, inspect processor registers etc.
This is the part of the CoreSight debug architecture that provides core debug functionality like halting, resuming, single-stepping etc. It supports both Halt mode and Debug Monitor exception mode debugging.
In Halt mode the processor stops executing code, for example on hitting a breakpoint; the SysTick timer is stopped, and interrupts may be pended or ignored entirely.
The problem with Halt mode shows up when the processor is controlling machinery such as a motor or an engine that requires a proper start/stop sequence. In Halt mode the processor stops executing instructions instantly, and this abrupt halt of the control sequence may damage the motor or engine.
To solve this problem, the on-chip debug hardware of ARM Cortex-M3/M4 provides the Debug Monitor exception. The Debug Monitor exception is one of the core processor exceptions (exception number 12) and has a configurable priority – Figure-6. In this mode, upon hitting a debug event such as a breakpoint, the processor doesn’t stop executing instructions; instead it executes a Debug Monitor ISR. The processor serves the ISR and then returns to normal program execution. Thus more critical jobs can be kept in high priority ISRs while the debug exception is kept at a low priority.
Note: The Debug Monitor exception is available on ARM Cortex-M3 and M4 processors. It is not available on Cortex-M0/M0+ cores.
The intrusive part of the debug HW is supported by two debug units called the Flash patch and breakpoint (FPB) Unit and the Data watchpoint and trace (DWT) Unit, as explained below.
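As a rough illustration of monitor-mode debugging, the sketch below sets the MON_EN bit (bit 16) of the Debug Exception and Monitor Control Register (DEMCR), which enables the Debug Monitor exception on Cortex-M3/M4. To keep the example runnable off-target, the register is passed in as a pointer; on real hardware it lives at 0xE000EDFC and DHCSR at 0xE000EDF0. The function name `debugmon_enable` is hypothetical, and the return value reflects the assumption that halting debug (C_DEBUGEN set by an attached debugger) takes precedence over the monitor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DHCSR_C_DEBUGEN (1u << 0)   /* halting debug enabled by a debugger */
#define DEMCR_MON_EN    (1u << 16)  /* enable the Debug Monitor exception  */

/* Enables the Debug Monitor exception.
   demcr : pointer to DEMCR (0xE000EDFC on target)
   dhcsr : current value of DHCSR (0xE000EDF0 on target)
   Returns 1 if debug events will now reach the monitor ISR, or 0 if
   halting debug is enabled and would halt the core instead. */
int debugmon_enable(volatile uint32_t *demcr, uint32_t dhcsr)
{
    assert(demcr != NULL);
    *demcr |= DEMCR_MON_EN;
    return (dhcsr & DHCSR_C_DEBUGEN) ? 0 : 1;
}
```

With CMSIS startup files the corresponding ISR is usually named `DebugMon_Handler`; its priority can be lowered so that time-critical ISRs keep running while a debug event is being serviced.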
1.2.1 Flash patch and breakpoint (FPB) Unit
The Flash patch and breakpoint (FPB) Unit contains a set of hardware comparators that are matched against instruction fetch addresses, and in this way provides the hardware breakpoint feature.
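The comparator idea can be illustrated by computing the value a debugger would write into one FPB comparator register (FP_COMPn) to set a hardware breakpoint. This is a sketch under the assumption of the FPB version 1 encoding used on Cortex-M3/M4: the address goes into bits [28:2], bits [31:30] choose which halfword of the word triggers the breakpoint, and bit 0 enables the comparator. Other cores may use a different encoding, and the helper name `fpb_breakpoint_value` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FP_COMP_ENABLE        (1u << 0)
#define FP_COMP_ADDR_MASK     0x1FFFFFFCu  /* COMP field, bits [28:2]       */
#define FP_COMP_REPLACE_LOWER (1u << 30)   /* breakpoint on lower halfword  */
#define FP_COMP_REPLACE_UPPER (2u << 30)   /* breakpoint on upper halfword  */

/* Value for an FP_COMPn register that breaks on the instruction at
   insn_addr (assumed FPB version 1 encoding). */
uint32_t fpb_breakpoint_value(uint32_t insn_addr)
{
    /* These comparators only match addresses in the Code region. */
    assert(insn_addr < 0x20000000u);

    uint32_t replace = (insn_addr & 0x2u) ? FP_COMP_REPLACE_UPPER
                                          : FP_COMP_REPLACE_LOWER;
    return replace | (insn_addr & FP_COMP_ADDR_MASK) | FP_COMP_ENABLE;
}
```

Because the number of comparators is small and fixed in silicon, debuggers fall back to software breakpoints once the hardware ones are used up.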
1.2.2 Data watchpoint and trace (DWT) unit
The Data watchpoint and trace (DWT) unit is similar to the FPB unit at the hardware level, except that it looks for a match on the data bus, i.e. it provides the data watchpoint feature. The other features of the DWT are PC sampling at regular intervals and CPU clock cycle counting. The cycle count feature makes it possible to measure the best and worst case execution time of a piece of code such as an algorithm. This unit also supports the Debug Trace hardware (coming next).
Note: For more details on hardware/software breakpoints and data watchpoints, refer to the following tutorial.
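The cycle counting feature can be sketched as follows. The example assumes the usual Cortex-M3/M4 register layout: the TRCENA bit (bit 24) in DEMCR at 0xE000EDFC must be set before the DWT can be used, DWT_CTRL is at 0xE0001000 with CYCCNTENA in bit 0, and the 32-bit counter DWT_CYCCNT is at 0xE0001004. Registers are passed as pointers so the logic can also be exercised off-target; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMCR_TRCENA       (1u << 24)  /* global enable for DWT and ITM  */
#define DWT_CTRL_CYCCNTENA (1u << 0)   /* enable the cycle counter       */

/* Resets and starts the DWT cycle counter. */
void dwt_cyccnt_start(volatile uint32_t *demcr,
                      volatile uint32_t *dwt_ctrl,
                      volatile uint32_t *dwt_cyccnt)
{
    assert(demcr != NULL && dwt_ctrl != NULL && dwt_cyccnt != NULL);
    *demcr      |= DEMCR_TRCENA;
    *dwt_cyccnt  = 0;
    *dwt_ctrl   |= DWT_CTRL_CYCCNTENA;
}

/* Cycles elapsed between two counter snapshots. Unsigned subtraction
   gives the right answer even if the counter wrapped once in between. */
uint32_t dwt_cycles_between(uint32_t start, uint32_t end)
{
    return end - start;
}
```

On target, one would read DWT_CYCCNT before and after the code under test and pass both snapshots to `dwt_cycles_between`; repeating the measurement over many runs gives the best and worst case figures.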
1.3 Non-intrusive Debug Hardware – (Trace)
Let’s move to the final and most amazing part of the ARM Cortex-M debug hardware. The beauty of this unit is that it provides live debugging/tracing, i.e. viewing the instruction history (how the processor reached a certain point) without stopping the processor. The bad news is that it’s not available on every member of the Cortex-M family. We won’t go into much detail on the internals of this feature.
The Trace Hardware mainly consists of the following two major parts.
- Instrumentation Trace Macrocell (ITM)
- Embedded trace macrocell (ETM)
The output of the above two units makes its way to the external world via the Trace Port Interface Unit (TPIU), as shown in Figure-8.
1.3.1 Instrumentation Trace Macrocell (ITM)
The Instrumentation Trace Macrocell (ITM) is basically a data logger that takes input from the DWT and the user application, packetizes it, attaches a time stamp and sends it over the SWO output line. The ITM is not fully non-intrusive, as the user application has to write the data to be transmitted into its registers. At the hardware level, the ITM consists of 32 channels (and consequently 32 data registers), a time-stamp unit and a very small buffer of 10 bytes – Figure-9. Any type of data can be written to any of the 32 channels/registers, and the ITM will attach a time stamp and transmit it via the SWO line. Although data can be written to any port/channel register, ARM recommends using Channel-0 for text data, i.e. printf-like output, and Channel-31 for RTOS events.
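Writing to an ITM channel can be sketched as below. The sketch follows the commonly used pattern of checking that the ITM and the chosen channel are enabled, waiting until the stimulus register reads non-zero (FIFO ready), and then writing a single byte. The struct `itm_regs_t` is a simplified, hypothetical stand-in so the code can run off-target; on a real Cortex-M3/M4 the stimulus ports start at 0xE0000000, while the enable and control registers sit at offsets 0xE00 and 0xE80, so this struct does not match the real memory map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ITM_TCR_ITMENA (1u << 0)   /* global ITM enable bit */

/* Simplified, hypothetical register view (NOT the real ITM layout). */
typedef struct {
    volatile uint32_t port[32];  /* stimulus (channel) registers          */
    volatile uint32_t ter;       /* trace enable: one bit per channel     */
    volatile uint32_t tcr;       /* trace control                         */
} itm_regs_t;

/* Sends one character on the given channel.
   Returns 1 if the byte was written, 0 if tracing is disabled. */
int itm_putc(itm_regs_t *itm, unsigned port, char c)
{
    assert(itm != NULL);
    assert(port < 32u);

    if ((itm->tcr & ITM_TCR_ITMENA) == 0u ||
        (itm->ter & (1u << port)) == 0u) {
        return 0;                 /* no debugger listening: skip quietly */
    }
    while (itm->port[port] == 0u) {
        /* register reads as 0 while the small FIFO is full */
    }
    /* An 8-bit write makes the ITM emit a one-byte payload. */
    *(volatile uint8_t *)&itm->port[port] = (uint8_t)c;
    return 1;
}
```

Returning early when tracing is disabled keeps the same firmware usable with and without a debugger attached, since the application never blocks on a channel nobody is reading.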
The only problem with the ITM is that its buffer is very small (only 10 bytes) and shared by all channels, so it can easily overflow. The data transmitted via the various channels is collected by the host PC IDE/front-end program, which displays each in an appropriate window. We have a dedicated tutorial for redirecting printf output to ITM Channel-0 on the STM32F4-Discovery board. The link to the tutorial is given below.
1.3.2 Embedded trace macrocell (ETM)
While the ITM can provide very useful functionality like live variable values, PC sampling, cycle counts (all provided by the DWT) and printf-like text data for software debugging, it can’t provide live instruction trace. For full real-time instruction trace, another unit called the Embedded Trace Macrocell (ETM) is provided on ARM Cortex-M3/M4 cores. This unit is optional, however: silicon vendors may implement it or leave it out according to their requirements and target market.
The Embedded Trace Macrocell (ETM) provides full real-time instruction trace over a 4-bit high speed trace bus. Using this feature, the ETM lets you know how your application reached a certain point, say an interrupt. You can trace back the whole path followed (all previously executed instructions) by the core before it reached your desired point. This is an extremely handy feature, with the downside that it’s not available across the whole ARM Cortex-M family (of the cores discussed here, only M3/M4). Also, ETM data cannot be output over the SWD pins; it needs the dedicated trace pins and an external trace-capable debug probe, which is normally a bit expensive.
Note: Of the cores discussed here, the ITM and ETM are available only on ARM Cortex-M3/M4. The ETM is also optional and may or may not be implemented. Check your vendor’s SoC datasheet for ETM implementation.
1.3.3 Micro Trace Buffer (MTB)
While the ETM is available on M3/M4 cores, there is no instruction trace available on Cortex-M0. However, the Cortex-M0+ has a way around this called the Micro Trace Buffer (MTB). Unlike the trace hardware described above, it does not stream data out through a trace port; instead, the MTB uses a small region of internal SRAM as a circular buffer. While the application is running, a trace of the executed program flow is recorded in the MTB buffer. When the processor is halted for debugging, the debugger can read the buffer and display the previously executed instructions. Because the SRAM buffer is limited, only the most recent instructions can be traced back, i.e. the processor needs to be halted frequently to see what was executed.
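The circular buffer behaviour can be illustrated with a small host-side simulation. It is only a sketch of the idea, not the real MTB register interface: it assumes that each trace entry is a pair of words holding the source and destination address of a branch, and that old entries are silently overwritten once the buffer is full. All names (`mtb_sim_t`, `mtb_sim_record`, `mtb_sim_recent`) and the buffer size of four entries are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTB_SIM_ENTRIES 4u   /* deliberately tiny to show the wrap-around */

typedef struct {
    uint32_t source;   /* address the branch was taken from */
    uint32_t dest;     /* address execution continued at    */
} mtb_sim_packet_t;

typedef struct {
    mtb_sim_packet_t buf[MTB_SIM_ENTRIES];
    uint32_t head;     /* next slot to write                 */
    uint32_t count;    /* valid entries, saturates when full */
} mtb_sim_t;

/* Records one branch, overwriting the oldest entry when full. */
void mtb_sim_record(mtb_sim_t *t, uint32_t source, uint32_t dest)
{
    assert(t != NULL);
    t->buf[t->head].source = source;
    t->buf[t->head].dest   = dest;
    t->head = (t->head + 1u) % MTB_SIM_ENTRIES;
    if (t->count < MTB_SIM_ENTRIES) {
        t->count++;
    }
}

/* Returns the entry recorded `age` branches ago (0 = most recent),
   which is what a debugger would do after halting the core. */
mtb_sim_packet_t mtb_sim_recent(const mtb_sim_t *t, uint32_t age)
{
    assert(t != NULL);
    assert(age < t->count);
    uint32_t idx = (t->head + MTB_SIM_ENTRIES - 1u - age) % MTB_SIM_ENTRIES;
    return t->buf[idx];
}
```

After six recorded branches only the last four remain readable, which mirrors why a small SRAM buffer can only show the most recent execution history.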
Figure – DAP components