Linker Scripts – GNU

Keywords: Embedded systems, Build Process, Toolchain, Compiler, Object Files


In the previous tutorial we mentioned that the output of the Linker is a relocatable object file. Relocatable object files are not executable because no real addresses have been assigned to their symbols; all symbols in a relocatable image are resolved relative to some fixed address or to the beginning of their respective section. Run-time addresses must be assigned to the relocatable object file before it can be executed. We also mentioned that it is the job of the Locator to assign these load and run-time addresses. Most of the time the Locator is an integrated part of the Linker and is not identified as a separate entity, so in this tutorial we will treat the Locator as part of the Linker and simply say Linker.

 

In the case of a desktop computer, applications/programs run under an Operating System (OS) that gives each program its own virtual address space, implemented through the relatively complex mechanisms of Paging and Demand Paging. Because of Virtual Memory, the toolchain can assign addresses within a fixed, pre-defined virtual address space that the OS presents to every application. Let’s keep these concepts aside, as the focus of this tutorial is on Embedded Systems.


Embedded Systems are more complicated in this respect because each Processor architecture has a different Memory Map/Address Space, and the toolchain can’t make assumptions about where the ROM and RAM memories lie in the target Hardware. This is where the toolchain needs human assistance: someone has to describe what the target hardware memories look like.


Question: Why does the Linker need to know where memories like RAM/ROM lie in the target Hardware?


As a quick reminder from the previous tutorial, Object Files contain various sections. The sections allowed depend on the Object File format. The most common sections, shared by most Object File formats, are given below.

Figure-1: Object file Sections/Segments

Section Name | Description
.text | Contains the executable instruction codes; it is shared among all processes running the same binary.
.bss | BSS stands for ‘Block Started by Symbol’. It holds uninitialized global and static variables.
.data | Contains the initialized global and static variables and their values.
Symbol Table | A symbol is basically a name and an address. The symbol table is an array of symbol entries holding the information needed to locate and relocate a program’s symbolic definitions and references. A symbol table index is a subscript into this array; index 0 both designates the first entry in the table and serves as the undefined symbol index.

Table-1: Object file Sections


A program normally contains both instructions (the .text section) to be executed by the Processor and data (variables etc.) to be processed. Instructions are normally executed from permanent memory like EEPROM/Flash, while data is placed in volatile memory like RAM because it changes during program execution. The two types of memory (ROM and RAM) don’t overlap in the Processor MEMORY MAP. The toolchain needs to know the address ranges of these memories in order to assign addresses to the respective sections in the appropriate memory.


The information about each section is stored in the object file header. On a desktop system the OS Loader uses it to copy each section into the desired memory location (strictly speaking, the OS loads segments in units of fixed-size pages rather than individual sections).


The following code snippet shows a simple Hello World C program. When compiled with the GCC toolchain it produces the executable object file hello, with addresses assigned to the various sections as shown in Figure-2.

/*
    @file name: hello.c
*/
#include <stdio.h>
int main() {

    printf("Hello World!");
    return 0;
}


Compile the above ‘C’ file with the GCC toolchain and list the section headers of the result:


gcc -o hello hello.c
objdump -h hello

Figure-2: Hello World Program Sections

Question: How do we tell the Linker about the TARGET Memory Map / Address space?


Well, we have to tell the Linker, in a language it understands, where each section will go. The Linker has absolutely no information about the target hardware memory map. Each toolchain has its own dialect for describing target memory: in ARM Keil MDK it’s called a scatter file, and in the GNU toolchain it’s called a Linker Script. This tutorial focuses on the GNU Linker Script. In other toolchains the syntax may vary, but the important thing is to grasp the basic concepts.


Wait…! In the above Hello World example we didn’t tell the Linker anything about where each section should go, yet it placed all the sections at appropriate addresses. How’s that even possible?

Well, for now let’s put it this way: for well-known hosted targets like x86 and x64, the Linker already has a built-in default linker script, which it uses unless another script is specified explicitly (GNU ld prints this default script when run with the --verbose option). It’s too early to go into Virtual Memory and MMUs on these architectures.

Question: Where does the Linker script come into action?

The Linker script is used during the link pass of the build process, i.e. at the time when all the input object files (one generated from each .c file in the project) are combined into a single output executable object file (Figure-3).

Figure-3: Linker script action point

1. Load Address vs Link Address:

There are two types of addresses associated with building and running a program. The Load Address is where a part of the program (instructions or variables) LIVES while it is stored, i.e. the place in the processor address space where it is LOADED when the ROM/Flash is programmed. The Link Address is where that same part of the program is expected to be at RUN time, i.e. the address that the code uses when it refers to it.

Let’s consider variables as an example. Each initialized variable is assigned a unique address in the data segment, and that address must be in RAM because the value may change at run time. The problem is that RAM is volatile: as soon as the power is shut down, all of its contents are lost. The initial values therefore have to be stored in some permanent storage (like ROM/Flash), and on program start, before the main function is called, they have to be copied into RAM. In other words, the variables LIVE in ROM, where they are placed during ROM/Flash programming, but are referenced in RAM. The addresses where the program is burned/loaded are called Load Addresses, while the addresses through which program internals such as variables are referenced at run time are called Link Addresses. In GNU terminology the link address is called the Virtual Memory Address (VMA) and the load address is called the Load Memory Address (LMA). For code that executes directly from Flash, the two are the same.
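The copy from load address to link address described above is typically done by a few lines of startup code. The following is a minimal sketch in C, not the startup code of any particular toolchain; the function name copy_data and its parameter names are assumptions made for illustration, standing in for boundary symbols that a linker script would normally provide.

```c
#include <stdint.h>

/*
 * Copy the initial values of .data from their load address (LMA, in
 * ROM/Flash) to their link address (VMA, in RAM), one word at a time.
 * All three pointers are hypothetical stand-ins for symbols that a
 * linker script would export.
 */
void copy_data(const uint32_t *lma_src, uint32_t *vma_start, uint32_t *vma_end)
{
    for (uint32_t *dst = vma_start; dst < vma_end; ++dst) {
        *dst = *lma_src++;
    }
}
```

On a real target the three arguments would be the LMA of .data in Flash and the start and end VMA of .data in RAM, and the function would run before main is called.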

I think that’s enough introduction to Linker Scripts. Let’s introduce some common Linker script directives used for writing your own Linker script.

2. Linker Scripts Commands/Directives:

Linker scripts are plain text files that contain a bunch of commands and directives describing the target memory space and the way addresses are assigned to the various sections in the output object file. The following are the most important ones commonly used in Embedded Systems.

2.1 Comments: Linker Scripts allow ‘C’ style comments (/* ... */) to improve readability.

2.2 Location Counter (.) : The Linker maintains an internal Location Counter, which is advanced each time an output section is emitted to memory. The dot (.) symbol refers to this counter: it is defined by default, reading it returns the current value of the Location Counter, and assigning to it moves the counter. At the start of the SECTIONS command its value is 0. The good news is that ordinary arithmetic can be performed on the ‘.’ symbol, just as on an integer variable. When a value is assigned to ‘.’, the location counter moves to that address and subsequent sections are emitted from there. Consider a hypothetical Processor with SRAM starting at 0x20000000 and Flash memory starting at 0x80000000. As we know, the .data and .bss sections need to be placed in RAM, while the .text section (instructions) needs to be placed in Flash memory. The Linker script in this case will look similar to:

SECTIONS
{
    . = 0x20000000;  /* This will move the Location counter cursor to RAM Memory Location*/
    .data : { *(.data) }
    .bss : { *(.bss) }
    . = 0x80000000;  /* This will move the Location counter cursor to Flash Memory Location*/
    .text : { *(.text) }
}

The tricky part about the ‘.’ symbol is that its behavior depends on where it is used. There are two places where the ‘.’ symbol can appear.

  1. Outside Output Section body
  2. Inside Output Section body

2.2.1 ‘.’ Outside Output Section body: When the ‘.’ symbol is assigned a value outside an output section body, the Location Counter moves straight to the assigned address. As an example, in the above Linker script the initial value of the ‘.’ symbol is 0, Figure-4(a). It is then assigned the value 0x20000000, so the Location Counter jumps straight to 0x20000000 to emit the .data and .bss sections, Figure-4(b). Similarly, when it is assigned 0x80000000 the Location Counter jumps to that address and emits the .text section, Figure-4(c).

Figure-4: ‘.’ Symbol for absolute addressing

2.2.2 ‘.’ Inside Output Section body: When the ‘.’ symbol is assigned a value inside an output section body, the Location Counter is not moved to the absolute address assigned to it. Instead, the assigned value is treated as an offset from the starting address of the parent section, and the counter moves to that location, padding the section as needed. Consider the following linker script.

SECTIONS
{
    . = 0x20000000;  /* This will move the Location counter cursor to RAM Memory Location*/
    .data : { 
              *(.data)
              . = 0x400;
            }
    .bss : { *(.bss) }
    . = 0x80000000;  /* This will move the Location counter cursor to Flash Memory Location*/
    .text : { *(.text) }
}

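In the above Linker script, the .data section is placed at location 0x20000000. After the input .data sections have been placed, the assignment . = 0x400; does not set the Location Counter to the absolute address 0x400, as you might have expected. Instead it is taken relative to the start of the parent section (.data in this case), so the counter moves to 0x20000000 + 0x400 = 0x20000400, the gap is filled with padding, and the .bss section is placed at 0x20000400. This assumes the input .data contents are smaller than 0x400 bytes, because the Location Counter cannot be moved backwards inside a section.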
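The two behaviors of the ‘.’ symbol come down to simple address arithmetic, which can be written out as a small C sketch. This is only a model of the rule described above, not linker code; the function names dot_outside_section and dot_inside_section are made up for illustration.

```c
#include <stdint.h>

/* Value of '.' after an assignment made OUTSIDE an output section body:
 * the assigned value is used as an absolute address. */
uint32_t dot_outside_section(uint32_t assigned)
{
    return assigned;
}

/* Value of '.' after an assignment made INSIDE an output section body:
 * the assigned value is an offset from the start of the parent section. */
uint32_t dot_inside_section(uint32_t section_start, uint32_t assigned)
{
    return section_start + assigned;
}
```

With the numbers from the script above, dot_inside_section(0x20000000, 0x400) gives 0x20000400, which is where .bss ends up.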

2.3 SECTIONS: The SECTIONS command is used to describe the layout of the OUTPUT object file, i.e. how its output sections are organized. The command must be followed by curly braces {} containing a bunch of sub-commands. The SECTIONS command has the following three main purposes:

  1. Selecting which sections (e.g. .data, .text etc.) from the Input File(s) are included in the Output Object file.
  2. Defining where the OUTPUT sections are emitted in memory.
  3. Defining the order of the OUTPUT sections.

A very simple Linker script is given below.

SECTIONS
{
    .data : { *(.data) }
    .bss : { *(.bss) }
    .text : { *(.text) }
}

The above Linker file tells three things to the Linker:

  1. The Output file is described as containing three sections, i.e. .data, .bss and .text. Note that GNU ld does not silently drop input sections that the script fails to mention: such “orphan” sections are still placed in the output by default, and must be listed under the special /DISCARD/ output section if they are to be thrown away (–> Output section selection).
  2. The Output Object file should contain the .data section first, then the .bss section and then the .text section (–> Defining LAYOUT/order of output sections in Output File).
  3. ‘*’ indicates that the named sections from ALL Input File(s) are merged into the corresponding section of the Output Object file. No .data, .bss or .text section from any Input file is omitted.

The general form of the SECTIONS command is as follows [1].

SECTIONS
{
    sections-sub-command
    sections-sub-command
    ...
}

Each sections-sub-command may or may not be followed by curly braces, and can be one of the following commands.

  • An ENTRY command
  • A Symbol Assignment
  • Output Section description

2.3.1 ENTRY Command: The ENTRY command is used to inform the Program Loader of the Entry Point of the program. The entry point is usually the startup/run-time library code that sets up the run-time environment before the main function is called. The ENTRY command is useful when the program has to be run by an OS or loaded by a debugger. In a bare-metal Embedded system this command has little effect on execution, because the entry point is hard-coded through the Reset Vector in the startup file.

Remember: The ENTRY command can appear anywhere in the linker file. It’s not mandatory for it to be a sections-sub-command or to appear inside the SECTIONS body.

The most common entry points, depending on the run-time library, are _start (desktop application toolchains), _c_int00 (TI toolchains) and __start (Keil ARM toolchain). The entry point can be specified in either of the following forms: with the ENTRY command inside the linker script, or with the -e option on the linker command line.

ENTRY(SYMBOL)

-e SYMBOL

2.3.2 Symbol Assignment: Symbols defined in a Linker script follow the name = value; syntax and have global scope by default, i.e. they can be accessed from any other file through an extern declaration. Note that such a symbol has an address but no storage of its own, so C code should only ever take its address. Let’s say we want to find out where the .text section starts and ends. The following symbol assignments can be used to find the starting and ending points of the .text section.

SECTIONS
{
    .data : { *(.data) }
    .bss : { *(.bss) }

    _text_start = .;
    .text : { *(.text) }
    _text_end = .;

}

As can be seen from the above linker script, before the .text section is emitted to memory the value of the Location Counter is assigned to the _text_start symbol, and at the end of the .text section the value of the Location Counter is assigned to the _text_end symbol. The difference between the two gives the .text section size.

If you want to access a Linker Script symbol from, let’s say, a .c file, the following method can be used.

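/*
  @file: test.c 
*/
#include <stdio.h>

extern int _text_start;  /* defined in the Linker Script; only its address is meaningful */

void print_text_start(void)
{
    printf("Starting Point of .text section is: %p\n", (void *)&_text_start);
}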
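The size calculation mentioned above (end symbol minus start symbol) can be sketched in C as follows. The helper name section_size is hypothetical, and the function takes the two boundary addresses as parameters so that it does not depend on any particular linker script.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Size in bytes of the region delimited by two boundary addresses.
 * In a real build, start and end would be the addresses of linker-script
 * symbols such as _text_start and _text_end.
 */
size_t section_size(const char *start, const char *end)
{
    return (size_t)((uintptr_t)end - (uintptr_t)start);
}
```

Assuming the symbols are declared as extern char _text_start[], _text_end[]; the call would be section_size(_text_start, _text_end).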

2.3.3 Output Section description: The full output section description is relatively complex, but for most applications/programs it can be reduced to the following form, where the parts in square brackets are optional.

SECTIONS
{
    OUTPUT_SECTION_NAME [VMA_ADDRESS] : [AT(LMA_ADDRESS)] {

        output-section-command;

    } [> VMA_REGION] [AT> LMA_REGION]
}

Let’s explain it with the following simple example.

SECTIONS
{
    .data : {
       *(.data)
    } > RAM AT> ROM
}

The above script instructs the Linker that the output section (.data) is composed of the .data sections of all input files, that its initial values are stored in ROM (LMA), and that it is linked to RAM (VMA), since at run time the .data section is copied from ROM into RAM. Here RAM and ROM are the names of MEMORY REGIONS, described below. When explicit addresses are used instead of regions, the VMA is written before the colon and the LMA inside AT(...).

3. Named Memory Regions:

Though we can describe the whole address space and output object file using the SECTIONS command alone, there is a downside to it. Consider the following simple linker script for a hypothetical address space with RAM starting at 0x20000000 and a size of 1KB.

SECTIONS
{
    . = 0x20000000;
    .data : { *(.data) } 
}

The .data section will be placed in the RAM region. There is absolutely no problem as long as the size of the output .data section doesn’t exceed the size of RAM, i.e. 1KB in this case. The problem arises when the size exceeds the RAM capacity. In this case the Linker has no idea whether it has crossed the RAM limit or not: it will simply bring the Location Counter to 0x20000000 and start emitting the .data section there. At load/run time, when the .data section is copied into RAM, some of the data will fall outside the physical memory, which may crash the program.

In order to solve this issue, another very useful command called MEMORY is used. The purpose of the MEMORY command is to tell the Linker the boundaries of the memory into which an output section is emitted. If an output section exceeds the size of its memory region, the Linker will throw an error. The following is the general form of the MEMORY command.

MEMORY
{
	MEM_NAME : ORIGIN = ADDR, LENGTH = SIZE
}

The Memory command for the hypothetical address space described earlier will be like:

MEMORY
{
    RAM: ORIGIN = 0x20000000, LENGTH = 1k
    ROM: ORIGIN = 0x80000000, LENGTH = 4k
}

And the complete linker script will be like:

MEMORY
{
    RAM: ORIGIN = 0x20000000, LENGTH = 1k
    ROM: ORIGIN = 0x80000000, LENGTH = 4k
}

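SECTIONS {
    .text : {
              *(.text)
            } > ROM

    /* .data runs from RAM (VMA) but its initial values are stored in ROM (LMA) */
    .data : {
              *(.data)
            } > RAM AT> ROM

    /* .bss has no initial values to store, so it only needs a RAM region */
    .bss : {
             *(.bss)
           } > RAM
}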

Now, if the size of any output section exceeds the limit of its region as defined by the MEMORY command, the Linker will throw an error.
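The bounds check that the MEMORY command enables can be modelled in a few lines of C. This is only a sketch of the idea, not how GNU ld is implemented; the struct mem_region type and the region_place function are hypothetical names introduced for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one MEMORY region. */
struct mem_region {
    uint32_t origin;  /* ORIGIN: first address of the region */
    uint32_t length;  /* LENGTH: size of the region in bytes */
    uint32_t used;    /* bytes already occupied by earlier sections */
};

/*
 * Try to place a section of 'size' bytes in the region. On success the
 * assigned start address is stored in *addr and true is returned. If the
 * section does not fit, false is returned, which corresponds to the
 * point where the linker would report an error.
 */
bool region_place(struct mem_region *r, uint32_t size, uint32_t *addr)
{
    if (size > r->length - r->used) {
        return false;
    }
    *addr = r->origin + r->used;
    r->used += size;
    return true;
}
```

For the 1KB RAM region above, two 512-byte sections fit at 0x20000000 and 0x20000200, and any further byte is rejected.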

References:

[1] – Embedded System Design on a Shoestring by Lewin A.R.W. Edwards

[2] – Programming Embedded Systems in C and C++ by Michael Barr

[3] – Linkers and Loaders by John R. Levine


