A primer about linking
ARM assembler in Raspberry Pi – Chapter 26
In this chapter we will talk about linking, a fascinating step that is required to create a program, even when we write it in assembler.
Linkers, the magic between symbols and addresses
Linkers are an essential yet often forgotten tool. Their main job is to stick together all the pieces that form our program so that it can be executed. The fundamental work of a linker is binding symbolic names to addresses (i.e. physical names). This process is conceptually simple but full of interesting details. Linking is a necessary step when separate compilation is used.
Separate compilation and modules
Modules are a mechanism that programming languages provide to let their users split programs into different logical parts. Modularization requires some amount of support from the tools that implement the programming language, and separate compilation is a mechanism to achieve this. In C, a program may be decomposed into several source files. Usually, compiling a C source file generates an object file, so several source files lead to several object files. These object files are combined by a linker, which generates the final program.
ELF
Given that several tools manipulate object files (compilers, assemblers, linkers), a common format comes in handy. There are a few formats available for this purpose, like COFF, Mach-O or ELF. In the UNIX world (including Linux) the most popular format is ELF (Executable and Linking Format). This format is used for object files (called relocatable objects; we will see below why), shared objects (dynamic libraries) and executables (the program itself).
For a linker, an ELF relocatable file is a collection of sections. Sections represent a contiguous chunk of data (which can be anything: instructions, initial values of global variables, debug information, etc). Each section has a name and attributes like whether it has to be allocated in memory, loaded from the image (i.e. the file that contains the program), whether it can be executed, whether it is writable, its size and alignment, etc.
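As an illustration of these attributes, here is a minimal Python sketch that turns the type and flag fields of an ELF section header into the properties listed above. The numeric constants are those of the generic ELF specification; the helper name `describe_section` is hypothetical, and treating "loaded from the image" as "allocated and not `SHT_NOBITS`" is a simplifying assumption.

```python
# Constants from the generic ELF specification (section header fields).
SHT_PROGBITS = 1      # section has contents stored in the file
SHT_NOBITS = 8        # section occupies memory but no file space (e.g. .bss)
SHF_WRITE = 0x1       # writable at run time
SHF_ALLOC = 0x2       # occupies memory while the program runs
SHF_EXECINSTR = 0x4   # contains executable instructions


def describe_section(sh_type: int, sh_flags: int) -> dict:
    """Hypothetical helper: summarize a section's attributes.

    Simplifying assumption: a section is "loaded from the image" when it
    is allocated and actually has bytes in the file (i.e. not SHT_NOBITS).
    """
    allocated = bool(sh_flags & SHF_ALLOC)
    return {
        "allocated": allocated,
        "loaded_from_image": allocated and sh_type != SHT_NOBITS,
        "writable": bool(sh_flags & SHF_WRITE),
        "executable": bool(sh_flags & SHF_EXECINSTR),
    }


# A typical .text section: allocated and executable, but read-only.
print(describe_section(SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR))
```

A typical `.data` would be `SHT_PROGBITS` with `SHF_ALLOC | SHF_WRITE`, while `.bss` uses `SHT_NOBITS` because its contents are all zeros and need no space in the file.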
Labels as symbolic names
When we use global variables we have to use the following pattern:

```
.data
var: .word 42

.text
func:
    /* ... */
    ldr r0, addr_of_var   /* r0 ← &var */
    ldr r0, [r0]          /* r0 ← *r0 */
    /* ... */

addr_of_var: .word var
```
The reason is that we cannot encode the full 32-bit address of a variable inside an ARM instruction. So it makes sense to keep the address in a place that is easy to reach from the current instruction, in this case addr_of_var. In the case shown above, the assembler turns the first ldr, the one that uses addr_of_var, into something like this:

```
ldr r0, [pc, #offset]
```

which means: load the value found at the given offset from pc (in ARM state, pc reads as the address of the current instruction plus 8 bytes). The assembler computes the right offset, so we do not have to. This is a valid approach because addr_of_var is in the same section as the instruction, placed right after the code of the function, so the distance between them is known when assembling and linking will not change it. It also happens to be close enough in memory: this addressing mode encodes a 12-bit offset plus an add/subtract bit, so anything within 4095 bytes of pc in either direction (roughly 1024 instructions) is addressable this way.
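To make the offset computation concrete, here is a minimal Python sketch of what the assembler does for `ldr rd, [pc, #offset]`. It assumes ARM (not Thumb) state, the always-execute condition and the immediate-offset form of ldr; the function name `encode_ldr_pc_relative` is hypothetical.

```python
def encode_ldr_pc_relative(insn_addr: int, literal_addr: int, rd: int) -> int:
    """Hypothetical helper: encode `ldr rd, [pc, #offset]` in ARM state.

    When the instruction executes, pc reads as insn_addr + 8, so the
    offset is measured from there. The 12-bit magnitude goes in bits
    0-11 and the U bit (bit 23) says whether it is added or subtracted.
    """
    offset = literal_addr - (insn_addr + 8)
    if not -4095 <= offset <= 4095:
        raise ValueError("literal is out of reach of a 12-bit offset")
    if not 0 <= rd <= 15:
        raise ValueError("rd must be a register number 0-15")
    up = 1 if offset >= 0 else 0
    # 0xE51F0000: cond = AL, single data transfer, P = 1, W = 0, L = 1, Rn = pc
    return 0xE51F0000 | (up << 23) | (rd << 12) | abs(offset)


# First instruction of main in main.o: ldr r0, addr_one_var
# (instruction at offset 0x0, literal addr_one_var at offset 0x28).
print(hex(encode_ldr_pc_relative(0x0, 0x28, 0)))
```

The result, 0xe59f0020, is the first word of main in the objdump output of main.o shown later in this chapter.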
But the question that remains is: what does the assembler put in the location designated by addr_of_var? We have written .word var, but what does this mean? The assembler should emit the address of var, but at this point its address is unknown, so the assembler can only emit partial information. This information will be completed later.
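The idea of emitting partial information now and completing it later can be modeled in a few lines. The following Python sketch is a toy, not how GNU as or ld are implemented: the hypothetical `assemble_words` emits a zero placeholder for every `.word` that names a symbol and records a pending fix-up, and the hypothetical `link` patches those places once the addresses are known.

```python
import struct


def assemble_words(items):
    """Toy assembler for a list of .word directives.

    Each item is either an int (a known value) or a str (a symbol whose
    address is not known yet). Returns the little-endian bytes plus a
    list of (offset, symbol) fix-ups still pending.
    """
    out = bytearray()
    fixups = []
    for item in items:
        if isinstance(item, str):
            fixups.append((len(out), item))   # remember where to patch
            out += struct.pack("<I", 0)       # placeholder for now
        else:
            out += struct.pack("<I", item & 0xFFFFFFFF)
    return bytes(out), fixups


def link(blob, fixups, symbol_addresses):
    """Toy linker: write the final address of each symbol in place."""
    patched = bytearray(blob)
    for offset, name in fixups:
        struct.pack_into("<I", patched, offset, symbol_addresses[name])
    return bytes(patched)


blob, fixups = assemble_words([42, "var"])
print(fixups)
final = link(blob, fixups, {"var": 0x00010578})  # address chosen for illustration
print(final.hex())
```

Real object files record the same kind of information, with more detail about how to patch each place, as relocations.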
An example
Let’s consider a more complex example to see this process in action. The following code takes two global variables and adds them, storing the sum into a result variable. Then it calls a function, which we will write in another file, that increments the result variable by one. The result variable has to be accessible from the other file, so we have to mark it as global (similar to what we do with main).
```
/* main.s */
.data

one_var: .word 42
another_var: .word 66

.globl result_var             /* mark result_var as global */
result_var: .word 0

.text

.globl main
main:
    ldr r0, addr_one_var      /* r0 ← &one_var */
    ldr r0, [r0]              /* r0 ← *r0 */
    ldr r1, addr_another_var  /* r1 ← &another_var */
    ldr r1, [r1]              /* r1 ← *r1 */
    add r0, r0, r1            /* r0 ← r0 + r1 */
    ldr r1, addr_result       /* r1 ← &result */
    str r0, [r1]              /* *r1 ← r0 */
    bl inc_result             /* call to inc_result (this overwrites lr) */
    mov r0, #0                /* r0 ← 0 */
    bx lr                     /* return; a real main would save and
                                 restore lr around the bl */

addr_one_var: .word one_var
addr_another_var: .word another_var
addr_result: .word result_var
```
Let’s create an object file. Recall that an object file is an intermediate file that is used before we create the final program. Once created, we can use objdump -d to see the code contained in this object file. (The use of -march=armv6 prevents the assembler from emitting some legacy information that would only confuse the exposition.)
```
$ as -march=armv6 -o main.o main.s   # creates object file main.o
```
Relocations
We said above that the assembler does not know the final value, so instead it may put some partial information in place (e.g. the offset from the beginning of .data). It also records that a fix-up is required there. This fix-up is called a relocation. We can read the relocations with the flag -r of objdump; combined with -d, as in -dr, they are shown interleaved with the disassembly.
```
$ objdump -dr main.o
```
```
main.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <main>:
   0:   e59f0020    ldr     r0, [pc, #32]   ; 28 <addr_one_var>
   4:   e5900000    ldr     r0, [r0]
   8:   e59f101c    ldr     r1, [pc, #28]   ; 2c <addr_another_var>
   c:   e5911000    ldr     r1, [r1]
  10:   e0800001    add     r0, r0, r1
  14:   e59f1014    ldr     r1, [pc, #20]   ; 30 <addr_result>
  18:   e5810000    str     r0, [r1]
  1c:   ebfffffe    bl      0 <inc_result>
                    1c: R_ARM_CALL  inc_result
  20:   e3a00000    mov     r0, #0
  24:   e12fff1e    bx      lr

00000028 <addr_one_var>:
  28:   00000000    .word   0x00000000
                    28: R_ARM_ABS32 .data

0000002c <addr_another_var>:
  2c:   00000004    .word   0x00000004
                    2c: R_ARM_ABS32 .data

00000030 <addr_result>:
  30:   00000000    .word   0x00000000
                    30: R_ARM_ABS32 result_var
```
Relocations are rendered in the output above like this:

```
OFFSET: TYPE VALUE
```

They are printed right after the instruction or data they affect. OFFSET is the offset, inside the section, of the bytes that will need fixing up (in this case all of them are inside .text). TYPE is the kind of relocation, which determines which bytes are fixed up and how. VALUE is the symbolic entity whose physical address we have to figure out. It can be a real symbol, like inc_result and result_var, or a section name, like .data.
In the current list, there is a relocation at .text+1c so that we can call the actual inc_result. The relocations at .text+28 and .text+2c are required to access the variables in .data. These relocations could have the symbols one_var and another_var as VALUE, respectively, but GNU as seems to prefer to represent them as offsets relative to the .data section (note the values 0x00000000 and 0x00000004 left in the two .words). Finally, the relocation at .text+30 refers to the global symbol result_var.
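If you want to process such a listing with a script, the `OFFSET: TYPE VALUE` lines are easy to pick out. Here is a minimal Python sketch, assuming the layout shown above (a hexadecimal offset, a colon, a type starting with `R_`, then a value); the helper name `parse_relocations` is hypothetical.

```python
import re

# One relocation line in `objdump -dr` output, e.g. "1c: R_ARM_CALL inc_result".
RELOC_LINE = re.compile(r"^\s*([0-9a-fA-F]+):\s+(R_\w+)\s+(\S+)\s*$")


def parse_relocations(listing: str):
    """Hypothetical helper: return (offset, type, value) for each relocation."""
    relocs = []
    for line in listing.splitlines():
        m = RELOC_LINE.match(line)
        if m:
            offset, rtype, value = m.groups()
            relocs.append((int(offset, 16), rtype, value))
    return relocs


sample = """\
  1c:   ebfffffe    bl      0 <inc_result>
                    1c: R_ARM_CALL  inc_result
  28:   00000000    .word   0x00000000
                    28: R_ARM_ABS32 .data
  30:   00000000    .word   0x00000000
                    30: R_ARM_ABS32 result_var
"""

for offset, rtype, value in parse_relocations(sample):
    print(f".text+{offset:x}: {rtype} {value}")
```

Instruction lines are skipped because the field after their colon is an encoded word, not an `R_` type.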
Every relocation kind is defined in terms of a few parameters: S is the address of the symbol referred to by the relocation (the VALUE above), P is the address of the place being fixed up (the OFFSET plus the address of the section itself) and A (the addend) is the value that the assembler has left in place. In our example, for R_ARM_ABS32 it is the value of the .word; for R_ARM_CALL it is stored in the offset bits of the bl instruction itself. Using these parameters, each relocation kind has an associated operation. Relocations of kind R_ARM_ABS32 compute S + A. Relocations of kind R_ARM_CALL compute (S + A) - P.
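These operations are simple enough to write down directly. The following Python sketch applies them the way a linker would, with hypothetical function names and T taken as 0: R_ARM_ABS32 produces a 32-bit value, while for R_ARM_CALL the addend A is read from the signed 24-bit word offset of the bl and the result is written back into that field. Note that the assembler does not leave 0 there: the bl in main.o is ebfffffe, whose field is -2 words, so A = -8.

```python
MASK32 = 0xFFFFFFFF


def reloc_abs32(S: int, A: int) -> int:
    """R_ARM_ABS32: S + A (T omitted, i.e. taken as 0)."""
    return (S + A) & MASK32


def bl_addend(insn: int) -> int:
    """A for R_ARM_CALL: the signed 24-bit field of a bl, in bytes."""
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x00800000:          # negative offset: sign-extend
        imm24 -= 1 << 24
    return imm24 * 4                # the field counts 4-byte words


def reloc_call(insn: int, S: int, P: int) -> int:
    """R_ARM_CALL: compute (S + A) - P and store it back into the bl."""
    x = (S + bl_addend(insn)) - P
    if x % 4 != 0 or not -(1 << 25) <= x < (1 << 25):
        raise ValueError("target cannot be reached with a bl")
    return (insn & 0xFF000000) | ((x >> 2) & 0x00FFFFFF)


# The bl in main.o is ebfffffe; in test.exe it sits at P = 0x83ac and
# inc_result ends up at S = 0x83c4.
print(hex(reloc_call(0xEBFFFFFE, S=0x83C4, P=0x83AC)))
```

With the addresses the linker chooses in the final program below, this yields 0xeb000004, the instruction that appears in test.exe.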
Strictly speaking, the ARM ELF specification defines these operations as (S + A) | T and ((S + A) | T) - P, where T has the value 1 if the symbol S is a Thumb function and 0 otherwise. This is not the case in our examples, so I have omitted T in the description of the relocations above.

Before we can see the result computed by the linker, we have to define inc_result, otherwise linking will fail. This function increments the value of result_var (whose storage is defined in the first file, main.s).
```
/* inc_result.s */
.text

.globl inc_result
inc_result:
    ldr r1, addr_result   /* r1 ← &result */
    ldr r0, [r1]          /* r0 ← *r1 */
    add r0, r0, #1        /* r0 ← r0 + 1 */
    str r0, [r1]          /* *r1 ← r0 */
    bx lr                 /* return */

addr_result: .word result_var
```
Let’s check the relocations as well.
```
$ as -march=armv6 -o inc_result.o inc_result.s
$ objdump -dr inc_result.o
```
```
inc_result.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <inc_result>:
   0:   e59f100c    ldr     r1, [pc, #12]   ; 14 <addr_result>
   4:   e5910000    ldr     r0, [r1]
   8:   e2800001    add     r0, r0, #1
   c:   e5810000    str     r0, [r1]
  10:   e12fff1e    bx      lr

00000014 <addr_result>:
  14:   00000000    .word   0x00000000
                    14: R_ARM_ABS32 result_var
```
We can see that it has a relocation for result_var, as expected.
Now we can combine the two object files to generate an executable binary.
```
$ gcc -o test.exe main.o inc_result.o
```
And check the contents of the file. Our program will include a few functions from the C library that we can ignore.
```
$ objdump -d test.exe
```
```
...
00008390 <main>:
    8390:   e59f0020    ldr     r0, [pc, #32]   ; 83b8 <addr_one_var>
    8394:   e5900000    ldr     r0, [r0]
    8398:   e59f101c    ldr     r1, [pc, #28]   ; 83bc <addr_another_var>
    839c:   e5911000    ldr     r1, [r1]
    83a0:   e0800001    add     r0, r0, r1
    83a4:   e59f1014    ldr     r1, [pc, #20]   ; 83c0 <addr_result>
    83a8:   e5810000    str     r0, [r1]
    83ac:   eb000004    bl      83c4 <inc_result>
    83b0:   e3a00000    mov     r0, #0
    83b4:   e12fff1e    bx      lr

000083b8 <addr_one_var>:
    83b8:   00010578    .word   0x00010578

000083bc <addr_another_var>:
    83bc:   0001057c    .word   0x0001057c

000083c0 <addr_result>:
    83c0:   00010580    .word   0x00010580

000083c4 <inc_result>:
    83c4:   e59f100c    ldr     r1, [pc, #12]   ; 83d8 <addr_result>
    83c8:   e5910000    ldr     r0, [r1]
    83cc:   e2800001    add     r0, r0, #1
    83d0:   e5810000    str     r0, [r1]
    83d4:   e12fff1e    bx      lr

000083d8 <addr_result>:
    83d8:   00010580    .word   0x00010580
...
```
From the output above we can observe that addr_one_var contains the address 0x00010578, addr_another_var contains 0x0001057c and addr_result contains 0x00010580. The last one appears twice, but this is because both main.s and inc_result.s refer to result_var, so each of them keeps its address in its own addr_result. Note that both copies contain the same address.
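Let’s start with the relocations of addr_one_var, addr_another_var and addr_result. These three relocations were R_ARM_ABS32, so their operation is S + A. For the first two, S is the address where the .data section of main.o ended up in the final program, and A is the offset the assembler left in the .word (0 and 4). For addr_result, S is the address of result_var itself and A is 0. We can find the address of .data with objdump -h (plus flag -w to make the output a bit more readable). A file may contain many sections, so I will omit the uninteresting ones.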
```
$ objdump -hw test.exe
```
```
test.exe:     file format elf32-littlearm

Sections:
Idx Name   Size      VMA       LMA       File off  Algn  Flags
...
 13 .text  0000015c  000082e4  000082e4  000002e4  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
...
 23 .data  00000014  00010570  00010570  00000570  2**2  CONTENTS, ALLOC, LOAD, DATA
...
```
Column VMA gives the address of the section. In our case .data is located at 0x00010570, and our variables are found at 0x00010578, 0x0001057c and 0x00010580. These are offsets 8, 12 and 16, respectively, from the beginning of .data: the linker has laid out some other variables in this section before ours. We can see this by asking the linker to print a map of the generated executable.
```
$ gcc -o test.exe main.o inc_result.o -Wl,--print-map > map.txt
$ cat map.txt
```
```
314 .data           0x00010570       0x14
315                 0x00010570                PROVIDE (__data_start, .)
316  *(.data .data.* .gnu.linkonce.d.*)
317  .data          0x00010570        0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crt1.o
318                 0x00010570                data_start
319                 0x00010570                __data_start
320  .data          0x00010574        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crti.o
321  .data          0x00010574        0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtbegin.o
322                 0x00010574                __dso_handle
323  .data          0x00010578        0xc main.o
324                 0x00010580                result_var
325  .data          0x00010584        0x0 inc_result.o
326  .data          0x00010584        0x0 /usr/lib/arm-linux-gnueabihf/libc_nonshared.a(elf-init.oS)
327  .data          0x00010584        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtend.o
328  .data          0x00010584        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/cr
```
If you check lines 317 to 322, you will see that the final .data section of our program (which starts at 0x00010570, as we checked above) includes 4 bytes from crt1.o for the symbol data_start (and its alias __data_start). File crtbegin.o has also contributed 4 bytes, for the symbol __dso_handle. These global symbols come from the C runtime startup files. Of our own variables, only result_var appears here because it is the only global symbol; one_var and another_var were not marked with .globl. The storage of all three, though, is accounted for in line 323: they take 0xc bytes (i.e. 12 bytes, since there are 3 variables of 4 bytes each).
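Reading the map by hand works, but the input-section lines are regular enough to extract mechanically. The following Python sketch assumes the layout shown above (` .data ADDRESS SIZE FILE` lines with hexadecimal addresses and sizes) and uses shortened file names in its sample; the helper name `data_contributions` is hypothetical. The address it reports for main.o is where one_var, the first variable of main.o, ends up.

```python
import re

# An input-section line looks like:
#  .data          0x00010578        0xc main.o
INPUT_SECTION = re.compile(
    r"^\s*\.data\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+(\S.*)$"
)


def data_contributions(map_text):
    """Hypothetical helper: list (file, address, size) for each input .data."""
    result = []
    for line in map_text.splitlines():
        m = INPUT_SECTION.match(line)
        if m:
            address, size, source = m.groups()
            result.append((source.strip(), int(address, 16), int(size, 16)))
    return result


sample_map = """\
.data           0x00010570       0x14
 .data          0x00010570        0x4 crt1.o
                0x00010570                data_start
 .data          0x00010574        0x4 crtbegin.o
                0x00010574                __dso_handle
 .data          0x00010578        0xc main.o
                0x00010580                result_var
 .data          0x00010584        0x0 inc_result.o
"""

for source, address, size in data_contributions(sample_map):
    print(f"{source:14} {address:#010x} {size:#x}")
```

The output-section header line (the first one) is skipped because it names no input file, and symbol lines are skipped because they do not start with `.data`.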
So with this info we can infer what has happened: the .data of main.o was placed at 0x00010578, so variable one_var is at address 0x00010578, variable another_var at 0x0001057c and variable result_var at 0x00010580. If you check the result of objdump -d test.exe above, you will see exactly these addresses in the literals:

```
000083b8 <addr_one_var>:
    83b8:   00010578    .word   0x00010578

000083bc <addr_another_var>:
    83bc:   0001057c    .word   0x0001057c

000083c0 <addr_result>:
    83c0:   00010580    .word   0x00010580
...
000083d8 <addr_result>:
    83d8:   00010580    .word   0x00010580
```
What about the call to inc_result?

```
    83ac:   eb000004    bl      83c4
```
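This one is a bit more involved. Recall that the relocation operation is (S + A) - P. Here S is 0x000083c4, the address of inc_result, and P is 0x000083ac, the address of the bl itself. A is what the assembler left in the offset field of the bl: in main.o the instruction was ebfffffe, whose 24-bit field 0xfffffe means -2 words, i.e. -8 bytes. So (S + A) - P = (0x83c4 - 8) - 0x83ac = 16 bytes. Instruction bl encodes its offset shifted 2 bits to the right (it counts 4-byte words), so 16 bytes are stored as 4, which is exactly the eb000004 we see. Recall that pc points to the current instruction plus 8 bytes, so at run time this instruction jumps to 0x83ac + 8 + 16 = 0x83c4, an offset of +24 bytes (0x83c4 - 0x83ac = 0x18) from the bl. Exactly what we wanted: the -8 left by the assembler compensates for pc being 8 bytes ahead.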
```
...
    83ac:   eb000004    bl      83c4 <inc_result>
    83b0:   e3a00000    mov     r0, #0
    83b4:   e12fff1e    bx      lr

000083b8 <addr_one_var>:
    83b8:   00010578    .word   0x00010578

000083bc <addr_another_var>:
    83bc:   0001057c    .word   0x0001057c

000083c0 <addr_result>:
    83c0:   00010580    .word   0x00010580

000083c4 <inc_result>:
    83c4:   e59f100c    ldr     r1, [pc, #12]   ; 83d8 <addr_result>
...
```
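The same arithmetic can be run backwards to check a linked bl. This minimal Python sketch decodes an ARM-state bl (the helper name `bl_target` is hypothetical): it sign-extends the 24-bit field, scales it by 4 and adds it to the instruction address plus 8.

```python
def bl_target(insn: int, address: int) -> int:
    """Hypothetical helper: branch target of an ARM-state bl at `address`."""
    if insn >> 28 == 0xF:                  # cond = 1111 would make it a blx
        raise ValueError("blx, not bl")
    if (insn >> 24) & 0xF != 0xB:          # bits 27-24 = 1011 means bl
        raise ValueError("not a bl instruction")
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x00800000:                 # sign-extend the 24-bit field
        imm24 -= 1 << 24
    pc = address + 8                       # pc reads 8 bytes ahead in ARM state
    return (pc + imm24 * 4) & 0xFFFFFFFF


print(hex(bl_target(0xEB000004, 0x83AC)))
```

For the bl at 0x83ac this gives 0x83c4, the address of inc_result in the listing above.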
More information
Linkers are a bit arcane because they have to deal with the lowest-level parts of code, so it is sometimes hard to find good resources about them.

Ian Lance Taylor, author of gold, wrote a very nice essay on linkers in 20 chapters. If you want a book, Linkers & Loaders is not a bad one. The ELF standard is actually defined in two parts, a generic one and a processor-specific one, including one for ARM.
That’s all for today.