Who calls main in a C program on Linux?

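I remember hearing a long time ago that when Linux runs an executable, the real starting point is not main. I had also noticed earlier that the link step quietly pulls in some object files I had never seen before, so this time let's find out what is really going on.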

Test environment

Because I wanted to show off a little and get some exposure to ARM assembly along the way, this test runs ARM Debian under QEMU.

$ lsb_release -a
No LSB modules are available.
Distributor ID:    Debian
Description:    Debian GNU/Linux 8.0 (jessie)
Release:    8.0
Codename:    jessie

$ file /bin/ls
/bin/ls: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=571db48d9c9e4625b7da206e748e41c237f2b202, stripped

The test source is the familiar Hello World:

#include <stdio.h>

int main()
{
    printf("Hello World\n");

    return 0;
}

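As mentioned in an earlier post, an executable has a .text section, which holds the machine code to be executed. Let's first see where the hello1 executable starts running.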

$ readelf -h hello1
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x102f0
  Start of program headers:          52 (bytes into file)
...
Section header string table index: 33

The readelf output shows that the entry point is 0x102f0. So where is 0x102f0? Looking at the symbol table, it turns out to be exactly the start of .text.

$ objdump -t hello1

hello1:     file format elf32-littlearm

SYMBOL TABLE:
00010134 l    d  .interp    00000000              .interp
...
000102f0 l    d  .text    00000000              .text

Right, so what code sits at the start of .text?

Disassembly of section .text:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

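Interesting: there is no main() here, only _start. So what is _start? Remember linker scripts? They have an ENTRY command that specifies where the program starts running. Let's check whether the default ENTRY is also _start.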

$ ld --verbose | grep ENTRY
ENTRY(_start)

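So far we only know that the executable's entry point is _start rather than main, so evidently something added extra code to the executable to put _start there. The laziest way to find out is to search the installed binaries for that symbol.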

user@host:/usr/lib$ find -name "*.[ao]" -exec nm -A {} \;  2> /dev/null | grep " _start$"
./arm-linux-gnueabi/crt1.o:00000000 T _start
./arm-linux-gnueabi/gcrt1.o:00000000 T _start
./arm-linux-gnueabi/Scrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/crt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/gcrt1.o:00000000 T _start
./debug/usr/lib/arm-linux-gnueabi/Scrt1.o:00000000 T _start

OK, there are indeed object files that define _start. Next, let's confirm whether these files get linked in at compile time.

$ gcc -v hello1.c
Using built-in specs.
COLLECT_GCC=gcc
...
COLLECT_GCC_OPTIONS='-v' '-march=armv4t' '-mfloat-abi=soft'
...
-X --hash-style=gnu -m armelf_linux_eabi
...
/usr/lib/gcc/arm-linux-gnueabi/4.9/../../../arm-linux-gnueabi/crt1.o
...

_start calls the external function __libc_start_main. Let's watch it being resolved with LD_DEBUG.

$ LD_DEBUG=all ./hello1 2>&1 |grep __libc_start_main
       890:    symbol=__libc_start_main;  lookup in file=./hello1 [0]
       890:    symbol=__libc_start_main;  lookup in file=/lib/arm-linux-gnueabi/libc.so.6 [0]
       890:    binding file ./hello1 [0] to /lib/arm-linux-gnueabi/libc.so.6 [0]: normal symbol `__libc_start_main' [GLIBC_2.4]

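As shown, the lookup for __libc_start_main starts in ./hello1 and then moves on to libc.so.6, where the symbol is found and its address in libc.so.6 is resolved (the binding). The prototype of __libc_start_main is as follows: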

int __libc_start_main(int (*main) (int, char **, char **), int argc, char ** ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (*stack_end));

Notice anything interesting? Here is what I see:

  • main is passed in as a function pointer
  • the arguments for main (argc and ubp_av)
  • some other function pointers I know nothing about
    • init
    • fini
    • rtld_fini

From this I would guess that the function's job is to call a series of callbacks, namely the ones listed above.

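According to the manual, __libc_start_main() initializes the execution environment, calls main with its arguments, and handles the return value once main finishes. The detailed behaviour the manual gives as an example is: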

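  • Check privileges, for security
  • Initialize the thread subsystem (don't ask me what the thread subsystem is)
  • Register rtld_fini as a release callback, used to free resources when the shared object exits
  • Call the init callback
  • Call the main callback with its arguments
  • When the main callback returns, call exit with its return value as the argument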

Now let's go back to the assembly of _start:

000102f0 <_start>:
   102f0:       e3a0b000        mov     fp, #0
   102f4:       e3a0e000        mov     lr, #0
   102f8:       e49d1004        pop     {r1}            ; (ldr r1, [sp], #4)
   102fc:       e1a0200d        mov     r2, sp
   10300:       e52d2004        push    {r2}            ; (str r2, [sp, #-4]!)
   10304:       e52d0004        push    {r0}            ; (str r0, [sp, #-4]!)
   10308:       e59fc010        ldr     ip, [pc, #16]   ; 10320 <_start+0x30>
   1030c:       e52dc004        push    {ip}            ; (str ip, [sp, #-4]!)
   10310:       e59f000c        ldr     r0, [pc, #12]   ; 10324 <_start+0x34>
   10314:       e59f300c        ldr     r3, [pc, #12]   ; 10328 <_start+0x38>
   10318:       ebffffeb        bl      102cc <__libc_start_main@plt>
   1031c:       ebfffff0        bl      102e4 <abort@plt>
   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

The interesting part is these three addresses:

   10320:       000104b4        .word   0x000104b4
   10324:       00010420        .word   0x00010420
   10328:       00010448        .word   0x00010448

Looking them up, these three addresses are:

  • 10320: 000104b4 .word 0x000104b4
    • __libc_csu_fini
  • 10324: 00010420 .word 0x00010420
    • main
  • 10328: 00010448 .word 0x00010448
    • __libc_csu_init

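In other words, main and __libc_csu_init are passed to __libc_start_main as the first and fourth arguments (loaded into r0 and r3), while __libc_csu_fini is pushed onto the stack and reaches __libc_start_main that way, as the fifth argument.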

Conclusion

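On Linux, the starting point of a program is not main but _start, which comes from the crt1.o shipped with glibc. _start mainly hands your main, plus a few hook functions, to __libc_start_main. libc's __libc_start_main then gets everything set up before it actually runs your main, and it also cleans up after main returns.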

Full disassembly

$ cat hello1.dis

hello1:     file format elf32-littlearm


Disassembly of section .init:

0001029c <_init>:
   1029c:    e92d4008     push    {r3, lr}
   102a0:    eb000021     bl    1032c <call_weak_fn>
   102a4:    e8bd4008     pop    {r3, lr}
   102a8:    e12fff1e     bx    lr

Disassembly of section .plt:

000102ac <puts@plt-0x14>:
   102ac:    e52de004     push    {lr}        ; (str lr, [sp, #-4]!)
   102b0:    e59fe004     ldr    lr, [pc, #4]    ; 102bc <_init+0x20>
   102b4:    e08fe00e     add    lr, pc, lr
   102b8:    e5bef008     ldr    pc, [lr, #8]!
   102bc:    00010318     .word    0x00010318

000102c0 <puts@plt>:
   102c0:    e28fc600     add    ip, pc, #0, 12
   102c4:    e28cca10     add    ip, ip, #16, 20    ; 0x10000
   102c8:    e5bcf318     ldr    pc, [ip, #792]!    ; 0x318

000102cc <__libc_start_main@plt>:
   102cc:    e28fc600     add    ip, pc, #0, 12
   102d0:    e28cca10     add    ip, ip, #16, 20    ; 0x10000
   102d4:    e5bcf310     ldr    pc, [ip, #784]!    ; 0x310

000102d8 <__gmon_start__@plt>:
   102d8:    e28fc600     add    ip, pc, #0, 12
   102dc:    e28cca10     add    ip, ip, #16, 20    ; 0x10000
   102e0:    e5bcf308     ldr    pc, [ip, #776]!    ; 0x308

000102e4 <abort@plt>:
   102e4:    e28fc600     add    ip, pc, #0, 12
   102e8:    e28cca10     add    ip, ip, #16, 20    ; 0x10000
   102ec:    e5bcf300     ldr    pc, [ip, #768]!    ; 0x300

Disassembly of section .text:

000102f0 <_start>:
   102f0:    e3a0b000     mov    fp, #0
   102f4:    e3a0e000     mov    lr, #0
   102f8:    e49d1004     pop    {r1}        ; (ldr r1, [sp], #4)
   102fc:    e1a0200d     mov    r2, sp
   10300:    e52d2004     push    {r2}        ; (str r2, [sp, #-4]!)
   10304:    e52d0004     push    {r0}        ; (str r0, [sp, #-4]!)
   10308:    e59fc010     ldr    ip, [pc, #16]    ; 10320 <_start+0x30>
   1030c:    e52dc004     push    {ip}        ; (str ip, [sp, #-4]!)
   10310:    e59f000c     ldr    r0, [pc, #12]    ; 10324 <_start+0x34>
   10314:    e59f300c     ldr    r3, [pc, #12]    ; 10328 <_start+0x38>
   10318:    ebffffeb     bl    102cc <__libc_start_main@plt>
   1031c:    ebfffff0     bl    102e4 <abort@plt>
   10320:    000104b4     .word    0x000104b4
   10324:    00010420     .word    0x00010420
   10328:    00010448     .word    0x00010448

0001032c <call_weak_fn>:
   1032c:    e59f3014     ldr    r3, [pc, #20]    ; 10348 <call_weak_fn+0x1c>
   10330:    e59f2014     ldr    r2, [pc, #20]    ; 1034c <call_weak_fn+0x20>
   10334:    e08f3003     add    r3, pc, r3
   10338:    e7932002     ldr    r2, [r3, r2]
   1033c:    e3520000     cmp    r2, #0
   10340:    012fff1e     bxeq    lr
   10344:    eaffffe3     b    102d8 <__gmon_start__@plt>
   10348:    00010298     .word    0x00010298
   1034c:    0000001c     .word    0x0000001c

00010350 <deregister_tm_clones>:
   10350:    e59f301c     ldr    r3, [pc, #28]    ; 10374 <deregister_tm_clones+0x24>
   10354:    e59f001c     ldr    r0, [pc, #28]    ; 10378 <deregister_tm_clones+0x28>
   10358:    e0603003     rsb    r3, r0, r3
   1035c:    e3530006     cmp    r3, #6
   10360:    912fff1e     bxls    lr
   10364:    e59f3010     ldr    r3, [pc, #16]    ; 1037c <deregister_tm_clones+0x2c>
   10368:    e3530000     cmp    r3, #0
   1036c:    012fff1e     bxeq    lr
   10370:    e12fff13     bx    r3
   10374:    000205ff     .word    0x000205ff
   10378:    000205fc     .word    0x000205fc
   1037c:    00000000     .word    0x00000000

00010380 <register_tm_clones>:
   10380:    e59f1024     ldr    r1, [pc, #36]    ; 103ac <register_tm_clones+0x2c>
   10384:    e59f0024     ldr    r0, [pc, #36]    ; 103b0 <register_tm_clones+0x30>
   10388:    e0601001     rsb    r1, r0, r1
   1038c:    e1a01141     asr    r1, r1, #2
   10390:    e0811fa1     add    r1, r1, r1, lsr #31
   10394:    e1b010c1     asrs    r1, r1, #1
   10398:    012fff1e     bxeq    lr
   1039c:    e59f3010     ldr    r3, [pc, #16]    ; 103b4 <register_tm_clones+0x34>
   103a0:    e3530000     cmp    r3, #0
   103a4:    012fff1e     bxeq    lr
   103a8:    e12fff13     bx    r3
   103ac:    000205fc     .word    0x000205fc
   103b0:    000205fc     .word    0x000205fc
   103b4:    00000000     .word    0x00000000

000103b8 <__do_global_dtors_aux>:
   103b8:    e92d4010     push    {r4, lr}
   103bc:    e59f401c     ldr    r4, [pc, #28]    ; 103e0 <__do_global_dtors_aux+0x28>
   103c0:    e5d43000     ldrb    r3, [r4]
   103c4:    e3530000     cmp    r3, #0
   103c8:    1a000002     bne    103d8 <__do_global_dtors_aux+0x20>
   103cc:    ebffffdf     bl    10350 <deregister_tm_clones>
   103d0:    e3a03001     mov    r3, #1
   103d4:    e5c43000     strb    r3, [r4]
   103d8:    e8bd4010     pop    {r4, lr}
   103dc:    e12fff1e     bx    lr
   103e0:    000205fc     .word    0x000205fc

000103e4 <frame_dummy>:
   103e4:    e92d4008     push    {r3, lr}
   103e8:    e59f0028     ldr    r0, [pc, #40]    ; 10418 <frame_dummy+0x34>
   103ec:    e5903000     ldr    r3, [r0]
   103f0:    e3530000     cmp    r3, #0
   103f4:    1a000001     bne    10400 <frame_dummy+0x1c>
   103f8:    e8bd4008     pop    {r3, lr}
   103fc:    eaffffdf     b    10380 <register_tm_clones>
   10400:    e59f3014     ldr    r3, [pc, #20]    ; 1041c <frame_dummy+0x38>
   10404:    e3530000     cmp    r3, #0
   10408:    0afffffa     beq    103f8 <frame_dummy+0x14>
   1040c:    e1a0e00f     mov    lr, pc
   10410:    e12fff13     bx    r3
   10414:    eafffff7     b    103f8 <frame_dummy+0x14>
   10418:    000204e8     .word    0x000204e8
   1041c:    00000000     .word    0x00000000

00010420 <main>:
   10420:    e92d4800     push    {fp, lr}
   10424:    e28db004     add    fp, sp, #4
   10428:    e59f0014     ldr    r0, [pc, #20]    ; 10444 <main+0x24>
   1042c:    ebffffa3     bl    102c0 <puts@plt>
   10430:    e3a03000     mov    r3, #0
   10434:    e1a00003     mov    r0, r3
   10438:    e24bd004     sub    sp, fp, #4
   1043c:    e8bd4800     pop    {fp, lr}
   10440:    e12fff1e     bx    lr
   10444:    000104c8     .word    0x000104c8

00010448 <__libc_csu_init>:
   10448:    e92d43f8     push    {r3, r4, r5, r6, r7, r8, r9, lr}
   1044c:    e59f6058     ldr    r6, [pc, #88]    ; 104ac <__libc_csu_init+0x64>
   10450:    e59f5058     ldr    r5, [pc, #88]    ; 104b0 <__libc_csu_init+0x68>
   10454:    e08f6006     add    r6, pc, r6
   10458:    e08f5005     add    r5, pc, r5
   1045c:    e0656006     rsb    r6, r5, r6
   10460:    e1a07000     mov    r7, r0
   10464:    e1a08001     mov    r8, r1
   10468:    e1a09002     mov    r9, r2
   1046c:    ebffff8a     bl    1029c <_init>
   10470:    e1b06146     asrs    r6, r6, #2
   10474:    0a00000a     beq    104a4 <__libc_csu_init+0x5c>
   10478:    e2455004     sub    r5, r5, #4
   1047c:    e3a04000     mov    r4, #0
   10480:    e2844001     add    r4, r4, #1
   10484:    e5b53004     ldr    r3, [r5, #4]!
   10488:    e1a00007     mov    r0, r7
   1048c:    e1a01008     mov    r1, r8
   10490:    e1a02009     mov    r2, r9
   10494:    e1a0e00f     mov    lr, pc
   10498:    e12fff13     bx    r3
   1049c:    e1540006     cmp    r4, r6
   104a0:    1afffff6     bne    10480 <__libc_csu_init+0x38>
   104a4:    e8bd43f8     pop    {r3, r4, r5, r6, r7, r8, r9, lr}
   104a8:    e12fff1e     bx    lr
   104ac:    00010088     .word    0x00010088
   104b0:    00010080     .word    0x00010080

000104b4 <__libc_csu_fini>:
   104b4:    e12fff1e     bx    lr

Disassembly of section .fini:

000104b8 <_fini>:
   104b8:    e92d4008     push    {r3, lr}
   104bc:    e8bd4008     pop    {r3, lr}
   104c0:    e12fff1e     bx    lr