Wednesday, 31 October 2012

gcc function attribute regparm and process stack

Using registers wherever possible can make a real difference in performance-critical code. We can hint the compiler to keep a particular variable in a register by declaring it with the register storage class; let's see how we can take this to the next level.

The GNU gcc compiler supports the function attribute __attribute__((regparm(X))), which we can use to control whether, and how many, integer arguments are passed in registers instead of on the stack in 32-bit x86 code.

I went looking for this when I mixed some IA-32 assembly code with C code. My worry was: what guarantees that the compiler, when asked for heavy optimization, will not pass arguments to functions in registers instead of on the stack? Newer 64-bit processors also have many more registers (r8 to r15) to use. And how can I force the compiler to use registers when I need to? Here is some sample code:
main.c


 
  2 #define asmlinkage  __attribute__((regparm(0)))  /* all args on the stack */
  3 #define fastcall  __attribute__((regparm(3)))    /* args 1-3 in eax, edx, ecx */
  4 
  5 extern int asm_func1(int a, int b, int c);
  6 
  7 int main()
  8 {
  9         int retval = 0;
 10         retval = asm_func1(0x10, 0x20, 0x30);
 11         return retval;
 12 }


and asm.S

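.data
.bss
.text
.global asm_func1
asm_func1:                      /* cdecl: all arguments on the stack */
        pushl %ebp              /* save the caller's frame pointer */
        movl %esp, %ebp
        movl 8(%ebp), %eax      /* a */
        movl 12(%ebp), %ecx     /* b */
        movl 16(%ebp), %edx     /* c */
        addl %ecx, %eax         /* eax = a + b */
        addl %edx, %eax         /* eax = a + b + c, the return value */
        movl %ebp, %esp
        popl %ebp
        ret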

When compiled with gcc -Wall -O2 main.c asm.S on a 32-bit x86 system (add -m32 on a 64-bit host), gcc generates:

080483a0 <main>:
 80483a0:       55                                     push   %ebp
 80483a1:       89 e5                                mov    %esp,%ebp
 80483a3:       83 e4 f0                           and    $0xfffffff0,%esp
 80483a6:       83 ec 10                           sub    $0x10,%esp
 80483a9:       c7 44 24 08 30 00 00    movl   $0x30,0x8(%esp)
 80483b0:       00
 80483b1:       c7 44 24 04 20 00 00    movl   $0x20,0x4(%esp)
 80483b8:       00
 80483b9:       c7 04 24 10 00 00 00    movl   $0x10,(%esp)
 80483c0:       e8 03 00 00 00               call   80483c8 <asm_func1>

We can see that gcc has placed the arguments on the stack before the call.
Now, when we change the prototype at line 5 of main.c to

  5 asmlinkage extern int asm_func1(int a, int b, int c);

the disassembly still shows the parameters on the stack, since regparm(0) is the same as the default convention. But when we change the prototype to

 5 fastcall extern int asm_func1(int a, int b, int c);

the dump changes to:
080483a0 <main>:
 80483a0:       55                              push   %ebp
 80483a1:       b9 30 00 00 00          mov    $0x30,%ecx
 80483a6:       89 e5                         mov    %esp,%ebp
 80483a8:       ba 20 00 00 00          mov    $0x20,%edx
 80483ad:       83 e4 f0                     and    $0xfffffff0,%esp
 80483b0:       b8 10 00 00 00          mov    $0x10,%eax
 80483b5:       e8 06 00 00 00          call   80483c0 <asm_func1>
 80483ba:       89 ec                          mov    %ebp,%esp
 80483bc:       5d                              pop    %ebp

Now all three parameters are passed in registers: eax (1st argument), edx (2nd argument) and ecx (3rd argument). Remember that these are caller-saved registers.

This example demonstrates,
- If both caller and callee are aware of the convention, arguments can be passed in registers instead of on the stack. Both sides must agree: the asm.S above still reads its arguments from the stack, so with the fastcall prototype it would have to take them from eax, edx and ecx instead. Also note that such code is not always faster, because we are restricting how gcc can use the registers.
- With regparm(0), we can be sure that calls to that function never pass arguments in registers in 32-bit code, whatever the optimization level.
- Declaring the attribute on the prototype of the extern function is enough for the compiler to use it at every call site.
- Any parameters beyond the third are passed on the stack.
- There is more to register usage in C than register int a;.
I did not find the reason for the limit of 3, but my guess is that, per the C calling convention, the caller saves these three registers, and there are only a limited number of registers on a 32-bit system.

If you pass a number greater than 3 to regparm, gcc warns and ignores the attribute, so no registers are used for argument passing. Ignoring this warning can break code if the callee expects its arguments in registers.





1 comment:

  1. #~/test/func_call_ana$ cat new.s
    .file "new.c"
    .text
    .globl main
    .type main, @function
    main:
    .LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $48, %edx
    movl $32, %esi
    movl $16, %edi
    movl $0, %eax
    call function
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
    .LFE0:
    .size main, .-main
    .ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section .note.GNU-stack,"",@progbits
    #~/test/func_call_ana$ cat function.s
    .file "function.c"
    .text
    .globl function
    .type function, @function
    function:
    .LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl %edi, -20(%rbp)
    movl %esi, -24(%rbp)
    movl %edx, -28(%rbp)
    movl -24(%rbp), %eax
    movl -20(%rbp), %edx
    addl %edx, %eax
    addl -28(%rbp), %eax
    movl %eax, -4(%rbp)
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
    .LFE0:
    .size function, .-function
    .ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
    .section .note.GNU-stack,"",@progbits


    for the files new.c and function.c:
    #~/test/func_call_ana$ cat new.c
    int main()
    {
    function(0x10, 0x20, 0x30);
    return 0;
    }

    #~/test/func_call_ana$ cat function.c
    void function(int a,int b,int c)
    {
    int sum = a + b + c;
    return;
    }

    In main, the arguments are moved into registers, and then inside the function they are put back on the stack, with the 64-bit gcc compiler [gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3]. Why?

    Is this some kind of optimization?
    And why is there one extra operation,
    movl $0, %eax
    in main?
    Any idea?
