


Everyone says registers are faster than memory, so why does a run sometimes show the register version being slower?

  

Answer from user bei-ji-85:
      

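In a case like this you have to read the specification.

The specification does not require a register variable to actually be placed in a register.

Specification: Storage-class specifiers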

register - automatic duration and no linkage; address of this variable cannot be taken
2) The register specifier is only allowed for objects declared at block scope, including function parameter lists. It indicates automatic storage duration and no linkage (which is the default for these kinds of declarations), but additionally hints the optimizer to store the value of this variable in a CPU register if possible. Regardless of whether this optimization takes place or not, variables declared register cannot be used as arguments to the address-of operator, cannot use alignas (since C11), and register arrays are not convertible to pointers.

The passage above is quoted from cppreference.com, which also has a Chinese translation of the same page (存储类指定符 - cppreference.com).

The specification only says "if possible", not "must".
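To make the quoted rule concrete, here is a minimal sketch. The function name `sum_below` is made up for illustration. It shows the one effect of `register` that the specification guarantees: the address of such a variable cannot be taken, whether or not the compiler honors the hint.

```c
#include <assert.h>

/* Sums 0 + 1 + ... + (n - 1). `register` only asks the compiler to keep
 * `sum` and `i` in CPU registers; the compiler is free to ignore it. */
long sum_below(int n)
{
    register long sum = 0;
    register int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        sum += i;

    /* long *p = &sum;   <- would not compile: the address of a register
     *                      variable cannot be taken, even if the variable
     *                      actually ended up in memory. */
    return sum;
}
```

Whether `sum` and `i` really live in registers depends on the compiler and its optimization settings, not on the keyword.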

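Also, your timing granularity is too coarse. To measure performance the process should have a CPU core to itself, and you should even build without debug information.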
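As a rough sketch of finer-grained timing on a POSIX system, the code below uses `clock_gettime` with `CLOCK_MONOTONIC` and keeps the fastest of several runs. The helper names `now_ns`, `best_time_ns` and `count_to_million` are hypothetical. Taking the minimum only reduces noise from other processes; it does not replace giving the benchmark its own core.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: runs `work` `reps` times and returns the fastest
 * run in nanoseconds, i.e. the run least disturbed by other activity. */
int64_t best_time_ns(void (*work)(void), int reps)
{
    int64_t best = INT64_MAX;
    int r;

    assert(work != NULL && reps > 0);
    for (r = 0; r < reps; r++) {
        int64_t t0 = now_ns();
        int64_t dt;
        work();
        dt = now_ns() - t0;
        if (dt < best)
            best = dt;
    }
    return best;
}

/* Example workload: a counting loop whose counter is forced into memory. */
static volatile int counter;

void count_to_million(void)
{
    for (counter = 0; counter < 1000000; counter++)
        ;
}
```

A call such as `best_time_ns(count_to_million, 10)` returns the best of ten runs in nanoseconds.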

Below is a version for VC2017.

It uses inline assembly and the high-resolution performance counter for timing.

    #include <stdio.h>
    #include <Windows.h>

    #define TIME 1000000000

    /* Globals: loop counter and bound live in memory. */
    int m, n = TIME;

    int main()
    {
        LARGE_INTEGER freq, start, end;
        int x, y = TIME;    /* locals: loop counter and bound */

        if (QueryPerformanceFrequency(&freq) == FALSE)
        {
            printf("Can not get performance freq\n");
            return -1;
        }
        printf("freq = %lld\n", freq.QuadPart);

        /* Loop 1: global counter. */
        if (QueryPerformanceCounter(&start) == FALSE)
        {
            printf("Fail to get counter\n");
            return -1;
        }

        for (m = 0; m < n; m++);

        if (QueryPerformanceCounter(&end) == FALSE)
        {
            printf("Fail to get counter\n");
            return -1;
        }

        printf("Counter = %lld Time = %lld ms\n", end.QuadPart - start.QuadPart, (end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);

        /* Loop 2: local counter. */
        if (QueryPerformanceCounter(&start) == FALSE)
        {
            printf("Fail to get counter\n");
            return -1;
        }

        for (x = 0; x < y; x++);

        if (QueryPerformanceCounter(&end) == FALSE)
        {
            printf("Fail to get counter\n");
            return -1;
        }
        printf("Counter = %lld Time = %lld ms\n", end.QuadPart - start.QuadPart, (end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);

        /* Loop 3: counter and bound held in ECX and EBX.
           MSVC inline assembly is available only for 32-bit x86 targets. */
        if (QueryPerformanceCounter(&start) == FALSE)
        {
            printf("Fail to get counter\n");
            return -1;
        }
        __asm
        {
            push ecx;
            push ebx;
            mov ecx, 0;
            mov ebx, TIME;
    loop1:
            inc ecx;
            cmp ecx, ebx;
            jne loop1;
            pop ebx;
            pop ecx;
        }

        if (QueryPerformanceCounter(&end) == FALSE)
        {
            printf("Fail to get counter\n");
            return -1;
        }

        printf("Counter = %lld Time = %lld ms\n", end.QuadPart - start.QuadPart, (end.QuadPart - start.QuadPart) * 1000 / freq.QuadPart);

        return 0;
    }

Result on Windows:

    freq = 3023438
    Counter = 6030087 Time = 1994 ms
    Counter = 6040404 Time = 1997 ms
    Counter = 747601 Time = 247 ms

The register-only loop is clearly faster: 247 ms, against roughly 2000 ms for each of the two C loops.
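The millisecond figures follow directly from the counter differences and the reported frequency. A small sketch of that arithmetic, with made-up helper names `ticks_to_ms` and `slowdown`:

```c
#include <assert.h>

/* Converts a QueryPerformanceCounter tick difference to whole
 * milliseconds, using the same integer formula as the printf calls. */
long long ticks_to_ms(long long ticks, long long freq)
{
    assert(freq > 0);
    return ticks * 1000 / freq;
}

/* How many times slower `slow_ticks` is than `fast_ticks`. */
double slowdown(long long slow_ticks, long long fast_ticks)
{
    assert(fast_ticks > 0);
    return (double)slow_ticks / (double)fast_ticks;
}
```

With freq = 3023438, `ticks_to_ms(6030087, 3023438)` gives 1994 and `ticks_to_ms(747601, 3023438)` gives 247, so in this run the global-variable loop took about 8 times as long as the register loop.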

GCC code for Linux (using rdtsc for high-resolution timing):

    #include <stdio.h>

    #define TIME 1000000000
    #define STR(x) #x
    #define INT2STR(x) STR(x)

    int m, n = TIME;

    /* Reads the time-stamp counter. The "=A" constraint yields EDX:EAX as one
       64-bit value only on 32-bit x86, so build with gcc -m32. */
    long long GetTSC()
    {
        long long tsc;
        __asm__ __volatile__ ("rdtsc" : "=A" (tsc));
        return tsc;
    }

    int main()
    {
        long long start, end;
        int x, y = TIME;

        /* Note: (end - start) / 1000000 is millions of TSC ticks; it equals
           milliseconds only if the TSC runs at 1 GHz. */

        /* Loop 1: global counter. */
        start = GetTSC();
        for (m = 0; m < n; m++);
        end = GetTSC();

        printf("Counter = %lld Time = %lld ms\n", end - start, (end - start) / 1000000);

        /* Loop 2: local counter. */
        start = GetTSC();
        for (x = 0; x < y; x++);
        end = GetTSC();

        printf("Counter = %lld Time = %lld ms\n", end - start, (end - start) / 1000000);

        /* Loop 3: counter and bound held in ECX and EBX. */
        start = GetTSC();

        __asm__("pushl %ecx\n\t"
                "pushl %ebx\n\t"
                "movl $0, %ecx\n\t"
                "movl $" INT2STR(TIME) ", %ebx\n\t"
                "loop1: incl %ecx\n\t"
                "cmp %ecx, %ebx\n\t"
                "jne loop1\n\t"
                "popl %ebx\n\t"
                "popl %ecx\n\t");

        end = GetTSC();

        printf("Counter = %lld Time = %lld ms\n", end - start, (end - start) / 1000000);

        return 0;
    }

Result:

    Counter = 5690200100 Time = 5690 ms
    Counter = 5730137064 Time = 5730 ms
    Counter = 628730072 Time = 628 ms
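The same memory-versus-register comparison can be sketched without hand-written loop assembly, which also makes it independent of 32-bit x86. This is only an illustration under assumptions: the function names are invented, the empty `__asm__` statement with a `"+r"` operand is a GCC/Clang extension, and the counter stays in a register only when optimization is enabled (for example `-O2`). At `-O0` the compiler still spills it to the stack on every iteration.

```c
#include <assert.h>

/* Counter kept in memory: `volatile` forces a real load and store of
 * `mem_counter` on every iteration, even with optimization enabled. */
static volatile int mem_counter;

int count_in_memory(int n)
{
    assert(n >= 0);
    for (mem_counter = 0; mem_counter < n; mem_counter++)
        ;
    return mem_counter;
}

/* Counter kept in a register: the empty asm statement takes `x` as a
 * read-write register operand, so the optimizer must keep the value in a
 * register at that point and cannot delete the otherwise empty loop. */
int count_in_register(int n)
{
    int x;

    assert(n >= 0);
    for (x = 0; x < n; x++)
        __asm__ volatile("" : "+r"(x));
    return x;
}
```

Timing both functions with the same `n` gives the comparison the answer makes, while leaving register allocation to the compiler.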




  






© 2025-03-28 - tinynew.org. All Rights Reserved.