C++ Learning Community Forum
August 01, 2010, 02:34:57 AM *
Author Topic: Profiling  (Read 1320 times)
FrozenKnight
ASM Freak
Global Moderator
Dr. of C++ology
*****
Posts: 546


Do it yourself it's the only way to learn.


View Profile
« on: November 30, 2007, 11:35:18 AM »

I just decided to post the way I profile my programs. I'll attempt to use inline ASM so that it can be used in most of your C++ programs.

The RDTSC instruction is a highly accurate timer in the x86 instruction set. It returns the current time-stamp counter value in EDX:EAX, meaning a 64-bit integer split across two 32-bit registers (high half in EDX, low half in EAX), which makes it ideal for most timing applications.
Note that the instruction itself takes time to execute, and any other instructions used (e.g. for saving the result) are not accounted for, so timings will be a little longer than the actual time of your instructions.
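
To make the EDX:EAX layout concrete, here is a rough sketch of how the two 32-bit halves join into one 64-bit count. The name combine_tsc is made up for this example, and the function is plain C++ that takes the register values as arguments; it does not execute RDTSC itself.

```cpp
#include <cstdint>

// Hypothetical helper: joins the two register halves that RDTSC returns.
// edx holds the high 32 bits, eax holds the low 32 bits.
constexpr std::uint64_t combine_tsc(std::uint32_t edx, std::uint32_t eax)
{
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// When the 64-bit value is stored in memory on little-endian x86, the low
// half (EAX) sits at byte offset 0 and the high half (EDX) at byte offset 4.
static_assert(combine_tsc(0x00000001u, 0x00000002u) == 0x0000000100000002ULL,
              "the EDX value lands in the upper 32 bits");
```

Storing EAX at offset 0 and EDX at offset 4 gives the same result as writing the combined value to a 64-bit integer.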

Reasons for its use: it gives an idea of how long a function or set of instructions really takes; it can be used with optimized code (which a lot of profilers can't); it's more accurate than the C and C++ timing functions; and it can be used to time multiple functions.

OK, enough background. Now for some functions.

Code:
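inline void start_profile(void *ptr64bitint)
{
  // 32-bit x86 MSVC-style inline asm
  __asm {
    mov  ecx, ptr64bitint   // load the pointer value; [ptr64bitint] would address the parameter itself
    rdtsc                   // EDX:EAX = current counter
    mov  [ecx], eax         // low 32 bits at offset 0 (little endian)
    mov  [ecx + 4], edx     // high 32 bits at offset 4
  }
}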
The above code should save the current counter value to the 64-bit int that the pointer refers to.

Now for some code that subtracts that saved number from the current count to get the total time:
Code:
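inline unsigned int end_profile(void *ptr64bitint)
{
  // 32-bit x86 MSVC-style inline asm
  __asm {
    mov  ecx, ptr64bitint   // pointer to the saved start count
    rdtsc                   // EDX:EAX = current count
    sub  eax, [ecx]         // low halves first: end - start
    sbb  edx, [ecx + 4]     // high halves, minus the borrow
    mov  [ecx], eax         // store the 64-bit difference back
    mov  [ecx + 4], edx
  }
  return *(unsigned int *)ptr64bitint;   // low 32 bits of the difference
}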

The return should be an unsigned 32-bit int containing the 32 least significant bits of the 64-bit value the pointer refers to, which for most applications is enough. In some cases you may need to access the 64-bit number directly. Either way, the return should contain the amount of time between RDTSC calls in whatever unit your processor uses. (AMD processors use an actual clock count, giving you the total number of clock cycles taken.)
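
As a rough illustration of why the low 32 bits are usually enough, here is a sketch in plain C++. The names elapsed_low32 and elapsed_full are made up for this example, and the counter readings are passed in as arguments rather than read with RDTSC.

```cpp
#include <cstdint>

// Hypothetical helper: elapsed count computed from the low 32 bits only.
// Unsigned subtraction wraps modulo 2^32, so the result is still correct
// when the low half rolled over between the two readings, provided the
// real interval is shorter than 2^32 counts.
constexpr std::uint32_t elapsed_low32(std::uint32_t start_low, std::uint32_t end_low)
{
    return end_low - start_low;
}

// Hypothetical helper: full 64-bit difference, for intervals that may
// exceed 2^32 counts.
constexpr std::uint64_t elapsed_full(std::uint64_t start, std::uint64_t end)
{
    return end - start;
}
```

On a 3 GHz clock, 2^32 counts is roughly 1.4 seconds, so the 32-bit result only becomes a problem for longer measurements.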

The code used here is untested and may contain bugs. Please feel free to post corrections and I will update the code accordingly.

// fixed code to properly save in little endian.
« Last Edit: December 02, 2007, 07:57:54 PM by FrozenKnight » Logged


Imagine the impossible, then make it happen.
ih8censorship
Megalomaniac!!!
Administrator
C++ guru
*****
Posts: 1236



View Profile
« Reply #1 on: November 30, 2007, 06:36:01 PM »

Where you used unsigned int as a 64-bit value confused me, because on the machines I have, unsigned int is 4 bytes/32 bits. I fixed up your code a little, and it compiles and runs on VC++ 6 now (yeah, I know it's old and not standard... the other compiler gave me an error about RDTSC being undeclared). Here is what I got to run.

Code:
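#include <iostream>
using namespace std;

// Version as posted for VC++ 6 (32-bit x86 inline asm). Kept as posted because
// the replies discuss it; known problems are marked in the comments.
inline void start_profile(double *ptr64bitint)
{
  _asm {
    // In MSVC inline asm, [ptr64bitint] names the pointer variable's own storage,
    // so the pointer would have to be loaded into a register first to reach the double.
    RDTSC
    mov  [ptr64bitint], edx      // high half first, the reverse of little-endian order
    mov  [ptr64bitint + 4], eax
  }
  return;
}

inline double end_profile(double *ptr64bitint)
{
  _asm {
    RDTSC
    sub  [ptr64bitint], edx
    sbb  [ptr64bitint + 4], eax
  }
  // + 4 on a double* advances four doubles (32 bytes), not 4 bytes
  return (double)(*(ptr64bitint + 4));
}

int main()
{
    double bits64;   // used only as 8 bytes of storage for the counter
    start_profile(&bits64);
    cout << end_profile(&bits64);
    cin.get();
    return 0;
}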

**edit**
Actually, on second thought, couldn't you get rid of the +4 where the time is calculated and returned? When using all 8 bytes like that, +4 would take it to the middle of the double and run it 4 bytes over the end, and that wouldn't be good. So you'd want something more like return (double)(*ptr64bitint); or perhaps return *ptr64bitint; in that case.
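
A small sketch of what +4 does on a double* in C++, as opposed to a byte offset in asm. The helper name byte_distance is made up for this example, and the pointer must refer to an array big enough for p + n to be valid.

```cpp
#include <cstddef>

// Hypothetical helper: distance in bytes between p and p + n.
// C++ pointer arithmetic steps in units of the pointed-to type, so for a
// double* each step is sizeof(double) bytes, not 1 byte.
inline std::ptrdiff_t byte_distance(double *p, int n)
{
    return reinterpret_cast<char *>(p + n) - reinterpret_cast<char *>(p);
}
```

With an 8-byte double, byte_distance(buf, 4) on an array buf of at least four doubles gives 32, so *(ptr64bitint + 4) reads well past a single double instead of reading its upper half. Inside the asm block, the + 4 is a plain byte offset.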
« Last Edit: November 30, 2007, 08:02:33 PM by ih8censorship » Logged

PC==perfect_companion

Knowledge cannot come packaged and predigested; it must be chewed over carefully before swallowed.

What have you tried?
FrozenKnight
« Reply #2 on: December 02, 2007, 07:56:09 PM »

In your example, with the return being a double, you would need to get rid of the +4 in the command
Code:
return (double)(*(ptr64bitint + 4));
However, you still need it in the asm instructions. And thanks, I just noticed a problem with my code overall: with bytes being little endian on Intel processors, I shouldn't need the +4 in the return either. I just screwed up and stored them backwards. FIXING....
Logged


ih8censorship
« Reply #3 on: December 03, 2007, 12:13:10 AM »

I'm sorry, I'm still not understanding why you're using an unsigned int. What sort of system are you on?
Logged

adeyblue
Dr. of C++ology
****
Posts: 653

Taming the turntables a beat at a time


View Profile WWW
« Reply #4 on: December 03, 2007, 01:21:36 AM »

RDTSC may not work as you expect on multi-core machines; unfortunately, the solutions aren't much better. His paper (linked in his post) is quite interesting if you're into horology.
Logged

FrozenKnight
« Reply #5 on: December 03, 2007, 07:24:43 PM »

Quote from: adeyblue
RDTSC may not work as you expect on multi-core machines; unfortunately, the solutions aren't much better. His paper (linked in his post) is quite interesting if you're into horology.

Yes, I know about that, but there are a few fixes. AMD has one that fixes such problems by syncing the clocks on both processors. Another solution is to force the code containing the RDTSC instruction to run on one core only, which can be done on any system via API and can be done on WinXP with the Task Manager. So it's not as bad as your article claimed.
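
For the "run on one core only" approach, here is a minimal sketch of what the API call might look like. The helper name pin_to_one_cpu is made up for this example. It assumes SetThreadAffinityMask on Windows and the glibc sched_getaffinity/sched_setaffinity calls on Linux, and it is not the only way to do this.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>   // glibc; the CPU_* macros need _GNU_SOURCE, which g++ defines by default
#endif

// Hypothetical helper: restricts the calling thread to a single CPU so that
// successive RDTSC readings come from the same core. Returns true on success.
inline bool pin_to_one_cpu()
{
#ifdef _WIN32
    // Bit 0 of the mask selects the first logical processor.
    // SetThreadAffinityMask returns 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(), 1) != 0;
#else
    // Pick the first CPU the thread is already allowed to run on.
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return false;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0;
        }
    }
    return false;
#endif
}
```

Call it once before the first timing reading. If it returns false, the measurements may still mix readings from different cores.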

And H8, I returned unsigned int because doubles use a different format from ints, and as such you can't use a double returned like this to make C-style comparisons on all systems (at least I don't think so; more testing may be needed). In the case of an int it will always work, and there aren't many cases where the upper 32 bits will be needed anyway. If you need proof, H8, then try using the double method to average a section of code run 100 times or more. (BTW, it is normal to see a variation of 1-2 cycles even in small code segments due to memory reset.)
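
A rough sketch of both points: what an integer count looks like when its bytes are read back as a double, and how a section of code might be averaged over many runs. The names bits_as_double and average_ticks are made up for this example, and read_counter stands in for whatever RDTSC wrapper you use; none of this is from the code posted earlier in the thread.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes an 8-byte double");

// Reinterprets the 8 bytes of a 64-bit count as a double, which is what
// reading the stored counter through a double* amounts to.
inline double bits_as_double(std::uint64_t count)
{
    double d;
    std::memcpy(&d, &count, sizeof d);
    return d;
}

// Hypothetical harness: calls work() `runs` times and averages the counter
// difference measured around each call.
template <typename Counter, typename Work>
std::uint64_t average_ticks(Counter read_counter, Work work, unsigned runs)
{
    std::uint64_t total = 0;
    for (unsigned i = 0; i < runs; ++i) {
        const std::uint64_t start = read_counter();
        work();
        total += read_counter() - start;
    }
    return runs ? total / runs : 0;
}
```

With IEEE 754 doubles, a count of 1000 read back this way comes out as a tiny denormal number rather than 1000.0, so the arithmetic and averaging are better done on the integer value.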
Logged

