by Jake Jenkins

Table of Contents: Overview, The Good, The Bad, Optimization Switches, Performance vs. GCC, Caveats

Overview

Recently, Intel released its newest compilers for C/C++ and Fortran 77/90 for the Linux platform. Calling them simply C++ and Fortran Compilers 5.0 does them a bit of a disservice. These compilers have been long awaited by developers, and it seems the wait has been worth it.

Most serious developers on the Intel architecture know that while these processors are capable of performance rivaling that of high-end workstation CPUs, reaching that peak performance has required the purchase of commercial libraries or hours of tedious assembly coding. The reason is a feature present on every Intel processor since the Pentium III, known as SSE (Streaming SIMD Extensions). SSE is essentially a second, high-powered floating point unit: it provides eight 128-bit registers dedicated to floating point arithmetic, each of which holds four single-precision values that a single instruction can operate on together. As you will see later in the article, using these extra registers to their fullest potential can lead to tremendous performance increases in applications that make heavy use of floating point calculations.

Fortunately for those of us who use Linux, the compilers can be had for free for non-commercial use. For Windows users in an academic environment, the cost is relatively low as well, at under $100 per user. Besides the price, the Intel compilers have several other benefits:
There are few drawbacks to using the Intel compilers. One thing I noticed is that icc (the command used to invoke the Intel compiler) is a bit pickier about the code it will compile. In a few instances gcc let me get away with a bad malloc() statement without even a warning, while icc refused to compile the exact same code. After a bit of debugging, I found that I had commented out the proper malloc() statement and replaced it with a bad one for reasons unknown. The point is this: icc, especially when compiling C++ code, is stricter about syntax and about things like memory allocation. Whether or not this is actually a bad thing is left to the reader to decide.

The only serious drawback for Linux users is that the Intel compilers will not compile the Linux kernel. A lesser problem is that objects created with icc will not link with objects created with gcc, and vice versa. For this reason, a programmer who has started a project with gcc should consider the problems that can arise from switching compilers during the development cycle.

When testing the Intel compiler's performance, I used three optimization switches.
When compiling the programs with GCC, I used the following switches:

    -O9 -funroll-loops -ffast-math -fomit-frame-pointer -malign-double -mcpu=pentiumpro -finline-functions -march=pentiumpro -fno-exceptions

Note that I was using GCC version 2.95.3. To save time and frustration, I set up an environment variable, "FASTGCC", that contains the switches listed above.

I used three main benchmarks: Stream, a memory performance benchmark; Whetstone, a floating point performance benchmark; and a program that I wrote that is also floating point intensive.

Stream Results:

Stream is a well-known benchmark used to test the memory performance of computers. I edited its makefile to reflect my choice of compilers and to ensure that the correct optimization switches were being used for each one. It was interesting to see that there was a significant performance difference between the two compilers. The difference likely comes from the fact that the Intel compiler aligns data structures in memory more aggressively, resulting in optimal memory controller performance, while gcc takes a more "get in where you fit in" approach. Note that this benchmark requires a bit of tweaking from system to system in order to produce valid results. On my system, I increased the array size to 9 million to obtain a run time long enough to produce reliable results. Results are in megabytes/second, and a higher number is a better score.
Results using icc
Results using gcc
Whetstone Results:

Whetstone is a well-known benchmark normally used to test the floating point performance of processors. I edited its makefile to reflect my choice of compilers and to ensure that the correct optimization switches were being used for each one. When running the benchmark, the user must specify one parameter, the number of iterations to perform. As you can see, I chose to perform 1 million iterations. The results you see below are typical. Variance between runs was less than 5%.
Using icc
jjake@ars:~/bench$ ./iccWhet 1000000

Using gcc

jjake@ars:~/bench$ ./gccWhet 1000000

The results are impressive. The Intel compiler enjoys an approximately 100% performance lead. This is almost certainly the result of using the SSE registers. GCC leaves these untapped and is therefore thoroughly defeated.

One oddity did occur during the testing of this benchmark, however. When using the -wp_ipo optimization switch with the Intel compiler, performance skyrocketed. The results are included here, but take them with a grain of salt. This seems almost too good to be true.

Questionable icc results

jjake@ars:~/bench$ ./otherWhet 1000000

This is a truly amazing performance increase, approximately 675%! It is certainly not typical, however, so your mileage may vary.

Test.c Results:

This program performs millions of floating point adds, subtracts, compares, and multiplies. It also uses the rand(), cos(), sin(), tan(), and sqrt() functions from the C standard and math libraries. The results you see below are representative of typical results. The differences between runs of this benchmark were not significant and varied by less than 5%. As you can see, I used the Unix command "time" to measure how long it took each of these programs to execute. The results are clear. A typical run of the exact same code compiled with "icc -O3 -xK -wp_ipo test.c -o iccTest -lm" and with "gcc $FASTGCC test.c -o gccTest -lm" shows a difference of over 30 seconds in execution time.
Using icc

Using gcc

The difference is over 400% in favor of the Intel compiler. That is a very significant speed increase over gcc.

Keep in mind that these benchmarks focused only on two areas where the Intel compilers are known to be superior: memory performance and floating point performance. The differences when compiling code that contains mainly integer operations will not be as significant. Also, two compilers cannot be compared properly using only the benchmarks presented here. This article should only serve to increase your interest in these new tools.

Making the switch to a different compiler is not a trivial task. Currently, Intel provides the compilers in RPM format for Red Hat 6.2 or 7.1 only. With some creative work, however, they can be made to work on other Linux distributions.
