Current Issue: May 2002
The MCSR Parallel-O-Gram
The New Intel C Compiler for Linux

by Jake Jenkins

Table of Contents:
Overview
The Good
The Bad
Optimization Switches
Performance vs. GCC
Caveats

Overview

Recently, Intel released its newest C/C++ and Fortran 77/90 compilers for the Linux platform. Calling them simply the C++ and Fortran Compilers 5.0 does them a bit of a disservice. Developers have long awaited these compilers, and it seems the wait has been worth it. Most serious developers on the Intel architecture know that these processors are capable of performance rivaling that of high-end workstation CPUs, but that reaching this peak performance has required either purchasing commercial libraries or spending hours on tedious assembly coding. The reason is a feature present on every Intel processor since the Pentium III: SSE (Streaming SIMD Extensions). SSE adds eight 128-bit registers, each holding four single-precision floating-point values, together with instructions that operate on all four values at once. As you will see later in the article, using these extra registers to their fullest potential can lead to tremendous performance increases in applications that make heavy use of floating-point calculations.

The Good

Fortunately for those of us who use Linux, the compilers can be had for free for non-commercial use. For Windows users in an academic environment, the cost is relatively low as well at under $100 per user.

Besides the price, the Intel compilers have several other benefits:

Commonality with GCC. If you are familiar with GCC's syntax and switches, the transition to the Intel compilers will not be difficult.
Increased performance. A base performance increase of around 30% can be expected, and over 100% under certain circumstances.
Visual Studio compatibility. While we have not tested this feature here at the MCSR, Intel claims that code created by the Microsoft compiler is fully compatible with code created by the Intel compiler. That is to say, object files created by one compiler will link with object files created by the other.

The Bad   

There are few drawbacks to using the Intel compilers. One thing I noticed is that icc (the command that invokes the Intel C/C++ compiler) is a bit pickier about the code it will compile. In a few instances gcc let me get away with a bad malloc() statement without even a warning, while icc refused to compile the exact same code. After a bit of debugging, I found that I had commented out the proper malloc() statement and replaced it with a bad one for reasons unknown. The point is this: icc, especially when compiling C++ code, is pickier about syntax and about things like memory allocation. Whether or not this is actually a bad thing is left to the reader to decide.

The only serious drawback for Linux users is that the Intel compilers will not compile the Linux kernel. A lesser problem is that objects created with icc will not link to objects created with gcc, and vice versa. For this reason, a programmer who has started a project with gcc should consider the problems that can arise from switching compilers during the development cycle.

Optimization Switches

When testing the Intel compiler's performance, I used three optimization switches.

-O3 This is roughly similar to GCC's -O3 switch. It provides lots of high level optimization.
-xK This switch turns on SSE optimization. It makes a huge impact on performance.
-wp_ipo This switch turns on interprocedural optimization. It applies many different transformations, such as function inlining and loop unrolling, wherever the compiler thinks they will improve the performance of your code. Note that this switch only works if the entire source code for a program is available in one file. The multi-file equivalent is -ipo.

When compiling the programs with GCC,  I used the following switches.

-O9 -funroll-loops -ffast-math -fomit-frame-pointer -malign-double -mcpu=pentiumpro -finline-functions -march=pentiumpro -fno-exceptions

Note that I was using GCC version 2.95.3.

Performance vs. GCC

I used three main benchmarks: Stream, a memory performance benchmark; Whetstone, a floating point performance benchmark; and a program that I wrote that is also floating point intensive. To save time and frustration, I set up an environment variable "FASTGCC" that contains the switches mentioned above.

Stream Results:

Stream is a well-known benchmark used to test the memory performance of computers. I edited its makefile to reflect my choice of compilers and to ensure that the correct optimization switches were being used for each one. It was interesting to see a measurable performance difference between the two compilers: icc was roughly 9% faster on Assignment and 11% faster on Scaling, though only about 1% faster on Summing and SAXPYing. The difference likely comes from the fact that the Intel compiler more aggressively aligns data structures in memory, resulting in better memory controller performance, while gcc takes a more "get in where you fit in" approach.

Note that this benchmark requires a bit of tweaking from system to system in order to produce valid results. On my system, I increased the array size to 9 million to obtain a run time long enough to produce reliable results.

Results are in megabytes/second and a higher number is a better score.

 

Results using icc

Function      Rate (MB/s)
Assignment    318.101
Scaling       319.415
Summing       355.618
SAXPYing      354.916

Results using gcc

Function      Rate (MB/s)
Assignment    292.912
Scaling       287.903
Summing       351.923
SAXPYing      352.805


   
 

Whetstone Results:

Whetstone is a well-known benchmark normally used to test the floating-point performance of processors. I edited its makefile to reflect my choice of compilers and to ensure that the correct optimization switches were being used for each one. When running the benchmark, the user must specify one parameter: the number of loops to perform. As you can see, I chose 1 million. The results you see below are typical. Variance between runs was less than 5%.

 

Using icc

jjake@ars:~/bench$ ./iccWhet 1000000
Loops: 1000000, Iterations: 1, Duration: 138 sec.
C Converted Double Precision Whetstones: 724.6 MIPS

Using gcc

jjake@ars:~/bench$ ./gccWhet 1000000
Loops: 1000000, Iterations: 1, Duration: 277 sec.
C Converted Double Precision Whetstones: 361.0 MIPS

The results are impressive. At 724.6 versus 361.0 MIPS, the Intel compiler enjoys an approximately 100% performance lead. This is most likely the result of using the SSE registers. GCC leaves these untapped and is therefore thoroughly defeated.

One oddity did occur during the testing of this benchmark, however. When using the -wp_ipo optimization switch with the Intel compiler, performance skyrocketed. The results are included here, but take them with a grain of salt. This seems almost too good to be true.

Questionable icc results

jjake@ars:~/bench$ ./otherWhet 1000000
Loops: 1000000, Iterations: 1, Duration: 41 sec.
C Converted Double Precision Whetstones: 2439.0 MIPS

This is a truly amazing result: about 6.75 times the gcc score (an increase of roughly 575%) and about 3.4 times the ordinary icc score. It is certainly not typical, however, so your mileage may vary.

Test.c Results:

This program performs millions of floating-point adds, subtracts, compares, and multiplies. It also uses the rand(), cos(), sin(), tan(), and sqrt() functions from the C library. The results you see below are typical; run-to-run variation for this benchmark was less than 5%.

As you can see, I used the Unix command "time" to measure how long each of these programs took to execute. The results are clear. A typical run of the exact same code compiled with "icc -O3 -xK -wp_ipo test.c -o iccTest -lm" and with "gcc $FASTGCC test.c -o gccTest -lm" shows a difference of over 30 seconds in execution time.


 

Using icc
jjake@ars:~/handyStuff$ time iccTest
real 0m11.482s
user 0m11.300s
sys 0m0.050s

Using gcc
jjake@ars:~/handyStuff$ time gccTest
real 0m47.982s
user 0m46.540s
sys 0m0.070s

The gcc build takes about 4.2 times as long as the icc build (47.98 versus 11.48 seconds of wall-clock time). That is a very significant speed increase over gcc.

Caveats

Keep in mind that these benchmarks focused on only two areas where the Intel compilers are known to be superior: memory performance and floating-point performance. The differences when compiling code that consists mainly of integer operations will not be as significant. Also, two compilers cannot be compared properly using only the benchmarks presented here. This article should serve only to increase your interest in these new tools.

Making the switch to a different compiler is not a trivial task. Currently, Intel provides the compilers in RPM format for Red Hat 6.2 and 7.1 only. With some creative work, however, they can be made to work on other Linux distributions.


--------------------------------
Last Modified: Wednesday, 24-Apr-2002 13:12:20 CDT
Copyright © 1997-2005 The Mississippi Center for Supercomputing Research. All Rights Reserved.