Optimization Tips: The -O Intel Compiler Option (Redwood)
by Frank Mathew
Redwood is an Altix high-performance computing server with Itanium 2 microprocessors. The Intel compilers for FORTRAN and C/C++ available on Redwood provide optimization options that can enhance the performance of programs. Code can be optimized according to the structure of the program (for example, whether it performs many floating-point operations, many loops, or a lot of I/O) and to suit the underlying Intel architecture. Since Redwood runs Linux on Itanium 2 processors, this series looks at ways to enhance the performance of programs on Itanium 2 processors in a Linux environment using the Intel compiler.
- Option ‘-O0’ (capital letter ‘O’ followed by the digit zero): This option disables all optimizations. It is recommended in the early stages of development, until the application is known to work correctly. It is not the default optimization level, so it must be specified explicitly.
- Option ‘-O1’ (capital letter ‘O’ followed by the digit one): This option optimizes for speed while keeping code size in mind. It is suitable for very large programs in which the focus is not on performing iterations (loops). The high-level optimizations it performs are as follows:
  - Disables software pipelining
  - Disables loop unrolling
  - Global code scheduling
  - Enables optimizations for server applications (straight-line and branch-like code without too many branches)
- Option ‘-O2’ (capital letter ‘O’ followed by the digit two): This is the default optimization level and also the recommended level in most cases. It generally produces the fastest code but may increase the size of the executable. It is suitable for typical integer applications that do not use a lot of floating-point math. The high-level optimizations it performs are as follows:
  - Global code scheduling
  - Software pipelining
  - Predication
  - Control speculation
  - Dead-code elimination and dead-store elimination
  - Loop unrolling
  - Partial redundancy elimination
  - Exception handling optimizations
  - Structure alignment lowering and optimizations
- Option ‘-O3’ (capital letter ‘O’ followed by the digit three): This option enables all “-O2” optimizations plus further optimizations suited to loop-intensive code that performs a lot of floating-point arithmetic on large data sets. Better performance than with the “-O2” option is not guaranteed unless the code has many iterations and works on large data sets with a lot of floating-point arithmetic. The high-level optimizations it performs are as follows:
  - All the optimizations done by the “-O2” option
  - Data prefetching
  - Loop and memory access transformations
  - Scalar replacement
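As a rough illustration of two of the transformations named in the lists above, loop unrolling (listed under “-O2”) and scalar replacement (listed under “-O3”), the following C sketch writes them out by hand. It only shows the general idea; it is not the code the Intel compiler generates, and the function names are invented for this example. The hand-transformed version assumes an even array dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Plain form: the running total c[i] is read from and written back to
   memory on every pass of the inner loop. The matrix a is stored row by
   row in a flat array of n * n elements. */
void matvec_plain(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        c[i] = 0.0;
        for (j = 0; j < n; j++)
            c[i] = c[i] + a[i * n + j] * b[j];
    }
}

/* Hand-transformed form (illustrative only). n must be even. */
void matvec_by_hand(size_t n, const double *a, const double *b, double *c)
{
    size_t i, j;
    assert(n % 2 == 0);
    for (i = 0; i < n; i++) {
        double sum = 0.0;               /* scalar replacement of c[i] */
        for (j = 0; j < n; j += 2) {    /* inner loop unrolled by two */
            sum = sum + a[i * n + j] * b[j];
            sum = sum + a[i * n + j + 1] * b[j + 1];
        }
        c[i] = sum;                     /* single store per row */
    }
}

/* Returns 1 if both forms give identical results on a small test case. */
int matvec_forms_agree(void)
{
    enum { N = 4 };
    double a[N * N], b[N], c1[N], c2[N];
    size_t i;

    for (i = 0; i < N * N; i++)
        a[i] = (double)(i + 1);
    for (i = 0; i < N; i++)
        b[i] = (double)(i + 1);

    matvec_plain(N, a, b, c1);
    matvec_by_hand(N, a, b, c2);

    for (i = 0; i < N; i++)
        if (c1[i] != c2[i])
            return 0;
    return 1;
}
```

Both functions add the products in the same order, so they should produce the same values; the second simply does less loop bookkeeping and keeps the running total in a local variable. Calling matvec_forms_agree() from a small main is enough to try it.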
Note: The default level of optimization is “-O2”. If any other level is desired, it must be specified explicitly. However, if the program is built for debugging with the “-g” option, the optimization level is internally set to “-O0” unless an optimization level is also given explicitly.
In this article we have looked at some of the basic optimizations possible with the Intel compiler on Redwood. The series will continue in the next newsletter with tips on more advanced levels of optimization.
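One way to see what the different levels do for a particular program is to time a small floating-point loop built at each level. The C sketch below is one possible starting point, not a rigorous benchmark: the kernel, the function names, and the use of the standard clock() routine are all choices made for this example. It reports CPU time and a checksum so that builds at different levels can be checked against each other.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* y[i] = y[i] + alpha * x[i]: a small floating-point loop kernel. */
static void daxpy(size_t n, double alpha, const double *x, double *y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] = y[i] + alpha * x[i];
}

/*
 * Runs the kernel `repeats` times on arrays of length n.
 * Returns the elapsed CPU time in seconds, or -1.0 if allocation fails.
 * The sum of y is stored in *checksum; with the data used here it
 * should equal n * repeats at every optimization level.
 */
double time_daxpy(size_t n, int repeats, double *checksum)
{
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    double sum = 0.0;
    clock_t start, stop;
    size_t i;
    int r;

    if (x == NULL || y == NULL) {
        free(x);
        free(y);
        return -1.0;
    }
    for (i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 0.0;
    }

    start = clock();
    for (r = 0; r < repeats; r++)
        daxpy(n, 1.0, x, y);    /* each pass builds on the previous one */
    stop = clock();

    for (i = 0; i < n; i++)
        sum += y[i];
    *checksum = sum;

    free(x);
    free(y);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_daxpy from a small main with a large n and printing both the time and the checksum, then rebuilding the same file with “-O0”, “-O1”, “-O2”, and “-O3” in turn, gives a rough comparison. A command of the form `icc -O3 daxpy.c` is assumed here; the driver name and file name are assumptions and should be checked against the compiler documentation on Redwood.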
Further Reference: All the information in this article is from the Intel compiler documentation available on Redwood under /opt/cmplrs/8.0.066/doc (for C/C++) and /opt/cmplrs/8.0.046/doc (for FORTRAN).