by Taner Pirim

Redwood, the most recent addition to MCSR's HPC lineup, was upgraded in March 2005 with 128 1.3 GHz Itanium2 processors and 128 GB of memory. MCSR intends this to be a capability resource for researchers whose particularly high-performance codes and calculations cannot be run (or cannot be run efficiently) on other MCSR supercomputers or clusters. While we will not allow Redwood to sit idle while MCSR researchers are wanting for CPU cycles, we will nevertheless give priority to researchers who can demonstrate both a need for this resource and the ability to use it efficiently.

The machine currently operates with 192 processors, 192 GB of memory, 8.1 TB of Fibre Channel disk, and a 64-bit word size. The SGI benchmark suite ran at 633.35 Gflops, or 91.6% of theoretical peak, and the High Performance Linpack benchmark ran at 602 Gflops, or approximately 88% of theoretical peak.

Because the system became heterogeneous when the 128 new processors were added with a different clock speed from the previous 64 processors, a new PBS queue structure has been configured to make the best use of this valuable system. Two PBS instances have been created, each tied to its own cpuset, virtually dividing the system in two by CPU clock speed. The first queue structure controls jobs submitted to the old processors, while the second controls jobs submitted to the new processors. By default, every user has access to Red-2, which runs 1-3 processor jobs on the old processors, and to Red-4, which runs 4-processor jobs on the new processors. To run on 4 or more processors, the user needs to request access, justify the need to run on more than 3 processors, and demonstrate an ability to get good parallel efficiency for the jobs. The table below shows the specifications for each queue in Redwood.
By using two different queue structures, we keep a job from getting a mix of processor types, which could waste part of the faster processors' time as they wait for the slower processors to complete their portion of the job. (That is, we ensure that each job runs on a homogeneous set of CPUs, even though the larger system itself is heterogeneous.) The cutoff is: jobs with 3 or fewer CPUs run on the old CPUs under the first PBS structure; jobs with 4 or more CPUs run on the new CPUs under the second PBS structure. The user submits with the same qsub command in either case, but should use qstat2, qdel2, and qalter2 to query, delete, or alter jobs that were routed to the second PBS instance. For more information, please visit http://www.mcsr.olemiss.edu/computing/pbs.
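The cutoff rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described in this article, not code that PBS or MCSR actually runs. The names `route_job` and `Route` are hypothetical, and the sketch assumes that jobs in the first PBS instance are managed with the standard, unsuffixed qstat, qdel, and qalter commands.

```python
from dataclasses import dataclass

# Cutoff described in the article: 3 or fewer CPUs go to the old CPUs,
# 4 or more CPUs go to the new CPUs.
CPU_CUTOFF = 4


@dataclass(frozen=True)
class Route:
    """Where a job lands and which commands manage it (hypothetical type)."""
    pbs_instance: int  # 1 = first PBS instance, 2 = second PBS instance
    cpu_type: str
    status_cmd: str
    delete_cmd: str
    alter_cmd: str


def route_job(ncpus: int) -> Route:
    """Return the PBS instance a job of `ncpus` processors would be routed to."""
    if ncpus < 1:
        raise ValueError("a job needs at least one CPU")
    if ncpus < CPU_CUTOFF:
        # Assumption: the first instance uses the unsuffixed PBS commands.
        return Route(1, "900 MHz", "qstat", "qdel", "qalter")
    return Route(2, "1.3 GHz", "qstat2", "qdel2", "qalter2")


if __name__ == "__main__":
    for n in (1, 3, 4, 8):
        r = route_job(n)
        print(f"{n} CPUs: PBS instance {r.pbs_instance}, "
              f"{r.cpu_type} CPUs, check with {r.status_cmd}")
```

In practice the routing happens inside PBS after qsub; the sketch is only a reminder of which set of commands applies to a job of a given size.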
Some important details about the specific configuration of the queue structures follow:

· Eight of the 900 MHz CPUs and their associated memory (about 8 GB) have been formed into a boot "cpuset." All interactive user jobs are routed automatically to resources in this cpuset (i.e., they run on these CPUs and their memory).
· The other 56 900 MHz CPUs are configured as usual in the first PBS instance, and run jobs that are routed to the Red-2 queue.
· The 128 new 1.3 GHz CPUs are configured in the second PBS instance with dynamic cpusets. This means that jobs running on these CPUs automatically get access to all of the memory (just under 2 GB per 2 processors) that is local to those processors (i.e., that is on the same node).

Therefore, it is no longer necessary to request specific memory amounts from PBS when submitting jobs to Red-4 and above, either in your own PBS scripts or in calls to MCSR chemistry job submission scripts such as g03sub. In fact, if you do request a specific memory amount from PBS in these cases, you may harm yourself or others: yourself, if you request less than PBS was going to give you anyway; others, if you request even a fraction more memory than will be available locally to the processors where PBS schedules your job. For instance, if you request 4 CPUs and 4 GB of memory, PBS will see that only 3584 MB is available on the two nodes that combine to provide your 4 CPUs, and will have to allocate memory from another, non-local node. Since one of the features of dynamic cpusets is never to split up a node (a family of 2 CPUs and just under 2 GB of available memory), PBS will in this case attempt to secure an entire additional node for your job.
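The arithmetic behind the 4-CPU, 4 GB example can be worked through in a short Python sketch. It is a simplified model, not PBS's actual scheduling logic. It assumes every node offers the same local memory, taken here as 1792 MB (half of the 3584 MB quoted for two nodes), and the function names are hypothetical.

```python
import math
from typing import Optional

CPUS_PER_NODE = 2
# Derived from the article's example: 3584 MB across the two nodes of a
# 4-CPU job. Assumed to be the same on every node.
LOCAL_MB_PER_NODE = 3584 // 2  # 1792 MB


def nodes_needed(ncpus: int, mem_mb: Optional[int] = None) -> int:
    """Whole nodes needed to satisfy the CPU request and, if given, the memory request."""
    nodes_for_cpus = math.ceil(ncpus / CPUS_PER_NODE)
    if mem_mb is None:
        return nodes_for_cpus
    nodes_for_mem = math.ceil(mem_mb / LOCAL_MB_PER_NODE)
    # Nodes are never split, so the larger of the two requirements wins.
    return max(nodes_for_cpus, nodes_for_mem)


def cpus_allocated(ncpus: int, mem_mb: Optional[int] = None) -> int:
    """CPUs tied up by the job once whole nodes have been assigned."""
    return nodes_needed(ncpus, mem_mb) * CPUS_PER_NODE


def local_memory_mb(ncpus: int) -> int:
    """Local memory a job gets automatically when no memory is requested."""
    return nodes_needed(ncpus) * LOCAL_MB_PER_NODE


if __name__ == "__main__":
    print(local_memory_mb(4))        # 3584
    print(cpus_allocated(4, 4096))   # 6: one extra node, 2 extra CPUs
    print(cpus_allocated(4, 3584))   # 4: the request fits in local memory
```

Under these assumptions, asking for 4096 MB with 4 CPUs needs a third node, which ties up 6 CPUs instead of 4, while omitting the memory request yields the full 3584 MB of local memory on exactly 4 CPUs.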
PBS's attempt to secure an additional node will either fail, because you are not authorized to run jobs in the next larger queue (say, Red-8 instead of Red-4), or succeed, in which case you'll end up with more than enough memory plus 2 extra CPUs that you don't need, depriving other researchers of those CPUs for the duration of your job.

You may be thinking: since there are 2 GB and 2 CPUs in each node, shouldn't I get a full GB of memory per processor? The answer is not quite, because a few percent of the memory in each node is dedicated to directory memory (for cache coherency), and the kernel uses some memory as well. The table above shows how much local memory is actually available per job, by queue. If you have a job that needs more memory than is local to its CPUs, let us know and we'll consider setting up a queue to address your situation.

For more information about Redwood, please visit the MCSR web page at http://www.mcsr.olemiss.edu.
