HFSS15: Recommended Practices for SGE Clusters
The following subsections contain recommendations on how to set up an SGE cluster for efficiently running Ansoft serial and parallel jobs. These recommendations require the cluster administrator to make configuration changes.
Submitting Exclusive Jobs
Consumable Memory Limits
Serial Jobs in SGE
Parallel Jobs in SGE
Using Multithreading with Parallel Jobs
Submitting Exclusive Jobs
Clusters are often used to run large Ansoft batch jobs, that is, jobs that may require a large quantity of resources such as processors, memory, disk space, or run time. One way to ensure that these resources are available is to run the job in "exclusive" mode, in which any host running the job is unavailable to other jobs. SGE has no built-in mechanism for marking a job as exclusive, but it is extensible, and configuring the cluster to allow exclusive jobs is not difficult. The steps below show one way to do this; the example requires SGE 6.2u3 or later. Note that specifying a job as exclusive may delay its start if too few hosts are available to run it exclusively.
1. Use the command qconf -mc to add a new complex to the table of complexes. Recommended attributes are:
name : exclusive
shortcut : excl
type : BOOL
relop : EXCL
requestable : YES
consumable : YES
default : 0
urgency : 0
2. Set the value of "exclusive" to TRUE for each execution host using the command qconf -me hostname, where hostname is the name of the host. The values of all host configuration parameters may be displayed using the command qconf -se hostname. The "complex_values" line should look similar to the following, although other values may also be included:
complex_values exclusive=TRUE
3. When submitting a job, the job will be "exclusive" if the value "excl" is included in the resource list specified by the qsub -l option. If the resource list does not include "excl" then the job will not be exclusive, and other jobs may run on the same host or hosts as this job.
4. Example qsub command line for exclusive serial job:
qsub -b y -l excl /opt/Ansoft/HFSS14.0/hfss14.0/hfss -ng -BatchSolve ~/projects/OptimTee.hfss
Although serial jobs use only one slot, no other jobs will run on the host where this job is running, even if additional slots are present.
5. Example qsub command line for exclusive parallel job using eight engines, each using a single thread of execution:
qsub -b y -l excl -pe pe1 8 /opt/Ansoft/HFSS14.0/hfss14.0/hfss -ng -BatchSolve -Distributed -machinelist num=8 ~/projects/OptimTee.hfss
None of the hosts used for this job will be allowed to run other jobs while this job is running.
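Steps 1 and 2 above can also be scripted instead of edited interactively. The sketch below only prints the commands it would use, so it can be reviewed before anything is applied. It assumes that the file-based qconf -Mc option and the qconf -mattr option behave on the installed SGE version as they do in the standard qconf documentation; the scratch file path and the host names node01 and node02 are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands that would define the "exclusive"
# complex and enable it on each host. Nothing is executed against SGE.

# One row of the complex table, in the column order shown in step 1:
# name shortcut type relop requestable consumable default urgency
exclusive_complex_line() {
    printf '%s\n' 'exclusive excl BOOL EXCL YES YES 0 0'
}

# Print the commands for step 1 (add the complex) and step 2 (per host).
# $1 = scratch file (hypothetical path), remaining arguments = host names.
print_exclusive_setup() {
    complexes_file=$1
    shift
    echo "qconf -sc > $complexes_file"
    echo "echo '$(exclusive_complex_line)' >> $complexes_file"
    echo "qconf -Mc $complexes_file"
    for host in "$@"; do
        echo "qconf -mattr exechost complex_values exclusive=TRUE $host"
    done
}

print_exclusive_setup /tmp/complexes.txt node01 node02
```

Review the printed commands against the cluster's existing complex table before running any of them as the SGE administrator.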
Consumable Memory Limits
SGE contains several built-in complexes related to memory, such as mem_total, but none of these are "consumable". If a job is submitted with a resource list that includes one of these non-consumable memory complexes, the job will run on a host or hosts only if sufficient memory is available. If a second job is submitted, however, its memory request is compared to the original total rather than to the memory left over by the first job. This may result in both jobs running out of memory. For example, if host A has mem_total=16G and two jobs are submitted with the option "-l mt=16G", then both jobs could run on host A, provided sufficient slots are available there.
SGE allows complexes to be "consumable" to avoid this type of problem. If a complex is consumable and a job requests x amount of the complex in the -l resource list, then the available amount of the resource is decreased by x for subsequent jobs. For the same example as above, if the mem_total complex was consumable, then the first job would run on host A. This would decrease the available mem_total from 16G to 16G-16G = 0. The second job could not run on host A because there is no memory available for this job.
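The difference between the two kinds of complex can be illustrated with a toy model of host A receiving two 16G requests. This is only a sketch of the bookkeeping described above, not of how SGE is implemented; the function names are hypothetical and all sizes are whole gigabytes.

```shell
#!/bin/sh
# Toy model of host A (16G) receiving two jobs that each request 16G.

# Non-consumable: every request is compared with the original total.
admit_nonconsumable() {   # $1 = host total, $2 = request
    [ "$2" -le "$1" ]
}

# Consumable: a request is compared with what earlier jobs left over.
# Prints the new remaining amount on success.
admit_consumable() {      # $1 = remaining, $2 = request
    [ "$2" -le "$1" ] || return 1
    echo $(( $1 - $2 ))
}

total=16
admit_nonconsumable "$total" 16 && echo "non-consumable: job 1 admitted"
admit_nonconsumable "$total" 16 && echo "non-consumable: job 2 admitted (oversubscribed)"

remaining=$(admit_consumable "$total" 16) && echo "consumable: job 1 admitted, ${remaining}G left"
admit_consumable "$remaining" 16 >/dev/null || echo "consumable: job 2 must wait"
```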
The steps below show how to set up a consumable resource called physical_memory to accomplish the same thing. We do not recommend changing the behavior of the built-in complexes (such as mem_total) because other scripts may expect normal behavior of the built-in complexes.
1. Use the command qconf -mc to add a new complex to the table of complexes. Recommended attributes are:
name : physical_memory
shortcut : phys_mem
type : MEMORY
relop : <=
requestable : YES
consumable : YES
default : 0
urgency : 0
2. Set the value of "physical_memory" to an appropriate value for each execution host using the command qconf -me hostname, where hostname is the name of the host. The appropriate value is the actual physical memory on each host. Because the type is MEMORY, the K, M, and G suffixes may be used to represent kilobytes, megabytes and gigabytes. The values of all host configuration parameters may be displayed using the command qconf -se hostname. The "complex_values" line should look similar to the following, although other values may also be included and the memory value should be appropriate for the host:
complex_values physical_memory=16G
3. When submitting a job, the physical memory requirement per slot may be specified in the resource list as follows: -l phys_mem=mem_needed. The number of slots assigned to the job on a specific host will be limited by the number of slots available on the host, and also by the physical_memory available on the host.
Serial Jobs in SGE
If a serial job is submitted with the option -l phys_mem=mem_needed included, then the job may only run on a host in which the remaining physical_memory is equal to or greater than the mem_needed value.
Example 1: Host A has physical_memory=16G, and host B has physical_memory=8G. If mem_needed is 8G, the job may run on either host A or host B. If mem_needed is 16G, then the job may only run on host A.
Example 2: Host A has physical_memory=16G, and host B has physical_memory=8G. Job 1 is already running on host A, and it was submitted with option -l phys_mem=8G. If job 2 is submitted with option -l phys_mem=16G, then job 2 cannot start until job 1 finishes, because only host A has 16GB of physical_memory. If job 2 is submitted with option -l phys_mem=8G, then job 2 may start immediately, and run on either host A or host B, because both hosts have 8G of physical_memory remaining.
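Example 2 can be replayed with a few lines of shell. The sketch below lists the hosts whose remaining physical_memory covers a request; it models only the memory check, ignores slots, and uses whole gigabytes. The function name and the host:remaining argument format are hypothetical.

```shell
#!/bin/sh
# Prints the hosts whose remaining physical_memory covers the request.
# Arguments: request in G, then host:remaining pairs.
candidate_hosts() {
    need=$1
    shift
    result=""
    for entry in "$@"; do
        host=${entry%%:*}
        free=${entry##*:}
        if [ "$free" -ge "$need" ]; then
            result="$result $host"
        fi
    done
    # Trim the leading space; an empty line means the job must wait.
    echo "${result# }"
}

# Job 1 (8G) is running on host A, so A has 16 - 8 = 8G left; B has 8G.
candidate_hosts 16 A:8 B:8    # prints an empty line: job 2 waits
candidate_hosts 8  A:8 B:8    # prints: A B
```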
Parallel Jobs in SGE
Because the consumable setting for physical_memory is YES (and not JOB), each slot of the job requires a physical_memory of mem_needed. The number of slots on a host assigned to the job is limited by the number of available slots on the host. It is also limited by the physical_memory available on the host; the number of slots assigned to the job cannot exceed the available physical_memory on the host divided by the mem_needed specification.
Example 1: Execution host A and execution host B both have 4 slots per host (configured in the queue associated with the parallel environment). Host A has physical_memory=16G and host B has physical_memory=8G (shown by commands qconf -se A and qconf -se B). If a job is submitted that requires 6 slots and 4G per slot, it will be able to run, with 4 slots on host A and 2 slots on host B. The qsub command might look like: qsub -l phys_mem=4G -pe pe_name 6 command args
Example 2: Same as example 1, except that 7 slots are requested. In this case, the job will never run. Although there are 8 slots available on hosts A and B, only two of the slots on host B are usable by this job because host B only has physical_memory of 8G. With only 6 slots in total available to this job (4 on host A and 2 on host B), the job cannot start. In this case the command might look like: qsub -l phys_mem=4G -pe pe_name 7 command args
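The arithmetic behind both examples is that the usable slots on a host are the smaller of its free slots and its available physical_memory divided by mem_needed. A minimal sketch, using whole gigabytes and hypothetical function names, reproduces the 6-slot and 7-slot outcomes:

```shell
#!/bin/sh
# Usable slots on one host: limited by free slots and by memory.
usable_slots() {   # $1 = free slots, $2 = free memory (G), $3 = memory per slot (G)
    by_memory=$(( $2 / $3 ))
    if [ "$by_memory" -lt "$1" ]; then
        echo "$by_memory"
    else
        echo "$1"
    fi
}

# Total usable slots over hosts given as slots:memory pairs.
total_usable_slots() {   # $1 = memory per slot (G), rest = slots:memory
    per_slot=$1
    shift
    total=0
    for entry in "$@"; do
        n=$(usable_slots "${entry%%:*}" "${entry##*:}" "$per_slot")
        total=$(( total + n ))
    done
    echo "$total"
}

available=$(total_usable_slots 4 4:16 4:8)   # host A, host B
for requested in 6 7; do
    if [ "$requested" -le "$available" ]; then
        echo "$requested slots: can run ($available usable)"
    else
        echo "$requested slots: cannot run ($available usable)"
    fi
done
```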
Using Multithreading with Parallel Jobs
For large jobs it may be useful to combine multiprocessing with distributed processing. Distributed processing refers to starting multiple processes, in which each process performs a portion of the analysis. These processes may run on the same host or on different hosts. The number of processes running at the same time is known as the number of "analysis engines". Multiprocessing refers to using multiple threads within a single process to decrease the run time of the process. Multiprocessing may also be called multi-threaded processing.
As a concrete example of combining multiprocessing with distributed processing, an analysis could run with four engines, where each engine uses two threads. In order to distribute the processing load so that no processor is overloaded, one slot is generally allocated per thread, so 8 slots would be needed for this example (4 engines * 2 threads per engine = 8 threads). The four engines could all run on a single host, or they could be distributed across 2, 3 or 4 hosts, depending on available slots. Each engine represents a single process, so the two slots for each engine must be allocated on the same host.
This section describes how to set up an SGE cluster so that a specified number of slots per host may be requested when a job is submitted. This procedure requires cluster administrator privileges. This capability may be used to submit parallel jobs in which one engine runs on each host, and the number of slots per host matches the number of threads used by each engine.
1. Let n be the largest number of slots available on any host used for the jobs. Create a separate parallel environment for each value of the number of slots per host from 1 to n. For example, pe_sph1 is a parallel environment in which one slot is allocated to the job per host, pe_sph2 is a parallel environment in which two slots are allocated to the job per host, etc. The command qconf -ap pe_name may be used to create each new parallel environment. The allocation_rule parameter should be set to the number of slots per host, an integer from 1 to n. The control_slaves parameter should be set to TRUE, as described above. The slots parameter should be set to the maximum number of slots managed by this parallel environment, which is typically set to a large number, such as 999. The other parameters should be set to values appropriate for the cluster. For example, the pe_sph2 parallel environment might have the following parameters:
pe_name : pe_sph2
slots : 999
user_lists : NONE
xuser_lists : NONE
start_proc_args : /bin/true
stop_proc_args : /bin/true
allocation_rule : 2
control_slaves : TRUE
job_is_first_task : FALSE
urgency_slots : min
accounting_summary : TRUE
2. When submitting a job, use the parallel environment where the slots per host matches the number of threads per engine.
The batchoptions setting for 'HFSS/Preferences/NumberOfProcessorsDistributed' controls the number of threads per distributed engine. This should be set to match the number of slots per host. With any analysis, a portion of the analysis may not be distributed across multiple engines. Multiprocessing may be used with this portion of the analysis using the batchoptions setting for 'HFSS/Preferences/NumberOfProcessors'. This value should also be set to match the number of slots per host because this portion of the analysis will run on one of the hosts allocated to the job.
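Creating n nearly identical parallel environments by hand is repetitive, so step 1 might be scripted along the following lines. This is a sketch under assumptions: it relies on the file-based qconf -Ap option, it assumes the definition file uses whitespace-separated "parameter value" lines, and n=4 is a hypothetical value. The script writes the definition files to a temporary directory and prints the qconf commands rather than running them.

```shell
#!/bin/sh
# Prints a parallel environment definition with a fixed slots-per-host
# allocation rule, mirroring the pe_sph2 listing above.
pe_definition() {   # $1 = slots per host
    cat <<EOF
pe_name            pe_sph$1
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $1
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
EOF
}

# Write one file per value 1..n and print the command that would load it.
n=4                      # hypothetical: largest slot count on any host
outdir=$(mktemp -d)      # scratch directory for the definition files
k=1
while [ "$k" -le "$n" ]; do
    pe_definition "$k" > "$outdir/pe_sph$k"
    echo "qconf -Ap $outdir/pe_sph$k"
    k=$(( k + 1 ))
done
```

Each new parallel environment still has to be associated with a queue before jobs can use it, and the remaining parameters should be adjusted to suit the cluster.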
Example qsub command line for running distributed processing with four engines and multiprocessing with two threads per engine:
qsub -V -b y -pe pe_sph2 8 "/opt/Ansoft/HFSS14.0/hfss14.0/hfss -ng -BatchSolve -Distributed -machinelist num=4 -batchoptions """'HFSS/Preferences/NumberOfProcessorsDistributed'=2 'HFSS/Preferences/NumberOfProcessors'=2""" ~/projects/OptimTee.hfss"
The -V option indicates that all environment variables in the submission environment should be copied to the job environment.
The -b y option indicates that hfss is launched directly from the command line, instead of using a script.
The -pe pe_sph2 8 option indicates that this is a parallel job running under the pe_sph2 parallel environment, so that two slots are allocated to this job from each host and 8 slots in total are allocated to this parallel job.
The -Distributed option indicates that this is a DSO job, so that multiple engines will be started.
The -machinelist num=4 option indicates that a total of four engines will be started.
The 'HFSS/Preferences/NumberOfProcessorsDistributed'=2 batchoption indicates that the distributed analysis engines should use two cores for multi-processing.
The 'HFSS/Preferences/NumberOfProcessors'=2 batchoption indicates that the portion of the analysis that is not distributed should use two cores for multi-processing.
The entire hfss command is in double quotes, and the double quotes enclosing the -batchoptions value are escaped. Each of these double quotes is replaced by the sequence """.
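Because the slot count, the parallel environment name, and both batchoptions values must agree, it can help to derive them from just the number of engines and the threads per engine. The sketch below prints a qsub line in the same form as the example above; it does not submit anything. The function names are hypothetical, and the pe_sph naming follows the convention introduced in step 1.

```shell
#!/bin/sh
# Slots to request: one slot per thread across all engines.
total_slots() {   # $1 = engines, $2 = threads per engine
    echo $(( $1 * $2 ))
}

# Prints (does not run) a qsub line in the form shown above.
# $1 = engines, $2 = threads per engine, $3 = project file
print_qsub_line() {
    slots=$(total_slots "$1" "$2")
    # Unquoted here-document: variables expand, quote characters stay literal.
    cat <<EOF
qsub -V -b y -pe pe_sph$2 $slots "/opt/Ansoft/HFSS14.0/hfss14.0/hfss -ng -BatchSolve -Distributed -machinelist num=$1 -batchoptions """'HFSS/Preferences/NumberOfProcessorsDistributed'=$2 'HFSS/Preferences/NumberOfProcessors'=$2""" $3"
EOF
}

print_qsub_line 4 2 '~/projects/OptimTee.hfss'
```

With four engines and two threads per engine, the printed line requests 8 slots from pe_sph2, matching the example command.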