
How many triangles can the Dawning 4000A handle with the method of moments?

Posted by: edatop.com

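Dawning 4000A
Peak performance: 10.2 Tflops
Compute nodes: 512, each with four AMD Opteron processors
Storage nodes: 16, each with four AMD Opteron processors
Access nodes: 4, each with four AMD Opteron processors
CPU: AMD Opteron 850, 2.4 GHz, 2128 CPUs in total
Total system memory: 4256 GB
Total disk capacity: 20 TB
Architecture: cluster, Myrinet 2000
Operating system: Turbo Linux 8.0
With the method of moments (MoM), it can only handle 520,000 mesh elements.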

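Wow.
Such a high-end configuration, and only 520,000?
I just read a report saying that in the US, higher-order MoM for stealth calculations can reach much larger problem sizes.
It looks like electromagnetics in China still needs to advance in big strides.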

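That estimate is not really objective; you need to look at how many CPUs his license covers.
When you run in parallel, the total memory requirement is divided by the number of CPU cores.
For example, suppose the model needs 48 GB in total on an 8-core CPU;
in a parallel run it will report that each core needs 6 GB.
So with the Dawning 4000A's configuration, given a sufficient license, it should manage more than 520,000 mesh elements.
On the rules for estimating memory and run time: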
Estimating Resource Requirements for Simulations
http://www.feko.info/knowledge-base/quarterly/feko-quarterly-archive/FEKO_Quarterly_December_2007.pdf (Page03)
It is often important that users are able to estimate reasonably accurately how much resources will be required for a FEKO simulation. FEKO support engineers have developed rules of thumb that may be used for such estimates and maintain a page in the FEKO website help centre with the latest information on these techniques. Here are some of the most useful techniques.
MoM
The number of unknowns for the MoM (Nmom) are roughly 1.5 times the number of triangles (T) in the model. Dielectric triangles are the exception to the rule because each triangle require an electrical and magnetic basis function, which means that dielectric triangles contributes roughly 3 unknowns per triangle:
RAM = Nmom*(Nmom+1)*p/(1024^2) (MByte)
where p is 8 when single precision storage is used and 16 when double precision storage is used. For large problems the mathematical operations required to analyse the model are roughly:
OPS = (8/3) x Nmom^3 (FLOPS)
Runtime may thus be estimated in seconds by dividing OPS by the processor speed in FLOPS. (Modern CPUs feature 5 to 8 GFLOPS.)
........
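
The rules of thumb quoted above are easy to turn into a small calculator. The following is only a sketch: the function names are made up for illustration and are not part of FEKO, and the 10.2 Tflops used in the example is the machine's theoretical peak, which a real dense solve would not sustain.

```python
def mom_unknowns(metal_triangles, dielectric_triangles=0):
    """Rule of thumb: ~1.5 unknowns per metallic triangle, ~3 per dielectric one."""
    return round(1.5 * metal_triangles + 3.0 * dielectric_triangles)


def mom_ram_mbyte(n_unknowns, bytes_per_entry=8):
    """RAM = Nmom*(Nmom+1)*p/1024^2 in MByte; p = 8 (single) or 16 (double)."""
    return n_unknowns * (n_unknowns + 1) * bytes_per_entry / 1024**2


def mom_runtime_seconds(n_unknowns, flops_per_second):
    """OPS = (8/3)*Nmom^3, divided by the processor speed in FLOPS."""
    return (8.0 / 3.0) * n_unknowns**3 / flops_per_second


if __name__ == "__main__":
    n = mom_unknowns(520_000)            # metallic triangles only (assumption)
    ram_gb = mom_ram_mbyte(n) / 1024     # single precision
    hours = mom_runtime_seconds(n, 10.2e12) / 3600  # assumes peak speed
    print(f"unknowns: {n}")
    print(f"RAM (single precision): {ram_gb:.0f} GB")
    print(f"runtime at peak speed: {hours:.1f} h")
```

For 520,000 metallic triangles this gives 780,000 unknowns and roughly 4,500 GB of single-precision matrix storage, which is in line with the "more than 4,000 GB" figure discussed in this thread.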

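I assumed a license for a single CPU thread, with the run able to use all of the memory, and FEKO's MoM solver. Under those assumptions, 520,000 mesh elements need more than 4,000 GB of memory.
These numbers come from the resource formulas FEKO provides; note that I assumed single precision.
When FEKO solves a problem, the memory each thread needs for a given model is fixed, so running with n threads takes n times the memory.
And with too many threads, parallel efficiency drops.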
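
The same formula can be inverted to ask how many triangles fit into a given amount of memory. This is a minimal sketch under the assumptions stated in the post (one process may use all of the memory, metallic triangles only, 1.5 unknowns per triangle); it also assumes that "GB" means 1024^3 bytes, and the function names are hypothetical.

```python
import math

GIB = 1024**3  # assumption: "GB" in the spec sheet means 2**30 bytes


def max_mom_unknowns(ram_bytes, bytes_per_entry=8):
    """Largest integer N with N*(N+1)*p <= ram_bytes (p = 8 single, 16 double)."""
    q = ram_bytes // bytes_per_entry
    # N*(N+1) <= q  <=>  (2N+1)^2 <= 4q+1, solved exactly with integer sqrt
    return (math.isqrt(4 * q + 1) - 1) // 2


def max_metal_triangles(ram_bytes, bytes_per_entry=8):
    """Invert Nmom ~ 1.5*T for purely metallic models."""
    return (2 * max_mom_unknowns(ram_bytes, bytes_per_entry)) // 3


if __name__ == "__main__":
    total = 4256 * GIB  # total system memory of the Dawning 4000A
    print("single precision:", max_metal_triangles(total, 8))
    print("double precision:", max_metal_triangles(total, 16))
```

With the full 4256 GB and single precision, this comes out at roughly 500,000 metallic triangles, the same order as the 520,000 figure quoted at the top of the thread.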

520,000 mesh elements? That is huge. I don't think I have ever run a MoM problem that large.

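1. Understood. I have not worked through that part carefully myself.
MoM really does consume a startling amount of memory; without the arrival of MLFMM, the development of MoM would have been severely limited.
2. About the Dawning 4000A: AMD Opteron 850 CPUs at 2.4 GHz, 2128 of them in total. That machine is really impressive.
Intel recently released a new Xeon 5482 at 3.2 GHz (12 MB cache). Do you happen to know how the two compare in performance on a per-core basis?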
http://product.pchome.net/diy_cpu_amd_opteron_850/comment_9559.html
Product name: AMD Opteron 850
Reference price: out of stock
Vendor quotes: none
Type: workstation/server
Series: Opteron
Socket: Socket 940
Clock speed: 2.4 GHz
Process: 0.13 um
L2 cache: 1 MB
http://detail.zol.com.cn/servercpu/index135439.shtml
Intel Xeon 5482 3.2G (tray)
Reference price: ￥11800 [Beijing]
Vendor quotes: ￥10384 to ￥14279
Cores: quad-core
Clock speed: 3200 MHz
Bus frequency: 1600 MHz
L2 cache: 12 MB
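3. It is inevitable that too many threads lower parallel efficiency; that is an MPI issue. However, FEKO publishes a parallel benchmark table showing that efficiency is still above 80% at 32 CPU cores, and EMSS claims to have a GHOST mechanism that mitigates the efficiency loss from high thread counts.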
http://www.feko.info/feko-product-info/technical/special-module-and-feature-articles/parallel-processing/parallel-processing
Parallel Processing
A short description of FEKO's parallel processing abilities.
Many modern computer systems make use of multiple processing units in order to improve computing performance. Such systems include:
simple multicore CPUs (i.e. one computer with one CPU having multiple cores),
multi-CPU PCs and SMP workstations (symmetric multiprocessor, typically 2 to 8 CPUs),
large massively parallel distributed systems with typically 128 to 1024 CPUs (which can again be multicore).
In order to gain the most benefit from the computational hardware, parallel versions of FEKO support state-of-the-art interconnect technologies like GigE, Myrinet, Infiniband or vendor proprietary interconnects like the SGI NumaFlex technology.
In FEKO all the solution phases for all the various numerical techniques have been parallelised, e.g. the ray-tracing for UTD, the MoM matrix setup and solution, the near- and far-field calculations and also seemingly simple things such as power loss computations.
We are very proud of the parallel efficiency of the MLFMM in FEKO. Even for this mathematically complex technique all the phases of the solution process (near-field matrix setup, aggregation, translation, disaggregation, pre-conditioning, iterative solution etc.) have been parallelised rigorously. The efficiency of the parallel implementation in FEKO is in the order of 80% to 95%, depending on the problem and the solution phase etc. This means that for a system with 32 processors the run-time would be approximately 26 times (0.8*32) faster than on a sequential run, i.e. a single processor.
Figure: Total run-time efficiency (all solution phases) for parallel MLFMM solution of a problem with 3.18 million unknowns.
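
The speed-up arithmetic in the paragraph above (0.8*32, about 26) can be written out as follows. This is only an illustrative sketch; the function names are hypothetical, and it treats the efficiency as a single constant even though the text says it varies with the problem and the solution phase.

```python
def parallel_speedup(n_processors, efficiency):
    """Speed-up over a sequential run: efficiency * number of processors."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return efficiency * n_processors


def parallel_runtime(sequential_seconds, n_processors, efficiency):
    """Estimated wall-clock time of the parallel run."""
    return sequential_seconds / parallel_speedup(n_processors, efficiency)


if __name__ == "__main__":
    # The quoted range of 80% to 95% efficiency on 32 processors
    for eff in (0.80, 0.95):
        print(f"efficiency {eff:.0%}: speed-up {parallel_speedup(32, eff):.1f}x")
```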
Intel Cluster Ready
The "Intel Cluster Ready" program facilitates easier design, build and deployment of cluster computers. Developers of cluster computing software (such as FEKO) can validate their software for use on standard Intel cluster environments and be certified as Independent Software Vendors (ISV) by Intel.
FEKO is dedicated to improving the performance of our software in cluster computing environments and work closely with Intel engineers in this endeavour. As such FEKO was certified as ISV by Intel in 2007 and may proudly brand our software with the Intel Cluster Ready logo. This means that FEKO customers can purchase an Intel Cluster Ready certified computer with the confidence that FEKO has been qualified on this computing environment and will work straight out of the box.
More information on the Intel Cluster Ready initiative is available on the Intel website.