Performance of SSS-CORE
The performance of SSS-CORE
has been evaluated from various angles.
This page summarizes the results of that evaluation.
The details of the experiments and the accompanying discussion are given in
our papers.
In the following, `SPARCstation 20' stands for
the Sun Microsystems SPARCstation 20
and compatible machines.
We mainly used the Axil 320 model 8.1.1, which is compatible with
the Sun Microsystems SPARCstation 20.
Conditions
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
OS | SSS-CORE Ver. 1.1 |
OS | SunOS 4.1.4 |
Cost of getting a task ID
SSS-CORE get_taskid() | 1.12 µsec |
SunOS getpid() | 4.39 µsec |
Costs of allocating/freeing memory (in µsec)
size (byte) | 4 K | 16 K | 64 K | 256 K | 1 M |
SSS-CORE allocate | 23.91 | 28.91 | 48.77 | 123.2 | 431.2 |
SSS-CORE free | 19.49 | 20.36 | 23.91 | 36.23 | 99.06 |
SunOS sbrk() | 133.2 | 375.8 | 894.3 | 1828 | 2020 |
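To make the memory allocation table easier to compare, the following sketch computes how many times longer SunOS sbrk() takes than SSS-CORE allocate at each size. The helper name `cost_ratios` is hypothetical, and the comparison assumes the two calls do comparable work, which the table itself does not state.

```python
# Costs in microseconds, copied from the allocation table above.
SIZES = ["4 K", "16 K", "64 K", "256 K", "1 M"]
SSS_ALLOCATE = [23.91, 28.91, 48.77, 123.2, 431.2]
SUNOS_SBRK = [133.2, 375.8, 894.3, 1828, 2020]


def cost_ratios(baseline, candidate):
    """Return baseline / candidate for each column (hypothetical helper)."""
    return [b / c for b, c in zip(baseline, candidate)]


ratios = cost_ratios(SUNOS_SBRK, SSS_ALLOCATE)
for size, ratio in zip(SIZES, ratios):
    print(f"{size:>5}: sbrk() takes {ratio:.1f}x as long as SSS-CORE allocate")
```

The same helper can be applied to the task ID row, for example `cost_ratios([4.39], [1.12])`.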
Conditions
workstation | Sun Microsystems Ultra 60 (450 MHz UltraSPARC-II × 1) |
NIC | Sun Microsystems GigabitEthernet/P 2.0 Adapter |
network | (directly connected) |
OS & Communication Protocol | SSS-CORE Ver. 2.3 | MBCF |
OS & Communication Protocol | Solaris 2.6 | TCP/IP |
One-way latencies of MBCF/1000BASE-SX (in µsec)
data size (byte) | 4 | 16 | 64 | 256 | 1024 |
MBCF | 9.6 | 11.0 | 11.5 | 16.2 | 35.9 |
TCP/IP | 95.08 | 95.22 | 95.39 | 99.45 | 114.15 |
Peak bandwidths of MBCF/1000BASE-SX (in Mbyte/sec)
data size (byte) | 4 | 16 | 64 | 256 | 1024 | 1408 |
MBCF | 2.29 | 5.67 | 22.30 | 55.41 | 78.22 | 80.92 |
TCP/IP | 0.09 | 0.43 | 1.67 | 5.56 | 12.79 | 20.21 |
Although the software overhead of MBCF is small, the peak
bandwidth falls short of the hardware limit of 125 Mbyte/sec (1000 Mbit/sec).
There is probably a bottleneck somewhere in the Ultra 60's hardware.
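The 125 Mbyte/sec limit follows from the nominal 1000 Mbit/sec line rate divided by 8 bits per byte. As a rough illustration, the sketch below expresses the 1408-byte peak bandwidths from the table above as a fraction of that limit. The function names are hypothetical, and protocol framing overhead is ignored.

```python
# Nominal line rate of 1000BASE-SX in Mbit/sec.
LINK_MBIT_PER_SEC = 1000

# Peak bandwidths in Mbyte/sec at 1408 bytes, copied from the table above.
PEAK_MBYTE_PER_SEC = {"MBCF": 80.92, "TCP/IP": 20.21}


def link_limit_mbyte(mbit_per_sec):
    """Convert a line rate in Mbit/sec to Mbyte/sec (8 bits per byte)."""
    return mbit_per_sec / 8


def utilization(peak_mbyte_per_sec, mbit_per_sec):
    """Fraction of the nominal link limit reached by a measured peak."""
    return peak_mbyte_per_sec / link_limit_mbyte(mbit_per_sec)


for name, peak in PEAK_MBYTE_PER_SEC.items():
    share = utilization(peak, LINK_MBIT_PER_SEC)
    print(f"{name}: {peak} Mbyte/sec is {share:.1%} of the link limit")
```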
Conditions
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
network | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
OS | SSS-CORE Ver. 1.1 |
One-way latencies of MBCF/100BASE-TX (in µsec)
data size (byte) | 4 | 16 | 64 | 256 | 1024 |
MBCF_WRITE | 24.5 | 27.5 | 34 | 60.5 | 172 |
MBCF_FIFO | 32 | 32 | 40.5 | 73 | 210.5 |
MBCF_SIGNAL | 49 | 52.5 | 60.5 | 93 | 227.5 |
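One way to read the latency table above is as a fixed per-message cost plus a per-byte cost. The sketch below estimates both from the 4-byte and 1024-byte columns. The two-point linear model and the helper name `two_point_model` are assumptions made for illustration, not the analysis used in the original evaluation.

```python
# One-way latencies in microseconds, copied from the table above.
SIZES = [4, 16, 64, 256, 1024]
LATENCIES = {
    "MBCF_WRITE": [24.5, 27.5, 34, 60.5, 172],
    "MBCF_FIFO": [32, 32, 40.5, 73, 210.5],
    "MBCF_SIGNAL": [49, 52.5, 60.5, 93, 227.5],
}


def two_point_model(sizes, latencies):
    """Fit latency = fixed + per_byte * size through the first and last points."""
    per_byte = (latencies[-1] - latencies[0]) / (sizes[-1] - sizes[0])
    fixed = latencies[0] - per_byte * sizes[0]
    return fixed, per_byte


for name, values in LATENCIES.items():
    fixed, per_byte = two_point_model(SIZES, values)
    print(f"{name}: about {fixed:.1f} usec fixed, {per_byte:.3f} usec per byte")
```

For reference, transmitting one byte at 100 Mbit/sec takes 0.08 µsec, so a larger estimated slope would include costs beyond the wire time.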
Peak bandwidths of MBCF/100BASE-TX (in Mbyte/sec)
data size (byte) | 4 | 16 | 64 | 256 | 1024 | 1408 |
MBCF_WRITE, half duplex | 0.31 | 1.15 | 4.31 | 8.56 | 11.13 | 11.48 |
MBCF_WRITE, full duplex | 0.34 | 1.27 | 4.82 | 9.63 | 11.64 | 11.93 |
Conditions
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
network | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
OS & MPI implementation | SSS-CORE Ver. 1.1 | MPI/MBCF |
OS & MPI implementation | SunOS 4.1.4 | MPICH Ver. 1.1 (using TCP) |
Round-trip times of MPI with 100BASE-TX (in µsec)
message size (byte) | 0 | 4 | 16 | 64 | 256 | 1024 | 4096 |
MPI/MBCF on SSS-CORE | 71 | 85 | 85 | 106 | 168 | 438 | 1026 |
MPICH/TCP on SunOS | 968 | 962 | 980 | 1020 | 1080 | 1255 | 2195 |
Peak bandwidths of MPI with 100BASE-TX (in Mbyte/sec)
message size (byte) | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 | 65536 |
MPI/MBCF on SSS-CORE, half duplex | 0.14 | 0.53 | 1.82 | 4.72 | 8.08 | 9.72 | 10.15 | 9.78 |
MPI/MBCF on SSS-CORE, full duplex | 0.14 | 0.57 | 1.90 | 5.33 | 10.22 | 11.68 | 11.77 | 11.85 |
MPICH/TCP on SunOS, half duplex | 0.02 | 0.09 | 0.35 | 1.27 | 3.54 | 6.04 | 5.59 | 7.00 |
Conditions
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
OS & MPI implementation | SSS-CORE Ver. 1.1 | MPI/MBCF |
OS & MPI implementation | SunOS 4.1.4 | MPICH Ver. 1.1 (using TCP) |
Execution results of the NAS Parallel Benchmarks
program [# of nodes] | EP [8] | MG [8] | CG [8] | IS [8] | LU [8] | SP [9] | BT [9] |
MPI/MBCF on SSS-CORE |
execution time (sec) | 15.14 | 7.48 | 11.02 | 3.02 | 160.36 | 154.91 | 67.30 |
speedup ratio to 1 node | 7.99 | 5.24 | 6.27 | 3.33 | 6.26 | 8.11 | 9.16 |
communication frequency (Mbyte/sec) | 0.00 | 9.68 | 12.69 | 13.58 | 1.89 | 7.83 | 5.32 |
communication frequency (# of messages/sec) | 4 | 4670 | 2138 | 466 | 1199 | 421 | 488 |
average message size (Kbyte) | 0.00 | 2.07 | 5.94 | 29.14 | 1.58 | 18.60 | 10.90 |
MBCF_WRITE availability rate (%) | 51.10 | 0.01 | 53.33 | 99.22 | 13.37 | 49.01 | 47.24 |
use of collective communication | yes | no | no | yes | no | no | no |
MPICH/TCP on SunOS |
execution time (sec) | 16.25 | 13.72 | 14.59 | 4.81 | 185.04 | 231.66 | 96.02 |
speedup ratio to 1 node | 7.73 | 2.83 | 4.71 | 2.13 | 5.84 | 6.01 | 6.53 |
MPI/MBCF on SSS-CORE versus MPICH/TCP on SunOS |
performance improvement ratio | 1.07 | 1.83 | 1.32 | 1.59 | 1.15 | 1.50 | 1.43 |
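Two rows of the NAS Parallel Benchmarks table can be re-derived from the others, which is a useful check when reading it. In the sketch below, the performance improvement ratio is taken to be the MPICH/TCP execution time divided by the MPI/MBCF execution time, and the average message size is taken to be the Mbyte/sec figure divided by the messages/sec figure with 1 Mbyte = 1000 Kbyte. Both formulas are assumptions inferred from the published numbers, which they reproduce to two decimal places.

```python
# Values copied from the NAS Parallel Benchmarks table above.
PROGRAMS = ["EP", "MG", "CG", "IS", "LU", "SP", "BT"]
MBCF_TIME_SEC = [15.14, 7.48, 11.02, 3.02, 160.36, 154.91, 67.30]
MPICH_TIME_SEC = [16.25, 13.72, 14.59, 4.81, 185.04, 231.66, 96.02]
MBYTE_PER_SEC = [0.00, 9.68, 12.69, 13.58, 1.89, 7.83, 5.32]
MESSAGES_PER_SEC = [4, 4670, 2138, 466, 1199, 421, 488]

# Assumed formula: ratio of execution times (TCP time over MBCF time).
improvement = [tcp / mbcf for tcp, mbcf in zip(MPICH_TIME_SEC, MBCF_TIME_SEC)]

# Assumed formula: data rate over message rate, with 1 Mbyte = 1000 Kbyte.
avg_message_kbyte = [
    1000 * mbyte / messages
    for mbyte, messages in zip(MBYTE_PER_SEC, MESSAGES_PER_SEC)
]

for name, ratio, size in zip(PROGRAMS, improvement, avg_message_kbyte):
    print(f"{name}: improvement {ratio:.2f}, average message {size:.2f} Kbyte")
```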
Conditions
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
OS & RPC implementation | SSS-CORE Ver. 1.1 | modified SUNRPC 4.0 |
OS & RPC implementation | SunOS 4.1.4 | SUNRPC 4.0 |
Round-trip latencies of RPC with 100BASE-TX (in µsec)
data size (byte) | 4 | 256 | 512 | 1024 |
SSS-CORE, MBCF_SIGNAL | 127 | 173 | 221 | 315 |
SSS-CORE, MBCF_FIFO | 148 | 194 | 251 | 372 |
SunOS TCP | 863 | 903 | 918 | 1033 |
Conditions
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
network | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
OS | SSS-CORE Ver. 1.1 |
runtime system | ADSM |
Effects of optimization methods on LU-Contig (n = 512, b = 16)
optimization methods | execution time (sec) | # of consistency management codes | # of packets | amount of communication (Mbyte) |
None | 28.20 | 5592 K | 5207 K | 47.73 |
runtime packet combining | 14.35 | 5592 K | 83.5 K | 113.00 |
static interprocedural redundancy elimination | 2.17 | 1.43 K | 7.73 K | 9.42 |
runtime packet combining & static interprocedural redundancy elimination | 2.16 | 1.43 K | 7.60 K | 9.27 |
Effects of optimization methods on Radix (#key = 1 M)
optimization methods | execution time (sec) | # of consistency management codes | # of packets | amount of communication (Mbyte) |
None | 21.90 | 793 K | 3220 K | 76.72 |
runtime packet combining | 12.13 | 793 K | 75.8 K | 101.08 |
static interprocedural redundancy elimination | 1.57 | 2.08 K | 19.5 K | 13.47 |
runtime packet combining & static interprocedural redundancy elimination | 1.24 | 2.08 K | 10.1 K | 13.63 |
Figure: Speedups on ADSM
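The two optimization tables above can be summarized by the speedup of each method over the unoptimized run and by the average amount of communication per packet. The sketch below computes both. It assumes that K means 1000 and that 1 Mbyte means 10^6 bytes, neither of which the tables state, and the function name `summarize` is hypothetical.

```python
# (execution time in sec, number of packets, communication in Mbyte),
# copied from the tables above with K assumed to mean 1000.
LU_CONTIG = {
    "none": (28.20, 5207e3, 47.73),
    "packet combining": (14.35, 83.5e3, 113.00),
    "redundancy elimination": (2.17, 7.73e3, 9.42),
    "both": (2.16, 7.60e3, 9.27),
}
RADIX = {
    "none": (21.90, 3220e3, 76.72),
    "packet combining": (12.13, 75.8e3, 101.08),
    "redundancy elimination": (1.57, 19.5e3, 13.47),
    "both": (1.24, 10.1e3, 13.63),
}


def summarize(table):
    """Return {method: (speedup over "none", average bytes per packet)}."""
    base_time = table["none"][0]
    summary = {}
    for method, (time_sec, packets, mbyte) in table.items():
        # 1 Mbyte is assumed to be 10**6 bytes.
        summary[method] = (base_time / time_sec, mbyte * 1e6 / packets)
    return summary


for title, table in (("LU-Contig", LU_CONTIG), ("Radix", RADIX)):
    for method, (speedup, bytes_per_packet) in summarize(table).items():
        print(f"{title}, {method}: {speedup:.2f}x, {bytes_per_packet:.0f} bytes/packet")
```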
Conditions
SSS-CORE system |
workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
network | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
OS | SSS-CORE Ver. 1.1 |
runtime system | UDSM |
AP1000+ system |
MPP | Fujitsu AP1000+ (50 MHz SuperSPARC × 256) |
OS | Cell-OS |
runtime system | UDSM |
Breakdown of execution time
Sync | synchronization |
WC | write commitment |
PF | page fault handler |
Msg | remote message handlers |
Task | execution of original application codes |
Figure: Execution time of LU-Contig on 1 to 8 nodes
Figure: Execution time of Radix on 1 to 8 nodes
Figure: Speedups on UDSM (on SSS-CORE)
Mail to
<info@ssscore.org>.
© 1998-2000 SSS-CORE Project Team.