The performance of SSS-CORE has been evaluated from various angles. This page summarizes the results of those evaluations.
The details of the experiments and the discussions are given in our papers.
In the following, `SPARCstation 20' stands for the Sun Microsystems SPARCstation 20 and compatible machines. We mainly used the Axil 320 model 8.1.1, which is compatible with the Sun Microsystems SPARCstation 20.
Conditions
---------------------------------------------------------
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
|-------------|-----------------------------------------|
|             | SSS-CORE Ver. 1.1                       |
| OS          |-----------------------------------------|
|             | SunOS 4.1.4                             |
---------------------------------------------------------

Cost of getting a task ID
-------------------------------------
| SSS-CORE get_taskid() | 1.12 µsec |
|-----------------------|-----------|
| SunOS getpid()        | 4.39 µsec |
-------------------------------------

Costs of allocating/freeing memory (in µsec)
----------------------------------------------------------------
| size (byte)       |      4 K    16 K    64 K   256 K     1 M |
|-------------------+------------------------------------------|
| SSS-CORE allocate |    23.91   28.91   48.77   123.2   431.2 |
| SSS-CORE free     |    19.49   20.36   23.91   36.23   99.06 |
| SunOS sbrk()      |    133.2   375.8   894.3    1828    2020 |
----------------------------------------------------------------
Conditions
------------------------------------------------------------------------------
| workstation          |Sun Microsystems Ultra 60 (450 MHz UltraSPARC-II × 1)|
|----------------------|-----------------------------------------------------|
| NIC                  |Sun Microsystems GigabitEthernet/P 2.0 Adapter       |
|----------------------|-----------------------------------------------------|
| network              |(directly connected)                                 |
|----------------------|-----------------------------------------------------|
|OS &                  |SSS-CORE Ver. 2.3 & MBCF                             |
|Communication Protocol|-----------------------------------------------------|
|                      |Solaris 2.6 & TCP/IP                                 |
------------------------------------------------------------------------------

One-way latencies of MBCF/1000BASE-SX (in µsec)
---------------------------------------------------------
| data size (byte) |      4     16     64    256   1024 |
|------------------+------------------------------------|
| MBCF             |    9.6   11.0   11.5   16.2   35.9 |
| TCP/IP           |  95.08  95.22  95.39  99.45 114.15 |
---------------------------------------------------------

Peak bandwidths of MBCF/1000BASE-SX (in Mbyte/sec)
-------------------------------------------------------------
| data size (byte) |       4    16    64   256  1024  1408  |
|------------------+----------------------------------------|
| MBCF             |    2.29  5.67 22.30 55.41 78.22 80.92  |
| TCP/IP           |    0.09  0.43  1.67  5.56 12.79 20.21  |
-------------------------------------------------------------
Although the software overhead of MBCF is small, the peak bandwidth (80.92 Mbyte/sec for 1408-byte data) falls well short of the 125 Mbyte/sec hardware limit of the link. The bottleneck is probably somewhere in the Ultra 60's hardware.
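The size of this gap can be made concrete with a short calculation. The following is only an illustrative sketch, not part of the original measurements: it assumes that the 125 Mbyte/sec limit is the raw 1000 Mbit/sec line rate divided by 8, with Ethernet framing overhead ignored, and the helper name `link_utilization` is hypothetical.

```python
# Illustrative calculation only; the helper name is hypothetical.

# Assumption: raw 1000 Mbit/sec line rate, framing overhead ignored.
GIGABIT_LIMIT_MBYTE_PER_SEC = 1000 / 8  # 125 Mbyte/sec

# Peak bandwidths (Mbyte/sec) for 1408-byte data, taken from the table above.
PEAK_MBCF = 80.92
PEAK_TCPIP = 20.21


def link_utilization(peak_mbyte_per_sec,
                     limit_mbyte_per_sec=GIGABIT_LIMIT_MBYTE_PER_SEC):
    """Return the fraction of the raw link rate that a measured peak reaches."""
    return peak_mbyte_per_sec / limit_mbyte_per_sec


if __name__ == "__main__":
    for name, peak in (("MBCF", PEAK_MBCF), ("TCP/IP", PEAK_TCPIP)):
        print(f"{name}: {link_utilization(peak):.1%} of the raw link rate")
```

Under these assumptions MBCF reaches roughly 65% of the raw link rate and TCP/IP roughly 16%.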
Conditions
--------------------------------------------------------------------------
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
|-------------|----------------------------------------------------------|
| NIC         | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
|-------------|----------------------------------------------------------|
|             | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
| network     |----------------------------------------------------------|
|             | Bay Networks BayStack 350T (switching 100BASE-TX HUB)    |
|-------------|----------------------------------------------------------|
| OS          | SSS-CORE Ver. 1.1                                        |
--------------------------------------------------------------------------

One-way latencies of MBCF/100BASE-TX (in µsec)
----------------------------------------------------
| data size (byte) |     4    16    64   256  1024 |
|------------------+-------------------------------|
| MBCF_WRITE       |  24.5  27.5    34  60.5   172 |
| MBCF_FIFO        |    32    32  40.5    73 210.5 |
| MBCF_SIGNAL      |    49  52.5  60.5    93 227.5 |
----------------------------------------------------

Peak bandwidths of MBCF/100BASE-TX (in Mbyte/sec)
------------------------------------------------------------------
| data size (byte)        |      4    16    64   256  1024  1408 |
|-------------------------+--------------------------------------|
| MBCF_WRITE, half duplex |   0.31  1.15  4.31  8.56 11.13 11.48 |
| MBCF_WRITE, full duplex |   0.34  1.27  4.82  9.63 11.64 11.93 |
------------------------------------------------------------------
Conditions
-------------------------------------------------------------------------------
| workstation      | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
|------------------|----------------------------------------------------------|
| NIC              | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
|------------------|----------------------------------------------------------|
|                  | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
| network          |----------------------------------------------------------|
|                  | Bay Networks BayStack 350T (switching 100BASE-TX HUB)    |
|------------------|----------------------------------------------------------|
|OS &              | SSS-CORE Ver. 1.1 & MPI/MBCF                             |
|MPI implementation|----------------------------------------------------------|
|                  | SunOS 4.1.4 & MPICH Ver. 1.1 (using TCP)                 |
-------------------------------------------------------------------------------

Round-trip times of MPI with 100BASE-TX (in µsec)
----------------------------------------------------------------
| message size (byte)  |      0    4   16   64  256 1024 4096  |
|----------------------+---------------------------------------|
| MPI/MBCF on SSS-CORE |     71   85   85  106  168  438 1026  |
| MPICH/TCP on SunOS   |    968  962  980 1020 1080 1255 2195  |
----------------------------------------------------------------

Peak bandwidths of MPI with 100BASE-TX (in Mbyte/sec)
------------------------------------------------------------------------------
| message size (byte)   |       4    16    64   256  1024  4096 16384 65536  |
|-----------------------+----------------------------------------------------|
| MPI/MBCF on SSS-CORE, |    0.14  0.53  1.82  4.72  8.08  9.72 10.15  9.78  |
|   half duplex         |                                                    |
| MPI/MBCF on SSS-CORE, |    0.14  0.57  1.90  5.33 10.22 11.68 11.77 11.85  |
|   full duplex         |                                                    |
| MPICH/TCP on SunOS,   |    0.02  0.09  0.35  1.27  3.54  6.04  5.59  7.00  |
|   half duplex         |                                                    |
------------------------------------------------------------------------------
Conditions
-------------------------------------------------------------------------------
| workstation      | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
|------------------|----------------------------------------------------------|
| NIC              | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
|------------------|----------------------------------------------------------|
| network          | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
|------------------|----------------------------------------------------------|
|OS &              | SSS-CORE Ver. 1.1 & MPI/MBCF                             |
|MPI implementation|----------------------------------------------------------|
|                  | SunOS 4.1.4 & MPICH Ver. 1.1 (using TCP)                 |
-------------------------------------------------------------------------------

Execution results of the NAS Parallel Benchmarks
-------------------------------------------------------------------------------
| program [# of nodes]    |  EP[8]  MG[8]  CG[8]  IS[8]  LU[8]  SP[9]  BT[9]  |
|-----------------------------------------------------------------------------|
| MPI/MBCF on SSS-CORE                                                        |
|-----------------------------------------------------------------------------|
| execution time (sec)    |  15.14   7.48  11.02   3.02 160.36 154.91  67.30  |
| speedup ratio to 1 node |   7.99   5.24   6.27   3.33   6.26   8.11   9.16  |
|communication frequency  |   0.00   9.68  12.69  13.58   1.89   7.83   5.32  |
|              (Mbyte/sec)|                                                   |
|communication frequency  |      4   4670   2138    466   1199    421    488  |
|      (# of messages/sec)|                                                   |
|average message size     |   0.00   2.07   5.94  29.14   1.58  18.60  10.90  |
|                  (Kbyte)|                                                   |
|MBCF_WRITE               |  51.10   0.01  53.33  99.22  13.37  49.01  47.24  |
|    availability rate (%)|                                                   |
|use of                   |    yes     no     no    yes     no     no     no  |
| collective communication|                                                   |
|-----------------------------------------------------------------------------|
| MPICH/TCP on SunOS                                                          |
|-----------------------------------------------------------------------------|
| execution time (sec)    |  16.25  13.72  14.59   4.81 185.04 231.66  96.02  |
| speedup ratio to 1 node |   7.73   2.83   4.71   2.13   5.84   6.01   6.53  |
|-----------------------------------------------------------------------------|
| MPI/MBCF on SSS-CORE versus MPICH/TCP on SunOS                              |
|-----------------------------------------------------------------------------|
|performance improvement  |   1.07   1.83   1.32   1.59   1.15   1.50   1.43  |
|                    ratio|                                                   |
-------------------------------------------------------------------------------
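The performance improvement ratios in the last row can be reproduced from the two rows of execution times. The sketch below assumes that the ratio is defined as the MPICH/TCP execution time divided by the MPI/MBCF execution time; this definition is inferred from the figures rather than stated on this page, and the function name `improvement_ratios` is hypothetical.

```python
# Execution times (sec) copied from the table above, in benchmark order.
BENCHMARKS = ["EP", "MG", "CG", "IS", "LU", "SP", "BT"]
TIME_MPI_MBCF = [15.14, 7.48, 11.02, 3.02, 160.36, 154.91, 67.30]
TIME_MPICH_TCP = [16.25, 13.72, 14.59, 4.81, 185.04, 231.66, 96.02]


def improvement_ratios(baseline_times, improved_times):
    """Return baseline time / improved time per benchmark (assumed definition)."""
    return [base / new for base, new in zip(baseline_times, improved_times)]


RATIOS = improvement_ratios(TIME_MPICH_TCP, TIME_MPI_MBCF)

if __name__ == "__main__":
    for name, ratio in zip(BENCHMARKS, RATIOS):
        print(f"{name}: {ratio:.2f}")
```

Rounded to two decimal places, the computed values agree with the ratios listed in the table.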
Conditions
-------------------------------------------------------------------------------
| workstation      | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
|------------------|----------------------------------------------------------|
| NIC              | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
|------------------|----------------------------------------------------------|
| network          | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
|------------------|----------------------------------------------------------|
|OS &              | SSS-CORE Ver. 1.1 & modified SUNRPC 4.0                  |
|RPC implementation|----------------------------------------------------------|
|                  | SunOS 4.1.4 & SUNRPC 4.0                                 |
-------------------------------------------------------------------------------

Round-trip latencies of RPC with 100BASE-TX (in µsec)
-----------------------------------------------
| data size (byte)      |    4  256  512 1024 |
|-----------------------+---------------------|
| SSS-CORE, MBCF_SIGNAL |  127  173  221  315 |
| SSS-CORE, MBCF_FIFO   |  148  194  251  372 |
| SunOS TCP             |  863  903  918 1033 |
-----------------------------------------------
Conditions
--------------------------------------------------------------------------
| workstation    | SPARCstation 20 (85 MHz SuperSPARC × 1)               |
|----------------|-------------------------------------------------------|
| NIC            | Sun Microsystems Fast Ethernet SBus Adapter 2.0       |
|----------------|-------------------------------------------------------|
| network        | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
|----------------|-------------------------------------------------------|
| OS             | SSS-CORE Ver. 1.1                                     |
|----------------|-------------------------------------------------------|
| runtime system | ADSM                                                  |
--------------------------------------------------------------------------

Effects of optimization methods on LU-Contig (n = 512, b = 16)
------------------------------------------------------------------------------
| optimization methods    |execution |# of consistency|# of    |amount of    |
|                         |time (sec)|      management| packets|communication|
|                         |          |           codes|        |      (Mbyte)|
|-------------------------+--------------------------------------------------|
| None                    |   28.20  |     5592 K     | 5207 K |     47.73   |
|runtime packet combining |   14.35  |     5592 K     | 83.5 K |    113.00   |
|static interprocedural   |    2.17  |     1.43 K     | 7.73 K |      9.42   |
|   redundancy elimination|          |                |        |             |
|runtime packet combining |    2.16  |     1.43 K     | 7.60 K |      9.27   |
| & static interprocedural|          |                |        |             |
|   redundancy elimination|          |                |        |             |
------------------------------------------------------------------------------

Effects of optimization methods on Radix (#key = 1 M)
------------------------------------------------------------------------------
| optimization methods    |execution |# of consistency|# of    |amount of    |
|                         |time (sec)|      management| packets|communication|
|                         |          |           codes|        |      (Mbyte)|
|-------------------------+--------------------------------------------------|
| None                    |   21.90  |      793 K     | 3220 K |     76.72   |
|runtime packet combining |   12.13  |      793 K     | 75.8 K |    101.08   |
|static interprocedural   |    1.57  |     2.08 K     | 19.5 K |     13.47   |
|   redundancy elimination|          |                |        |             |
|runtime packet combining |    1.24  |     2.08 K     | 10.1 K |     13.63   |
| & static interprocedural|          |                |        |             |
|   redundancy elimination|          |                |        |             |
------------------------------------------------------------------------------
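One way to read the two tables above is to look at the mean amount of data carried per packet, with and without runtime packet combining. The sketch below is an illustration under stated assumptions only: it takes K = 10^3 and M = 10^6 (the page does not say whether binary units were meant), it treats the "amount of communication" column as the total bytes carried by the counted packets, and the function name `average_packet_bytes` is hypothetical.

```python
# Figures copied from the two tables above:
# (number of packets in thousands, amount of communication in Mbyte).
MEASUREMENTS = {
    ("LU-Contig", "none"): (5207.0, 47.73),
    ("LU-Contig", "runtime packet combining"): (83.5, 113.00),
    ("Radix", "none"): (3220.0, 76.72),
    ("Radix", "runtime packet combining"): (75.8, 101.08),
}


def average_packet_bytes(kilo_packets, mbytes):
    """Mean bytes per packet, assuming K = 10**3 and M = 10**6."""
    return (mbytes * 1e6) / (kilo_packets * 1e3)


if __name__ == "__main__":
    for (program, method), (kpackets, mbytes) in MEASUREMENTS.items():
        size = average_packet_bytes(kpackets, mbytes)
        print(f"{program:10s} {method:26s} {size:8.1f} bytes/packet")
```

Under these assumptions, runtime packet combining raises the mean packet size from a few tens of bytes or less to more than 1300 bytes, which is consistent with the sharp drop in the number of packets even though the total amount of communication grows.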
Conditions
-------------------------------------------------------------------------------
|          | workstation    | SPARCstation 20 (85 MHz SuperSPARC × 1)         |
|          |----------------|-------------------------------------------------|
|          | NIC            | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
|          |----------------|-------------------------------------------------|
| SSS-CORE | network        | Bay Networks BayStack 350T                      |
| system   |                | (switching 100BASE-TX HUB)                      |
|          |----------------|-------------------------------------------------|
|          | OS             | SSS-CORE Ver. 1.1                               |
|          |----------------|-------------------------------------------------|
|          | runtime system | UDSM                                            |
|-----------------------------------------------------------------------------|
|          | MPP            | Fujitsu AP1000+ (50 MHz SuperSPARC × 256)       |
| AP1000+  |----------------|-------------------------------------------------|
| system   | OS             | Cell-OS                                         |
|          |----------------|-------------------------------------------------|
|          | runtime system | UDSM                                            |
-------------------------------------------------------------------------------

Breakdown of execution time
--------------------------------------------------
| Sync | synchronization                         |
|------|-----------------------------------------|
| WC   | write commitment                        |
|------|-----------------------------------------|
| PF   | page fault handler                      |
|------|-----------------------------------------|
| Msg  | remote message handlers                 |
|------|-----------------------------------------|
| Task | execution of original application codes |
--------------------------------------------------