Performance of SSS-CORE
The performance of SSS-CORE has been evaluated from various angles.
This page summarizes the results of that evaluation.
The details of the experiments and the discussions are given in
our papers.
In the following, "SPARCstation 20" stands for the
Sun Microsystems SPARCstation 20 and its compatible machines.
We have mainly used the Axil 320 model 8.1.1, which is one such
compatible machine.
  Conditions
  
  
  
| | |
|---|---|
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
| OS | SSS-CORE Ver. 1.1 |
| | SunOS 4.1.4 |
  
  Cost of getting a task ID
  
  
  
| | |
|---|---|
| SSS-CORE get_taskid() | 1.12 µsec |
| SunOS getpid() | 4.39 µsec |
  
  Costs of allocating/freeing memory (in µsec)
  
  
  
| size (byte) | 4 K | 16 K | 64 K | 256 K | 1 M |
|---|---|---|---|---|---|
| SSS-CORE allocate | 23.91 | 28.91 | 48.77 | 123.2 | 431.2 |
| SSS-CORE free | 19.49 | 20.36 | 23.91 | 36.23 | 99.06 |
| SunOS sbrk() | 133.2 | 375.8 | 894.3 | 1828 | 2020 |
  
  Conditions
  
  
  
| | | |
|---|---|---|
| workstation | Sun Microsystems Ultra 60 (450 MHz UltraSPARC-II × 1) | |
| NIC | Sun Microsystems GigabitEthernet/P 2.0 Adapter | |
| network | (directly connected) | |
| OS & Communication Protocol | SSS-CORE Ver. 2.3 | MBCF |
| | Solaris 2.6 | TCP/IP |
  
  One-way latencies of MBCF/1000BASE-SX (in µsec)
  
  
  
| data size (byte) | 4 | 16 | 64 | 256 | 1024 |
|---|---|---|---|---|---|
| MBCF | 9.6 | 11.0 | 11.5 | 16.2 | 35.9 |
| TCP/IP | 95.08 | 95.22 | 95.39 | 99.45 | 114.15 |
  
  Peak bandwidths of MBCF/1000BASE-SX (in Mbyte/sec)
  
  
  
| data size (byte) | 4 | 16 | 64 | 256 | 1024 | 1408 |
|---|---|---|---|---|---|---|
| MBCF | 2.29 | 5.67 | 22.30 | 55.41 | 78.22 | 80.92 |
| TCP/IP | 0.09 | 0.43 | 1.67 | 5.56 | 12.79 | 20.21 |
  
Although the software overhead of MBCF is small, the peak bandwidth
(80.92 Mbyte/sec at 1408 bytes) falls short of the hardware limit of
125 Mbyte/sec. We suspect a bottleneck somewhere in the Ultra 60's hardware.
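
As a rough check on that gap, the sketch below compares the reported MBCF peak bandwidths with the nominal 1 Gbit/s link limit. The figures are copied from the table above; the names `LINK_LIMIT_MBYTE_PER_SEC`, `mbcf_peak`, and `link_utilization` are illustrative only and are not part of SSS-CORE or MBCF.

```python
# Nominal 1000BASE-SX limit: 1 Gbit/s expressed in Mbyte/sec (1000 / 8 = 125).
LINK_LIMIT_MBYTE_PER_SEC = 1000 / 8

# Peak MBCF bandwidths from the table above: data size (byte) -> Mbyte/sec.
mbcf_peak = {4: 2.29, 16: 5.67, 64: 22.30, 256: 55.41, 1024: 78.22, 1408: 80.92}


def link_utilization(bandwidth_mbyte_per_sec):
    """Fraction of the nominal link limit reached by a measured bandwidth."""
    return bandwidth_mbyte_per_sec / LINK_LIMIT_MBYTE_PER_SEC


# Data size with the highest measured bandwidth, and its share of the limit.
best_size = max(mbcf_peak, key=mbcf_peak.get)
best_util = link_utilization(mbcf_peak[best_size])

for size, bandwidth in mbcf_peak.items():
    print(f"{size:5d} bytes: {link_utilization(bandwidth):6.1%} of link limit")
```

Under these assumptions the best case reaches roughly two thirds of the nominal limit, which is the shortfall the text attributes to the hardware.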
  Conditions
  
  
  
| | |
|---|---|
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
| NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
| network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
| | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
| OS | SSS-CORE Ver. 1.1 |
  
  One-way latencies of MBCF/100BASE-TX (in µsec)
  
  
  
| data size (byte) | 4 | 16 | 64 | 256 | 1024 |
|---|---|---|---|---|---|
| MBCF_WRITE | 24.5 | 27.5 | 34 | 60.5 | 172 |
| MBCF_FIFO | 32 | 32 | 40.5 | 73 | 210.5 |
| MBCF_SIGNAL | 49 | 52.5 | 60.5 | 93 | 227.5 |
  
  Peak bandwidths of MBCF/100BASE-TX (in Mbyte/sec)
  
  
  
| data size (byte) | 4 | 16 | 64 | 256 | 1024 | 1408 |
|---|---|---|---|---|---|---|
| MBCF_WRITE, half duplex | 0.31 | 1.15 | 4.31 | 8.56 | 11.13 | 11.48 |
| MBCF_WRITE, full duplex | 0.34 | 1.27 | 4.82 | 9.63 | 11.64 | 11.93 |
  
  Conditions
  
  
  
| | | |
|---|---|---|
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) | |
| NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 | |
| network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) | |
| | Bay Networks BayStack 350T (switching 100BASE-TX HUB) | |
| OS & MPI implementation | SSS-CORE Ver. 1.1 | MPI/MBCF |
| | SunOS 4.1.4 | MPICH Ver. 1.1 (using TCP) |
  
  Round-trip times of MPI with 100BASE-TX (in µsec)
  
  
  
| message size (byte) | 0 | 4 | 16 | 64 | 256 | 1024 | 4096 |
|---|---|---|---|---|---|---|---|
| MPI/MBCF on SSS-CORE | 71 | 85 | 85 | 106 | 168 | 438 | 1026 |
| MPICH/TCP on SunOS | 968 | 962 | 980 | 1020 | 1080 | 1255 | 2195 |
  
  Peak bandwidths of MPI with 100BASE-TX (in Mbyte/sec)
  
  
  
| message size (byte) | 4 | 16 | 64 | 256 | 1024 | 4096 | 16384 | 65536 |
|---|---|---|---|---|---|---|---|---|
| MPI/MBCF on SSS-CORE, half duplex | 0.14 | 0.53 | 1.82 | 4.72 | 8.08 | 9.72 | 10.15 | 9.78 |
| MPI/MBCF on SSS-CORE, full duplex | 0.14 | 0.57 | 1.90 | 5.33 | 10.22 | 11.68 | 11.77 | 11.85 |
| MPICH/TCP on SunOS, half duplex | 0.02 | 0.09 | 0.35 | 1.27 | 3.54 | 6.04 | 5.59 | 7.00 |
  
  Conditions
  
  
  
| | | |
|---|---|---|
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) | |
| NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 | |
| network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) | |
| OS & MPI implementation | SSS-CORE Ver. 1.1 | MPI/MBCF |
| | SunOS 4.1.4 | MPICH Ver. 1.1 (using TCP) |
  
  Execution results of the NAS Parallel Benchmarks
  
  
  
| program [# of nodes] | EP [8] | MG [8] | CG [8] | IS [8] | LU [8] | SP [9] | BT [9] |
|---|---|---|---|---|---|---|---|
| MPI/MBCF on SSS-CORE | | | | | | | |
| execution time (sec) | 15.14 | 7.48 | 11.02 | 3.02 | 160.36 | 154.91 | 67.30 |
| speedup ratio to 1 node | 7.99 | 5.24 | 6.27 | 3.33 | 6.26 | 8.11 | 9.16 |
| communication frequency (Mbyte/sec) | 0.00 | 9.68 | 12.69 | 13.58 | 1.89 | 7.83 | 5.32 |
| communication frequency (# of messages/sec) | 4 | 4670 | 2138 | 466 | 1199 | 421 | 488 |
| average message size (Kbyte) | 0.00 | 2.07 | 5.94 | 29.14 | 1.58 | 18.60 | 10.90 |
| MBCF_WRITE availability rate (%) | 51.10 | 0.01 | 53.33 | 99.22 | 13.37 | 49.01 | 47.24 |
| use of collective communication | yes | no | no | yes | no | no | no |
| MPICH/TCP on SunOS | | | | | | | |
| execution time (sec) | 16.25 | 13.72 | 14.59 | 4.81 | 185.04 | 231.66 | 96.02 |
| speedup ratio to 1 node | 7.73 | 2.83 | 4.71 | 2.13 | 5.84 | 6.01 | 6.53 |
| MPI/MBCF on SSS-CORE versus MPICH/TCP on SunOS | | | | | | | |
| performance improvement ratio | 1.07 | 1.83 | 1.32 | 1.59 | 1.15 | 1.50 | 1.43 |
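
The derived rows of the NAS Parallel Benchmarks table can be cross-checked from its measured rows. The sketch below assumes (this is not stated on the page) that the performance improvement ratio is the MPICH/TCP execution time divided by the MPI/MBCF execution time, and that the average message size is the byte rate divided by the message rate with 1 Mbyte taken as 1000 Kbyte. All variable names are illustrative.

```python
# Values copied from the NAS Parallel Benchmarks table above.
programs = ["EP", "MG", "CG", "IS", "LU", "SP", "BT"]
mbcf_time = [15.14, 7.48, 11.02, 3.02, 160.36, 154.91, 67.30]    # sec
mpich_time = [16.25, 13.72, 14.59, 4.81, 185.04, 231.66, 96.02]  # sec
mbyte_per_sec = [0.00, 9.68, 12.69, 13.58, 1.89, 7.83, 5.32]
messages_per_sec = [4, 4670, 2138, 466, 1199, 421, 488]

reported_ratio = [1.07, 1.83, 1.32, 1.59, 1.15, 1.50, 1.43]
reported_kbyte = [0.00, 2.07, 5.94, 29.14, 1.58, 18.60, 10.90]

# Assumed definition: improvement ratio = MPICH/TCP time / MPI/MBCF time.
ratios = [slow / fast for slow, fast in zip(mpich_time, mbcf_time)]

# Assumed definition: average message size (Kbyte) = byte rate / message rate,
# with 1 Mbyte treated as 1000 Kbyte.
avg_kbyte = [mb * 1000 / n for mb, n in zip(mbyte_per_sec, messages_per_sec)]

for name, ratio, size in zip(programs, ratios, avg_kbyte):
    print(f"{name}: improvement ratio {ratio:.2f}, average message {size:.2f} Kbyte")
```

Under these assumptions the recomputed values agree with the reported rows to within rounding, which suggests that the table uses these definitions.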
  
  Conditions
  
  
  
| | | |
|---|---|---|
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) | |
| NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 | |
| network | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) | |
| OS & RPC implementation | SSS-CORE Ver. 1.1 | modified SUNRPC 4.0 |
| | SunOS 4.1.4 | SUNRPC 4.0 |
  
  Round-trip latencies of RPC with 100BASE-TX (in µsec)
  
  
  
| data size (byte) | 4 | 256 | 512 | 1024 |
|---|---|---|---|---|
| SSS-CORE, MBCF_SIGNAL | 127 | 173 | 221 | 315 |
| SSS-CORE, MBCF_FIFO | 148 | 194 | 251 | 372 |
| SunOS TCP | 863 | 903 | 918 | 1033 |
  
  Conditions
  
  
  
| | |
|---|---|
| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
| NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
| network | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
| OS | SSS-CORE Ver. 1.1 |
| runtime system | ADSM |
  
  Effects of optimization methods on LU-Contig (n = 512, b = 16)
  
  
  
| optimization methods | execution time (sec) | # of consistency management codes | # of packets | amount of communication (Mbyte) |
|---|---|---|---|---|
| None | 28.20 | 5592 K | 5207 K | 47.73 |
| runtime packet combining | 14.35 | 5592 K | 83.5 K | 113.00 |
| static interprocedural redundancy elimination | 2.17 | 1.43 K | 7.73 K | 9.42 |
| runtime packet combining & static interprocedural redundancy elimination | 2.16 | 1.43 K | 7.60 K | 9.27 |
  
  Effects of optimization methods on Radix (#key = 1 M)
  
  
  
| optimization methods | execution time (sec) | # of consistency management codes | # of packets | amount of communication (Mbyte) |
|---|---|---|---|---|
| None | 21.90 | 793 K | 3220 K | 76.72 |
| runtime packet combining | 12.13 | 793 K | 75.8 K | 101.08 |
| static interprocedural redundancy elimination | 1.57 | 2.08 K | 19.5 K | 13.47 |
| runtime packet combining & static interprocedural redundancy elimination | 1.24 | 2.08 K | 10.1 K | 13.63 |
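
To make the effect of each optimization easier to compare across the two ADSM tables, the sketch below expresses every row relative to the unoptimized ("None") run. The execution times and packet counts are copied from the tables above; the dictionary layout, the short row labels, and the function name `improvement_factors` are illustrative assumptions.

```python
# (execution time in sec, number of packets in thousands), from the tables above.
results = {
    "LU-Contig": {
        "none": (28.20, 5207.0),
        "combining": (14.35, 83.5),
        "static": (2.17, 7.73),
        "combining+static": (2.16, 7.60),
    },
    "Radix": {
        "none": (21.90, 3220.0),
        "combining": (12.13, 75.8),
        "static": (1.57, 19.5),
        "combining+static": (1.24, 10.1),
    },
}


def improvement_factors(rows):
    """Return (time speedup, packet reduction) of each method relative to 'none'."""
    base_time, base_packets = rows["none"]
    return {
        name: (base_time / time, base_packets / packets)
        for name, (time, packets) in rows.items()
        if name != "none"
    }


factors = {app: improvement_factors(rows) for app, rows in results.items()}

for app, methods in factors.items():
    for name, (speedup, packet_cut) in methods.items():
        print(f"{app:10s} {name:17s} time x{speedup:5.2f}  packets x{packet_cut:7.2f}")
```

Read this way, runtime packet combining alone cuts the packet count by well over an order of magnitude in both programs, while the largest reductions in execution time come from the rows that include static interprocedural redundancy elimination.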
  
![[graph (17KB)]](ADSM.gif)
Figure: Speedups on ADSM
 
  Conditions
  
  
  
  
| | | |
|---|---|---|
| SSS-CORE system | workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
| | NIC | Sun Microsystems Fast Ethernet SBus Adapter 2.0 |
| | network | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
| | OS | SSS-CORE Ver. 1.1 |
| | runtime system | UDSM |
| AP1000+ system | MPP | Fujitsu AP1000+ (50 MHz SuperSPARC × 256) |
| | OS | Cell-OS |
| | runtime system | UDSM |
  
  Breakdown of execution time
  
  
  
| | |
|---|---|
| Sync | synchronization |
| WC | write commitment |
| PF | page fault handler |
| Msg | remote message handlers |
| Task | execution of original application codes |
  
![[graph (6KB)]](lu.gif)
Figure: Execution time of LU-Contig on 1 to 8 nodes
 
![[graph (6KB)]](radix.gif)
Figure: Execution time of Radix on 1 to 8 nodes
 
![[graph (17KB)]](UDSM.gif)
Figure: Speedups on UDSM (on SSS-CORE)
 
Mail to <info@ssscore.org>.
© 1998-2000 SSS-CORE Project Team.