
Performance of SSS-CORE


The performance of SSS-CORE has been evaluated from various angles. This page summarizes the results.

Details of the experiments and discussion of the results are given in our papers.

In the following, "SPARCstation 20" refers to the Sun Microsystems SPARCstation 20 and compatible machines. We have mainly used the Axil 320 model 8.1.1, which is compatible with the Sun Microsystems SPARCstation 20.

Performance of Fundamental System Calls

| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1) |
| OS          | SSS-CORE Ver. 1.1                       |
|             | SunOS 4.1.4                             |
      Cost of getting a task ID
| SSS-CORE get_taskid() | 1.12 µsec |
|    SunOS getpid()     | 4.39 µsec |
            Costs of allocating/freeing memory (in µsec)
|    size (byte)    |  4 K     16 K    64 K    256 K     1 M   |
| SSS-CORE allocate |  23.91   28.91   48.77   123.2    431.2  |
|   SSS-CORE free   |  19.49   20.36   23.91    36.23    99.06 |
|   SunOS sbrk()    | 133.2   375.8   894.3   1828     2020    |

Fundamental Communication Performance of MBCF

On Gigabit Ethernet

| workstation                 | Sun Microsystems Ultra 60 (450 MHz UltraSPARC-II × 1) |
| NIC                         | Sun Microsystems GigabitEthernet/P 2.0 Adapter        |
| network                     | (directly connected)                                  |
| OS & communication protocol | SSS-CORE Ver. 2.3 & MBCF                              |
|                             | Solaris 2.6 & TCP/IP                                  |
     One-way latencies of MBCF/1000BASE-SX (in µsec)
| data size (byte) |   4     16     64     256    1024  |
|       MBCF       |  9.6   11.0   11.5   16.2    35.9  |
|      TCP/IP      | 95.08  95.22  95.39  99.45  114.15 |
     Peak bandwidths of MBCF/1000BASE-SX (in Mbyte/sec)
| data size (byte) |  4     16    64     256   1024   1408  |
|       MBCF       | 2.29  5.67  22.30  55.41  78.22  80.92 |
|      TCP/IP      | 0.09  0.43   1.67   5.56  12.79  20.21 |

Although the software overhead of MBCF is small, its peak bandwidth (80.92 Mbyte/sec at 1408 bytes) falls well short of the 125 Mbyte/sec hardware limit of Gigabit Ethernet. We suspect a bottleneck somewhere in the Ultra 60's hardware.
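
The 125 Mbyte/sec figure is the raw line rate of Gigabit Ethernet (1000 Mbit/sec at 8 bits per byte). As a rough illustration, the sketch below computes how much of that rate each protocol reaches at the 1408-byte data size, using the peak bandwidths from the table above. The names are ours, and framing overhead (headers, preamble, inter-frame gap) is ignored, so the ceiling actually achievable is somewhat below 125 Mbyte/sec.

```python
# Raw line rate of Gigabit Ethernet: 1000 Mbit/sec at 8 bits per byte.
LINE_RATE_MBYTE_PER_SEC = 1000 / 8  # 125.0

# Peak bandwidths at the 1408-byte data size, copied from the table above.
peak_mbyte_per_sec = {"MBCF": 80.92, "TCP/IP": 20.21}


def utilization(peak, limit=LINE_RATE_MBYTE_PER_SEC):
    """Return the fraction of the raw line rate that `peak` represents."""
    return peak / limit


for protocol, peak in peak_mbyte_per_sec.items():
    print(f"{protocol}: {utilization(peak):.1%} of the raw line rate")
```

By this measure MBCF reaches roughly 65% of the raw line rate and TCP/IP roughly 16%.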

On Fast Ethernet

| workstation | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
| NIC         | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
| network     | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
|             | Bay Networks BayStack 350T (switching 100BASE-TX HUB)    |
| OS          | SSS-CORE Ver. 1.1                                        |
   One-way latencies of MBCF/100BASE-TX (in µsec)
| data size (byte) |  4     16    64   256   1024  |
|    MBCF_WRITE    | 24.5  27.5  34    60.5  172   |
|    MBCF_FIFO     | 32    32    40.5  73    210.5 |
|   MBCF_SIGNAL    | 49    52.5  60.5  93    227.5 |
        Peak bandwidths of MBCF/100BASE-TX (in Mbyte/sec)
|    data size (byte)     |  4     16    64   256   1024   1408  |
| MBCF_WRITE, half duplex | 0.31  1.15  4.31  8.56  11.13  11.48 |
| MBCF_WRITE, full duplex | 0.34  1.27  4.82  9.63  11.64  11.93 |

Communication Performance of MPI/MBCF

| workstation             | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
| NIC                     | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
| network                 | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
|                         | Bay Networks BayStack 350T (switching 100BASE-TX HUB)    |
| OS & MPI implementation | SSS-CORE Ver. 1.1 & MPI/MBCF                             |
|                         | SunOS 4.1.4 & MPICH Ver. 1.1 (using TCP)                 |
       Round-trip times of MPI with 100BASE-TX (in µsec)
| message size (byte)  |  0    4   16    64   256   1024  4096 |
| MPI/MBCF on SSS-CORE |  71   85   85   106   168   438  1026 |
|  MPICH/TCP on SunOS  | 968  962  980  1020  1080  1255  2195 |
            Peak bandwidths of MPI with 100BASE-TX (in Mbyte/sec)
|  message size (byte)  |  4     16    64   256   1024   4096   16384  65536 |
| MPI/MBCF on SSS-CORE, | 0.14  0.53  1.82  4.72   8.08   9.72  10.15   9.78 |
|           half duplex |                                                    |
| MPI/MBCF on SSS-CORE, | 0.14  0.57  1.90  5.33  10.22  11.68  11.77  11.85 |
|           full duplex |                                                    |
| MPICH/TCP on SunOS,   | 0.02  0.09  0.35  1.27   3.54   6.04   5.59   7.00 |
|           half duplex |                                                    |

Efficiency of MPI/MBCF for the NAS Parallel Benchmarks

| workstation             | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
| NIC                     | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
| network                 | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
| OS & MPI implementation | SSS-CORE Ver. 1.1 & MPI/MBCF                             |
|                         | SunOS 4.1.4 & MPICH Ver. 1.1 (using TCP)                 |
               Execution results of the NAS Parallel Benchmarks
|  program [# of nodes]   | EP[8]  MG[8]  CG[8]  IS[8]   LU[8]   SP[9]  BT[9] |
|                            MPI/MBCF on SSS-CORE                             |
|  execution time (sec)   | 15.14  7.48   11.02  3.02   160.36  154.91  67.30 |
| speedup ratio to 1 node | 7.99   5.24   6.27   3.33   6.26    8.11    9.16  |
|communication frequency  | 0.00   9.68   12.69  13.58  1.89    7.83    5.32  |
|              (Mbyte/sec)|                                                   |
|communication frequency  | 4      4670   2138   466    1199    421     488   |
|      (# of messages/sec)|                                                   |
|average message size     | 0.00   2.07   5.94   29.14  1.58    18.60   10.90 |
|                  (Kbyte)|                                                   |
|MBCF_WRITE               | 51.10  0.01   53.33  99.22  13.37   49.01   47.24 |
|    availability rate (%)|                                                   |
|use of                   | yes    no     no     yes    no      no      no    |
| collective communication|                                                   |
|                             MPICH/TCP on SunOS                              |
|  execution time (sec)   | 16.25  13.72  14.59  4.81   185.04  231.66  96.02 |
| speedup ratio to 1 node | 7.73   2.83   4.71   2.13   5.84    6.01    6.53  |
|               MPI/MBCF on SSS-CORE versus MPICH/TCP on SunOS                |
|performance improvement  | 1.07   1.83   1.32   1.59   1.15    1.50    1.43  |
|                    ratio|                                                   |
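
Two rows of the table above can be re-derived from the others, which is a useful check when reading it. The sketch below assumes that the performance improvement ratio is the MPICH/TCP execution time divided by the MPI/MBCF execution time, and that the average message size is the communication volume per second divided by the number of messages per second, with 1 Mbyte taken as 1000 Kbyte. Both definitions are our reading of the table rather than something stated on this page; under them, the recomputed values agree with the published rows to two decimal places.

```python
# Values copied from the table above, in column order.
programs      = ["EP", "MG", "CG", "IS", "LU", "SP", "BT"]
mbcf_sec      = [15.14, 7.48, 11.02, 3.02, 160.36, 154.91, 67.30]
mpich_sec     = [16.25, 13.72, 14.59, 4.81, 185.04, 231.66, 96.02]
mbyte_per_sec = [0.00, 9.68, 12.69, 13.58, 1.89, 7.83, 5.32]
msgs_per_sec  = [4, 4670, 2138, 466, 1199, 421, 488]


def improvement_ratio(mpich_time, mbcf_time):
    """Assumed definition: MPICH/TCP time divided by MPI/MBCF time."""
    return mpich_time / mbcf_time


def avg_message_kbyte(mbyte_rate, msg_rate):
    """Assumed definition: volume per second over messages per second,
    with 1 Mbyte taken as 1000 Kbyte."""
    return mbyte_rate * 1000 / msg_rate


for name, t_mbcf, t_mpich, volume, count in zip(
    programs, mbcf_sec, mpich_sec, mbyte_per_sec, msgs_per_sec
):
    ratio = improvement_ratio(t_mpich, t_mbcf)
    size = avg_message_kbyte(volume, count)
    print(f"{name}: improvement ratio {ratio:.2f}, "
          f"average message size {size:.2f} Kbyte")
```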

Performance of the RPC with MBCF

| workstation             | SPARCstation 20 (85 MHz SuperSPARC × 1)                  |
| NIC                     | Sun Microsystems Fast Ethernet SBus Adapter 2.0          |
| network                 | SMC TigerStack 100 5324TX (non-switching 100BASE-TX HUB) |
| OS & RPC implementation | SSS-CORE Ver. 1.1 & modified SUNRPC 4.0                  |
|                         | SunOS 4.1.4 & SUNRPC 4.0                                 |
Round-trip latencies of RPC with 100BASE-TX (in µsec)
|   data size (byte)    |  4   256  512  1024 |
| SSS-CORE, MBCF_SIGNAL | 127  173  221   315 |
|  SSS-CORE, MBCF_FIFO  | 148  194  251   372 |
|       SunOS TCP       | 863  903  918  1033 |

Efficiency of RCOP for the SPLASH-2 suite


|  workstation   | SPARCstation 20 (85 MHz SuperSPARC × 1)               |
|      NIC       | Sun Microsystems Fast Ethernet SBus Adapter 2.0       |
|    network     | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
|       OS       | SSS-CORE Ver. 1.1                                     |
| runtime system | ADSM                                                  |
        Effects of optimization methods on LU-Contig (n = 512, b = 16)
|  optimization methods   |execution |# of consistency|# of    |amount of    |
|                         |time (sec)|      management| packets|communication|
|                         |          |           codes|        |      (Mbyte)|
|          None           |    28.20 |         5592 K | 5207 K |       47.73 |
|runtime packet combining |    14.35 |         5592 K | 83.5 K |      113.00 |
|static interprocedural   |     2.17 |         1.43 K | 7.73 K |        9.42 |
|   redundancy elimination|          |                |        |             |
|runtime packet combining |     2.16 |         1.43 K | 7.60 K |        9.27 |
| & static interprocedural|          |                |        |             |
|   redundancy elimination|          |                |        |             |
            Effects of optimization methods on Radix (#key = 1 M)
|  optimization methods   |execution |# of consistency|# of    |amount of    |
|                         |time (sec)|      management| packets|communication|
|                         |          |           codes|        |      (Mbyte)|
|          None           |    21.90 |          793 K | 3220 K |       76.72 |
|runtime packet combining |    12.13 |          793 K | 75.8 K |      101.08 |
|static interprocedural   |     1.57 |         2.08 K | 19.5 K |       13.47 |
|   redundancy elimination|          |                |        |             |
|runtime packet combining |     1.24 |         2.08 K | 10.1 K |       13.63 |
| & static interprocedural|          |                |        |             |
|   redundancy elimination|          |                |        |             |
Figure: Speedups on ADSM
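
Runtime packet combining trades packet count against communication volume, which is easiest to see as an average packet size. The sketch below divides the amount of communication by the number of packets for the LU-Contig rows of the table above, and also reports the speedup over the unoptimized run. It assumes that "K" means 1000 and that "Mbyte" means 10^6 bytes, neither of which is stated on this page, so the results are approximate. The short method labels and function names are ours.

```python
# LU-Contig rows from the table above:
# (method, execution time in sec, packets in thousands, communication in Mbyte)
rows = [
    ("none", 28.20, 5207, 47.73),
    ("runtime packet combining", 14.35, 83.5, 113.00),
    ("static redundancy elimination", 2.17, 7.73, 9.42),
    ("both", 2.16, 7.60, 9.27),
]


def bytes_per_packet(kpackets, mbyte):
    """Average packet size, assuming K = 1000 and Mbyte = 10**6 bytes."""
    return (mbyte * 1_000_000) / (kpackets * 1_000)


def speedup(baseline_sec, optimized_sec):
    """Execution-time ratio relative to the unoptimized run."""
    return baseline_sec / optimized_sec


baseline_sec = rows[0][1]
for method, sec, kpackets, mbyte in rows:
    print(f"{method}: {bytes_per_packet(kpackets, mbyte):.1f} bytes/packet, "
          f"{speedup(baseline_sec, sec):.2f}x faster than no optimization")
```

Under these assumptions the unoptimized run averages roughly 9 bytes per packet, while runtime packet combining alone raises that to roughly 1.35 Kbyte per packet.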


| SSS-CORE system | workstation    | SPARCstation 20 (85 MHz SuperSPARC × 1)               |
|                 | NIC            | Sun Microsystems Fast Ethernet SBus Adapter 2.0       |
|                 | network        | Bay Networks BayStack 350T (switching 100BASE-TX HUB) |
|                 | OS             | SSS-CORE Ver. 1.1                                     |
|                 | runtime system | UDSM                                                  |
| AP1000+ system  | MPP            | Fujitsu AP1000+ (50 MHz SuperSPARC × 256)             |
|                 | OS             | Cell-OS                                               |
|                 | runtime system | UDSM                                                  |
           Breakdown of execution time
| Sync | synchronization                         |
|  WC  | write commitment                        |
|  PF  | page fault handler                      |
| Msg  | remote message handlers                 |
| Task | execution of original application codes |
Figure: Execution time of LU-Contig on 1 to 8 nodes
Figure: Execution time of Radix on 1 to 8 nodes
Figure: Speedups on UDSM (on SSS-CORE)

© 1998-2000 SSS-CORE Project Team.