
Memory performance benchmarks

STREAM is a simple synthetic benchmark that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels. Usage instructions for STREAM are available in this guide.

To keep the comparison between bare metal and an HVM guest on Sunlight fair, both STREAM runs use the stock Ubuntu 18.04 kernel (4.15.0-45-generic) on the same hardware.

STREAM was configured and run as follows.

~ # OMP_NUM_THREADS=10 GOMP_CPU_AFFINITY=0,2,4,6,8,10,12,14,16,18 ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 6291456, Offset = 0
Total memory required = 144.0 MB.
Each test is run 1000 times, but only the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 10
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 2064 microseconds.
   (= 2064 clock ticks)
Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       39184.4515       0.0026       0.0026       0.0042
Scale:      38071.4576       0.0027       0.0026       0.0064
Add:        42581.7722       0.0036       0.0035       0.0065
Triad:      43680.1640       0.0035       0.0035       0.0069
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
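
The sizing figures in the output above can be checked by hand. The following is a minimal sketch, not part of STREAM itself: it assumes the standard STREAM conventions that three arrays are allocated, that the "MB" in the memory total means 2^20 bytes, and that rates are reported in units of 10^6 bytes per second. The helper names (`bytes_moved`, `rate_mb_per_s`) are hypothetical and introduced only for illustration.

```python
# Sketch: reproduce the sizing figures STREAM reports for this run.
ARRAY_SIZE = 6291456   # elements per array, from the output above
BYTES_PER_WORD = 8     # bytes per DOUBLE PRECISION word, from the output above
NUM_ARRAYS = 3         # assumption: STREAM works on three arrays (a, b, c)

# Total memory, using 1 MB = 2**20 bytes for this figure.
total_mb = NUM_ARRAYS * ARRAY_SIZE * BYTES_PER_WORD / 2**20

# Arrays touched per element by each kernel (reads plus writes):
#   Copy:  c = a          Scale: b = q * c
#   Add:   c = a + b      Triad: a = b + q * c
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def bytes_moved(kernel):
    """Bytes read and written by one pass of the given kernel."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE


def rate_mb_per_s(kernel, seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one pass taking `seconds`."""
    return bytes_moved(kernel) / seconds / 1e6


print(f"Total memory required = {total_mb:.1f} MB")
for kernel in ARRAYS_TOUCHED:
    print(f"{kernel}: {bytes_moved(kernel)} bytes per pass")
```

Because the times in the results table are rounded to four decimal places, feeding a printed minimum time into `rate_mb_per_s` only approximates the reported rate. For example, `rate_mb_per_s("Copy", 0.0026)` lands within a few percent of the Copy figure above.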

Raw data

Bare metal performance

Function      Rate (MB/s)   Avg time     Min time     Max time
Copy          38717.3283    0.0026       0.0026       0.0027
Scale         37463.3953    0.0027       0.0027       0.0028
Add           42368.1227    0.0036       0.0036       0.0038
Triad         43327.5431    0.0035       0.0035       0.0036

VM on Sunlight

Function      Rate (MB/s)   Avg time     Min time     Max time
Copy          38971.0601    0.0026       0.0026       0.0034
Scale         37731.2301    0.0027       0.0027       0.0034
Add           42353.9556    0.0036       0.0036       0.0038
Triad         43377.993     0.0035       0.0035       0.0041
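
To put the two result sets side by side, the relative difference in bandwidth can be computed per kernel. This is a small sketch using only the rates listed above; the dictionary and function names are hypothetical, and a positive value means the VM on Sunlight reported a higher rate than bare metal.

```python
# Rates in MB/s, copied from the raw data above.
BARE_METAL = {
    "Copy": 38717.3283,
    "Scale": 37463.3953,
    "Add": 42368.1227,
    "Triad": 43327.5431,
}
SUNLIGHT_VM = {
    "Copy": 38971.0601,
    "Scale": 37731.2301,
    "Add": 42353.9556,
    "Triad": 43377.993,
}


def relative_difference_pct(vm_rate, bare_rate):
    """Difference of the VM rate from the bare metal rate, in percent."""
    return (vm_rate - bare_rate) / bare_rate * 100.0


diffs = {
    kernel: relative_difference_pct(SUNLIGHT_VM[kernel], BARE_METAL[kernel])
    for kernel in BARE_METAL
}

for kernel, pct in diffs.items():
    print(f"{kernel:<6} {pct:+.2f}%")
```

For these numbers every kernel differs by less than 1% in either direction, with Add marginally lower on the VM and the other three marginally higher.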

Performance graphs

[Figure: memory bandwidth comparison]