Memory performance benchmarks
The STREAM benchmark is a simple synthetic benchmark program that measures sustainable memory bandwidth (in MB/s) and the corresponding computation rate for simple vector kernels. Usage instructions for STREAM are available in this guide.
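To make the "simple vector kernels" concrete, the four operations that STREAM times can be sketched as follows. This is an illustrative pure-Python sketch, not the benchmark itself (the real STREAM is compiled C or Fortran); the function names and the scalar value of 3.0 are assumptions chosen for illustration.

```python
# Illustrative sketch of the four STREAM kernels on plain Python lists.
# Function names and SCALAR are assumptions made for this example only.

SCALAR = 3.0  # assumed scalar multiplier


def copy(a):
    # Copy: c[i] = a[i]
    return [x for x in a]


def scale(c, s=SCALAR):
    # Scale: b[i] = s * c[i]
    return [s * x for x in c]


def add(a, b):
    # Add: c[i] = a[i] + b[i]
    return [x + y for x, y in zip(a, b)]


def triad(b, c, s=SCALAR):
    # Triad: a[i] = b[i] + s * c[i]
    return [y + s * z for y, z in zip(b, c)]


# Words read plus words written per array element for each kernel.
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
```

Copy and Scale touch two arrays per element, while Add and Triad touch three, which is why the latter two move more bytes per iteration.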
To keep the comparison between bare metal and an HVM guest on Sunlight fair, both STREAM runs use the stock Ubuntu 18.04 kernel [4.15.0-45-generic] on the same hardware system.
The STREAM configuration used in this benchmark is as follows.
~ # OMP_NUM_THREADS=10 GOMP_CPU_AFFINITY=0,2,4,6,8,10,12,14,16,18 ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.9 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 6291456, Offset = 0
Total memory required = 144.0 MB.
Each test is run 1000 times, but only the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 10
-------------------------------------------------------------
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 2064 microseconds.
(= 2064 clock ticks)
Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer.
-------------------------------------------------------------
Function Rate (MB/s) Avg time Min time Max time
Copy: 39184.4515 0.0026 0.0026 0.0042
Scale: 38071.4576 0.0027 0.0026 0.0064
Add: 42581.7722 0.0036 0.0035 0.0065
Triad: 43680.1640 0.0035 0.0035 0.0069
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
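The figures in the output above can be cross-checked with a little arithmetic. The sketch below assumes that STREAM allocates three arrays, reports the memory total in units of 2^20 bytes, and derives each rate from the best (minimum) time using 10^6 bytes per MB; the variable and function names are hypothetical.

```python
# Cross-check of the STREAM output, under the assumptions stated above.

ARRAY_SIZE = 6291456   # elements per array, from the output
BYTES_PER_WORD = 8     # "8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3         # assumed: three working arrays

# Total memory in units of 2**20 bytes.
total_memory = NUM_ARRAYS * ARRAY_SIZE * BYTES_PER_WORD / 2**20


def implied_time(words_per_element, rate_mb_s):
    """Seconds per pass implied by a reported rate (1 MB = 10**6 bytes)."""
    bytes_moved = words_per_element * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / (rate_mb_s * 1e6)


copy_time = implied_time(2, 39184.4515)   # Copy moves 2 words per element
triad_time = implied_time(3, 43680.1640)  # Triad moves 3 words per element

print(f"total memory: {total_memory:.1f}")
print(f"copy: {copy_time:.4f} s, triad: {triad_time:.4f} s")
```

Under these assumptions the computed total matches the reported 144.0, and the implied times round to the Min time values listed for Copy and Triad.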
Raw data
Bare metal performance
Function | Rate (MB/s) | Avg time | Min time | Max time |
---|---|---|---|---|
Copy | 38717.3283 | 0.0026 | 0.0026 | 0.0027 |
Scale | 37463.3953 | 0.0027 | 0.0027 | 0.0028 |
Add | 42368.1227 | 0.0036 | 0.0036 | 0.0038 |
Triad | 43327.5431 | 0.0035 | 0.0035 | 0.0036 |
VM on Sunlight
Function | Rate (MB/s) | Avg time | Min time | Max time |
---|---|---|---|---|
Copy | 38971.0601 | 0.0026 | 0.0026 | 0.0034 |
Scale | 37731.2301 | 0.0027 | 0.0027 | 0.0034 |
Add | 42353.9556 | 0.0036 | 0.0036 | 0.0038 |
Triad | 43377.9930 | 0.0035 | 0.0035 | 0.0041 |
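One way to read the two tables side by side is to compute the relative difference in rate for each kernel. The sketch below copies the rates from the tables above; the dictionary and function names are hypothetical and chosen only for this example.

```python
# Relative difference in STREAM rate between the VM and bare metal.
# Rates (MB/s) are taken from the two tables above.

bare_metal = {
    "Copy": 38717.3283,
    "Scale": 37463.3953,
    "Add": 42368.1227,
    "Triad": 43327.5431,
}
vm_on_sunlight = {
    "Copy": 38971.0601,
    "Scale": 37731.2301,
    "Add": 42353.9556,
    "Triad": 43377.993,
}


def relative_difference_percent(baseline, measured):
    """Signed difference of measured versus baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


differences = {
    kernel: relative_difference_percent(bare_metal[kernel], vm_on_sunlight[kernel])
    for kernel in bare_metal
}

for kernel, diff in differences.items():
    print(f"{kernel:6s} {diff:+.2f}%")
```

With these numbers, every kernel's rate on the VM is within 1% of the bare metal rate, which is likely within run-to-run variation for a benchmark of this kind.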