Apache Hadoop Benchmarking

Application description

The Apache Hadoop project provides an open source framework for massively distributed computing that scales in both compute and data storage capacity. Hadoop clusters are built on top of the Hadoop Distributed File System (HDFS), which gives each node localised access to shared data without locking or serialising access to the complete dataset. Local storage and CPU performance are therefore important factors in scaling overall system performance. Specifically, Hadoop provides a software framework for distributed storage and for processing big data using the MapReduce programming model.

In a multi-node Hadoop cluster, the essential daemons run on different hosts. The cluster has a master-slave architecture in which one machine acts as the master and runs the NameNode daemon, while the other machines act as slave (worker) nodes and run the remaining Hadoop daemons.

Infrastructure Environment tested

The Apache Hadoop multi-node cluster application has been tested on the following:

| Host OS | Kernel Version | Multi-Queue Block |
|---|---|---|
| CentOS 7 | 3.10.0-862.14.4.el7.x86_64 | NO |
| CentOS 7 | 5.13.4-1.el7.elrepo.x86_64 | YES |
| CentOS 8.3 | 4.18.0-240.1.1.el8_3.x86_64 | NO |
| CentOS 8.3 | 5.13.4-1.el8.elrepo.x86_64 | YES |

| Resource | Value |
|---|---|
| Package manager | yum |
| Application version | 1.2.1 |
| Environments tested | XeonD Sunlight r5.2xlarge vs Bobcat Peak Sunlight r5.2xlarge |

| Flavour | Cores | Memory | Storage |
|---|---|---|---|
| r5.2xlarge | 6 | 4G | 200G |

Configuration and Setup description

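The benchmark is TestDFSIO, the HDFS I/O stress test utility that ships with Hadoop. Each run uses 8 files of 12,800 MB (12.5 GB) each, as set by -nrFiles 8 -fileSize 12800 in the commands below (TestDFSIO takes -fileSize in MB), for a total of 100 GB.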

| Resource | Value |
|---|---|
| Approx. package installation time | < 5 mins |
| Approx. test execution time | ~ 8 mins |

/bin/hadoop jar /usr/share/hadoop/hadoop-test-1.2.1.jar TestDFSIO -write -nrFiles 8 -fileSize 12800
/bin/hadoop jar /usr/share/hadoop/hadoop-test-1.2.1.jar TestDFSIO -read -nrFiles 8 -fileSize 12800
/bin/hadoop jar /usr/share/hadoop/hadoop-test-1.2.1.jar TestDFSIO -clean

This Hadoop application is available as a recipe from the SIM marketplace.

Data Results Table

Test completion time (seconds)

| Flavour | Server Environment | Host OS | Multi-Queue Block | Write | Read | Combined RW (50:50) |
|---|---|---|---|---|---|---|
| Sunlight r5.2xlarge | XeonD | CentOS 7 | Disabled | 405.497 | 246.516 | 326.006 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 7 | Disabled | 305.375 | 244.323 | 274.849 |
| Sunlight r5.2xlarge | XeonD | CentOS 7 | Enabled | 432.584 | 253.435 | 343.009 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 7 | Enabled | 311.292 | 254.331 | 282.811 |
| Sunlight r5.2xlarge | XeonD | CentOS 8.3 | Disabled | 480.56 | 257.45 | 369.005 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 8.3 | Disabled | 302.372 | 242.428 | 272.4 |
| Sunlight r5.2xlarge | XeonD | CentOS 8.3 | Enabled | 438.582 | 236.497 | 337.539 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 8.3 | Enabled | 335.295 | 253.327 | 294.311 |
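
The Combined RW (50:50) column appears to be the arithmetic mean of the Write and Read values. This is an inference from the figures, not something the page states. A minimal Python sketch under that assumption, using two rows of the table above (the function name `combined_rw` is hypothetical):

```python
def combined_rw(write, read):
    """Equal-weight (50:50) combination of a write and a read measurement."""
    return (write + read) / 2


# (write, read, combined value as listed in the table)
rows = {
    "XeonD, CentOS 7, Multi-Queue Disabled": (405.497, 246.516, 326.006),
    "Bobcat Peak, CentOS 8.3, Multi-Queue Disabled": (302.372, 242.428, 272.4),
}

for label, (write, read, listed) in rows.items():
    print(f"{label}: computed {combined_rw(write, read):.4f}, listed {listed}")
```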

Throughput test (MB/s)

| Flavour | Server Environment | Host OS | Multi-Queue Block | Write | Read | Combined RW (50:50) |
|---|---|---|---|---|---|---|
| Sunlight r5.2xlarge | XeonD | CentOS 7 | Disabled | 253.62 | 417.74 | 335.68 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 7 | Disabled | 337.08 | 421.61 | 379.345 |
| Sunlight r5.2xlarge | XeonD | CentOS 7 | Enabled | 237.4 | 405.72 | 321.56 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 7 | Enabled | 330.46 | 404.91 | 367.685 |
| Sunlight r5.2xlarge | XeonD | CentOS 8.3 | Disabled | 213.58 | 400.8 | 307.19 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 8.3 | Disabled | 340.65 | 424.54 | 382.595 |
| Sunlight r5.2xlarge | XeonD | CentOS 8.3 | Enabled | 234.39 | 435.96 | 335.175 |
| Sunlight r5.2xlarge | Bobcat Peak | CentOS 8.3 | Enabled | 306.57 | 406.73 | 356.65 |
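
As a sanity check, the two tables can be tied together: completion time multiplied by throughput should come out close to the total data volume set by `-nrFiles 8 -fileSize 12800`. The Python sketch below does this for the write figures. It assumes completion times are in seconds and throughput is in MB/s, which the page does not state explicitly; the names `WRITE_RESULTS` and `implied_volume_mb` are hypothetical.

```python
EXPECTED_MB = 8 * 12800  # from -nrFiles 8 -fileSize 12800

# (server, host OS, multi-queue block): (write completion time, write throughput)
# Units are assumed to be seconds and MB/s respectively.
WRITE_RESULTS = {
    ("XeonD", "CentOS 7", "Disabled"): (405.497, 253.62),
    ("Bobcat Peak", "CentOS 7", "Disabled"): (305.375, 337.08),
    ("XeonD", "CentOS 7", "Enabled"): (432.584, 237.4),
    ("Bobcat Peak", "CentOS 7", "Enabled"): (311.292, 330.46),
    ("XeonD", "CentOS 8.3", "Disabled"): (480.56, 213.58),
    ("Bobcat Peak", "CentOS 8.3", "Disabled"): (302.372, 340.65),
    ("XeonD", "CentOS 8.3", "Enabled"): (438.582, 234.39),
    ("Bobcat Peak", "CentOS 8.3", "Enabled"): (335.295, 306.57),
}


def implied_volume_mb(seconds, mb_per_s):
    """Data volume implied by a completion time and an average throughput."""
    return seconds * mb_per_s


for key, (seconds, rate) in WRITE_RESULTS.items():
    volume = implied_volume_mb(seconds, rate)
    deviation = (volume - EXPECTED_MB) / EXPECTED_MB * 100
    print(", ".join(key), f"-> {volume:.0f} MB ({deviation:+.2f}% vs {EXPECTED_MB} MB)")
```

Every write row lands within 1% of 102,400 MB, which is consistent with the assumed units and with a total test volume of about 100 GB.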

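Across the two tested environments, Bobcat Peak nodes outperform XeonD nodes on writes in every configuration, with write completion times between about 23% and 37% shorter. Read performance is comparable: each platform is ahead in two of the four configurations, and the differences are small. In the combined read/write figures, Bobcat Peak is ahead in every configuration.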
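
The size of the write gap can be worked out directly from the write completion times in the table above. A minimal Python sketch follows; the names `WRITE_TIMES` and `reduction_percent` are hypothetical, and the figures are copied from the completion-time table.

```python
# (host OS, multi-queue block): (XeonD write time, Bobcat Peak write time)
WRITE_TIMES = {
    ("CentOS 7", "Disabled"): (405.497, 305.375),
    ("CentOS 7", "Enabled"): (432.584, 311.292),
    ("CentOS 8.3", "Disabled"): (480.56, 302.372),
    ("CentOS 8.3", "Enabled"): (438.582, 335.295),
}


def reduction_percent(xeond, bobcat_peak):
    """Percentage by which the Bobcat Peak time is shorter than the XeonD time."""
    return (xeond - bobcat_peak) / xeond * 100


for (host_os, multi_queue), (xeond, bobcat_peak) in WRITE_TIMES.items():
    pct = reduction_percent(xeond, bobcat_peak)
    print(f"{host_os}, Multi-Queue {multi_queue}: {pct:.1f}% shorter on Bobcat Peak")
```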

Performance Evaluation Graphs

Hadoop Read and Write test completion time

Hadoop RW throughput