Ultimate Guide to CPU Benchmarking Tools for Testing Dedicated Servers


In this blog post we take a deep dive into the world of CPU benchmarking tools. From PassMark to Dhrystone, we’ll review the most popular CPU benchmark tools and talk about how CPU benchmarking should affect your decision when purchasing a dedicated server.

Today our world quite literally runs digitally. The same kind of processing that puts characters on a laptop screen as this sentence is typed also engages traction control on a modern all-wheel-drive vehicle.

The same CPU concept is at the heart of every action when consumers click “Buy” at their favorite online retailer from a handheld phone. There is a CPU computing 1’s and 0’s, making all the magic happen.

The CPU, or Central Processing Unit, has become more efficient over time, in step with Moore’s law: consuming less power, shrinking in physical size, handling multiple actions simultaneously (as seen with Hyper-Threading and multi-core technologies), and running at speeds of billions of calculations per second.

As consumers, to gain insight into benchmarking a CPU we must first understand what separates one CPU from another. A CPU can be designed to specialize in one or more areas.

KPIs that define CPUs

  1. Energy efficiency

    How much wattage is being used compared to how much data is being processed (lower wattage consumption equates to lower temperatures)

  2. Raw processing power

    How much data is processed in a given time period

  3. Processing efficiency

    How much data is processed simultaneously

  4. Stability

    How long a CPU can process data at peak load without failure
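
The first two KPIs above can be expressed as simple ratios. The following is a minimal sketch, assuming we already have a count of completed operations, a run time, and an average power reading; the `CpuSample` type, the function names, and all numbers are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class CpuSample:
    """One hypothetical measurement of a CPU under load."""
    name: str
    operations: float  # work completed during the run (arbitrary units)
    seconds: float     # wall-clock duration of the run
    avg_watts: float   # average power draw during the run


def throughput(sample: CpuSample) -> float:
    """Raw processing power: operations completed per second."""
    return sample.operations / sample.seconds


def ops_per_watt(sample: CpuSample) -> float:
    """Energy efficiency: throughput divided by average power draw."""
    return throughput(sample) / sample.avg_watts


# Made-up numbers, purely for illustration.
cpu_a = CpuSample("cpu_a", operations=9.0e9, seconds=10.0, avg_watts=90.0)
cpu_b = CpuSample("cpu_b", operations=6.0e9, seconds=10.0, avg_watts=30.0)

for cpu in (cpu_a, cpu_b):
    print(cpu.name, throughput(cpu), ops_per_watt(cpu))
```

In this made-up case the first CPU wins on raw processing power while the second wins on energy efficiency, which is why a single KPI rarely tells the whole story.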

 

Key Takeaway

These four categories are the cornerstones in the design of any transistor-based CPU, from a now-standard smartphone to a world-class server entrusted with mission-critical business operations. Each key area is taken into consideration when supplying the “brain” to power an electronic computing device.

Understanding CPU Benchmarks

CPU benchmarks will typically perform a series of tests measuring CPU calculations, usually evaluating how much time each operation takes.

CPU performance measurements

  • Integer math: Mathematics calculations performed on whole numbers.
  • Floating point math: Mathematics calculations performed on numbers with fractional parts.
  • Encryption: Tests various standard encryption algorithms such as DES, Triple DES and AES.
  • File compression/decompression: How well the CPU can compress and extract archives in common compression schemes such as .zip, .rar, .tbz, .7z
  • Prime number testing: How fast the CPU can search for prime numbers.
  • Extended instruction set tests: Instructions that are native to a specific CPU architecture. These can speed up applications when they are used by the developer.
  • String sorting: Mimics how user-level applications parse and handle strings or textual data.
  • Image rendering: How quickly the CPU can render images in common formats: .jpg, .png, .gif and even .pdf.
  • Physics test: How well vector graphics are calculated in both 2D and 3D by the CPU. Useful for games, graphic effects like filters in Photoshop, and ray tracing shadows and lighting in 3D graphics.
  • Single thread test: Many applications run only a single thread. This tests how the CPU handles those applications, separately from multi-core processing. This is where a lot of Intel i-series (the i5 and i7) and Xeon CPUs shine: Turbo Boost and efficient single-thread processing let one core run at clock speeds that would overheat the chip if every core ran in Turbo Boost mode simultaneously.
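
To make the idea of timing individual operations concrete, here is a minimal sketch of a tiny benchmark harness. It is not how any of the tools in this guide are implemented; the test functions, the workload size, and the best-of-three timing rule are all assumptions chosen for illustration.

```python
import time


def integer_math(n: int) -> int:
    """Whole-number arithmetic: running sum of squares modulo a constant."""
    total = 0
    for i in range(n):
        total = (total + i * i) % 1_000_003
    return total


def floating_point_math(n: int) -> float:
    """Arithmetic on numbers with fractional parts: partial sum of 1/i^2."""
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / (i * i)
    return total


def count_primes(limit: int) -> int:
    """Count the primes below `limit` using trial division."""
    count = 0
    for candidate in range(2, limit):
        is_prime = True
        divisor = 2
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


def time_test(func, arg, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        best = min(best, time.perf_counter() - start)
    return best


def run_suite(size: int = 20_000) -> dict:
    """Time each test with the same workload size and return seconds per test."""
    tests = {
        "integer_math": integer_math,
        "floating_point_math": floating_point_math,
        "prime_search": count_primes,
    }
    return {name: time_test(func, size) for name, func in tests.items()}


if __name__ == "__main__":
    for name, seconds in run_suite().items():
        print(f"{name}: {seconds:.4f} s")
```

Because this runs in the Python interpreter, the timings mostly reflect interpreter overhead; real benchmarking tools use compiled, carefully tuned test code.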

PCMark 8.0 Performing Real-World Web Browsing Simulation Test

Some benchmarking applications will even perform application-level tests such as parsing the HTML5 Document Object Model and accessing a SQL database (many larger applications store data on the local machine in a compact SQL database called SQLite).

Once a CPU has been rated in each category, the benchmarking application runs its own algorithm to combine all sub-scores into a rating for the CPU as a whole. This gives the CPU a final score. Benchmark users can then submit their scores to see how their system compares against other end-users.
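
Each tool uses its own formula to combine sub-scores, and those formulas are not covered here. As a minimal sketch of the idea, the example below assumes a weighted geometric mean of each sub-score relative to a reference machine; the function name, the baseline values, and the scale of 1000 are all hypothetical.

```python
import math


def composite_score(sub_scores, baseline, weights=None, scale=1000.0):
    """Combine sub-scores into one number (assumed formula, for illustration).

    Each sub-score is divided by the reference machine's score for the same
    test, and the ratios are combined with a weighted geometric mean so that
    no single test dominates the result.
    """
    if weights is None:
        weights = {name: 1.0 for name in sub_scores}
    total_weight = sum(weights[name] for name in sub_scores)
    log_sum = 0.0
    for name, score in sub_scores.items():
        ratio = score / baseline[name]
        log_sum += weights[name] * math.log(ratio)
    return scale * math.exp(log_sum / total_weight)


# Hypothetical sub-scores; higher is better for every test.
reference = {"integer": 100.0, "floating_point": 100.0, "compression": 100.0}
candidate = {"integer": 200.0, "floating_point": 200.0, "compression": 200.0}

print(round(composite_score(candidate, reference)))
```

A machine that matches the reference on every test scores exactly `scale`, and one that is twice as fast on every test scores twice that.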

Real World Example of Evaluating CPU Architecture via Benchmarking

Let’s compare two CPU architectures and see how they perform.

Intel i5 4278U
  • Dual Core, 2.6GHz, Hyper-Threaded
  • Turbo Boost to 3.10GHz
  • 3MB L3 Cache

AMD FX-8350
  • Octa Core, 4.0GHz
  • 8MB L3 Cache
  • 4×2MB L2 Cache

Cumulative single-core GeekBench benchmark scores

One might ask, why did the CPU with the slower clock speed beat the faster-cycling CPU in single-core performance?

This is a great demonstration of how CPU architecture, core design and caching speeds can allow a CPU to do more per clock cycle in certain operations.

Put simply, the Intel design has more FPUs (floating point units) per execution core, while the AMD design shares L2 cache between execution cores, which can cause CPU stalls during single-threaded execution.

CPU Cache prevents CPU Stalling

A CPU stall is the latency incurred while data moves from disk to memory, through the cache pipeline, and finally to the CPU core for execution.

CPU stalling is reduced by providing more and faster L1, L2 and L3 cache, distributed amongst the cores.
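
The effect of cache on stalling can be estimated with a simple expected-latency calculation. This is a rough sketch under stated assumptions: each request is served by the first cache level that holds the data, and every latency and hit rate below is a made-up figure rather than a measurement of any real CPU.

```python
def average_access_time(levels, memory_latency_ns):
    """Expected latency of one memory request, in nanoseconds.

    `levels` is a list of (latency_ns, hit_rate) pairs ordered from L1
    outward, where latency_ns is the total latency seen when that level
    serves the request. Requests that miss every level go to main memory.
    """
    expected = 0.0
    reach_probability = 1.0  # chance a request gets as far as this level
    for latency_ns, hit_rate in levels:
        expected += reach_probability * hit_rate * latency_ns
        reach_probability *= 1.0 - hit_rate
    return expected + reach_probability * memory_latency_ns


# Hypothetical two-level cache in front of 100 ns main memory.
with_cache = average_access_time([(1.0, 0.9), (4.0, 0.5)], memory_latency_ns=100.0)
without_cache = average_access_time([], memory_latency_ns=100.0)

print(with_cache, without_cache)
```

Even with these invented numbers the pattern is clear: the rare requests that fall through to main memory dominate the average, which is why larger and faster caches reduce stalls.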

CPU stalling is not a big deal in end-user computing, which is designed for smaller, bursty tasks that call a few worker threads for a limited amount of time.

 

Modern multicore CPU

Modern multicore CPUs will also have an L3 cache that is shared amongst all CPU cores.

 

What are CPU Benchmarks for?

As they relate to a consumer, CPU benchmarks serve a few primary purposes:

  1. Allow non-tech-savvy consumers to compare archived CPU scores, assisting them in making an educated purchasing choice
  2. Allow technically inclined hardware enthusiasts to get an accurate baseline when customizing a computer, upgrading hardware, or advancing system performance through a process called overclocking the CPU
  3. Allow system builders to simulate real-world computer usage, letting them know a system will meet the demands of the tasks clients require
  4. Let consumers as a whole know whether the latest advertised technologies actually improve performance as claimed by a hardware manufacturer

CPU benchmarking tools have a myriad of uses; above are the four main uses for consumers. As technology advances, CPU benchmarking tools let us know whether the latest architectures by AMD, Intel and IBM are actually worth the hard-earned dollars we pay for upgrades.

With a new automobile, we can see how it is nicer than an older one. That is not always so with a new computer, unless we have four years of specialized CS training to decipher all the acronyms bombarding us on showroom floors and in manufacturer advertising.

For more specialized uses, these tools can be used by software developers or even hardware architecture engineers to assist in designing and refining both software and hardware.

 

Top, All Time Overall CPU Benchmarks on Passmark Web Archive

 

How Accurate are CPU Benchmarking Tools?

Most CPU Benchmarking tools are accurate in what they do. However, time should be taken to understand what is being evaluated in a benchmark.

It is also important to use a widely known CPU benchmarking tool. This way, results can be compared with those of past contributors. Only then is it possible to accurately gauge the performance of a CPU with benchmarking utilities.

The first thing we want to avoid is benchmarks that rate a complete system for gaming. Video gaming, when rated in frames per second (FPS), is highly dependent on a high-performance video card. These benchmarks really measure how well the CPU keeps up with a high-end GPU that has taken over much of the processing, not how well the CPU performs as an individual component.

The second thing we must note is that most CPU benchmarking tools (including all those covered in this guide) evaluate a CPU based on consumer computing and not server demands. Instead of measuring how well a CPU will process data over a 4-month period, tests are done in short bursts to simulate an end-user (versus a server).
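
The difference between a short burst and a sustained run can be sketched as follows. This is an illustrative harness, not part of any tool reviewed here: the `busy_work` workload, the interval lengths, and the "drop from the first interval" metric are all assumptions, and a real server test would run far longer than a few seconds.

```python
import time


def busy_work(iterations: int) -> int:
    """A fixed chunk of integer work used as the unit of throughput."""
    total = 0
    for i in range(iterations):
        total = (total + i * i) % 1_000_003
    return total


def sustained_throughput(duration_s: float, interval_s: float, chunk: int = 10_000):
    """Run busy_work back to back and return chunks per second for each interval."""
    samples = []
    end_time = time.perf_counter() + duration_s
    while time.perf_counter() < end_time:
        interval_start = time.perf_counter()
        chunks_done = 0
        while time.perf_counter() - interval_start < interval_s:
            busy_work(chunk)
            chunks_done += 1
        elapsed = time.perf_counter() - interval_start
        samples.append(chunks_done / elapsed)
    return samples


def throughput_drop(samples) -> float:
    """Fractional drop from the first interval to the slowest interval."""
    first = samples[0]
    return (first - min(samples)) / first


if __name__ == "__main__":
    samples = sustained_throughput(duration_s=5.0, interval_s=1.0)
    print([round(s) for s in samples], f"drop: {throughput_drop(samples):.1%}")
```

A burst-style benchmark effectively reports only the first interval; watching how later intervals compare is a crude way to see whether performance holds up under continuous load.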

With a little understanding of business versus consumer computing, and of what each benchmark specializes in, we as consumers can accurately tell if a certain CPU will satisfy our needs.

End-User vs. Business Class CPUs

End-user CPU performance is usually bursty, split into smaller, single-action tasks. Each task can then call a few worker threads. Unfortunately, most of the benchmarking tools covered here score a CPU based on how an office worker would make use of it.

Let’s look at our work day:

  • Get into the office and check email.
  • See an attachment of our proofs for the latest ad in a trade magazine.
  • Download, then unzip these files.
  • Load them into our favorite graphics program to see how they look.
  • Then open a faulty application, get a kernel panic (or blue screen), and simply reboot the computer.
  • Finally, load each image back into the graphics app from disk one at a time.

Frames Per Second Benchmarks Are Only Useful For Gaming. Photo credit: Assassin’s Creed 4

A server-class CPU architecture represents many people doing the same actions simultaneously. It is widely accepted in computer science that multi-core or multi-CPU parallel processing will outperform a single core with a higher clock speed, unless the distributed workers are orders of magnitude slower, like a single 4GHz CPU versus four 80486 CPUs. This is true for any operation, from calculating multiple floating point numbers to encrypting multiple files.
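
A back-of-the-envelope model shows both sides of that claim. The sketch below assumes fully independent tasks and identical work per clock cycle on every CPU, which is a deliberate simplification (an 80486 also does far less per cycle than a modern core); the 33MHz clock for the older parts and the task sizes are hypothetical figures.

```python
import math


def batch_time_s(tasks: int, cycles_per_task: float, cores: int, clock_hz: float) -> float:
    """Seconds to finish `tasks` independent tasks spread evenly across cores."""
    rounds = math.ceil(tasks / cores)  # each core handles one task per round
    return rounds * cycles_per_task / clock_hz


TASKS = 8
CYCLES_PER_TASK = 4.0e9  # hypothetical amount of work per task

single_fast = batch_time_s(TASKS, CYCLES_PER_TASK, cores=1, clock_hz=4.0e9)
quad_modest = batch_time_s(TASKS, CYCLES_PER_TASK, cores=4, clock_hz=2.5e9)
quad_ancient = batch_time_s(TASKS, CYCLES_PER_TASK, cores=4, clock_hz=33.0e6)

print(single_fast, quad_modest, quad_ancient)
```

Under these assumptions four moderately clocked cores finish the batch well ahead of one fast core, while four cores that are orders of magnitude slower fall far behind it.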

It is also worth noting that business-class CPUs such as the Xeon make use of ECC memory. Error-correcting code (ECC) memory not only helps prevent crashes but also protects against hardware failure (as noted in a Microsoft paper entitled Cycles, Cells, and Platters: An Empirical Analysis of Hardware Failures on a Million Consumer PCs). The error correction checking does take a toll on overall CPU execution time.

With this in mind let’s look at business class computing:

  • The CPU runs 24/7 and is never resting
  • Tasks similar to those in consumer computing are performed simultaneously, non-stop
  • Recovery from a blue screen on a server or a high-end workstation can result in the loss of up to 8 hours of work, or more in lost revenue

So when we see CPU architectures like the Xeon rated very close to consumer CPUs (and greatly exceeding them in some operations) on office-user tasks, in, say, a benchmark score from PassMark, that is saying a lot. And that is without even touching on the better processing efficiency found in business CPU architectures.

Dhrystone Benchmarking Review

Dhrystone is included for historical significance. It was released in the 1980s, then gained popularity when it was ported to Unix in C. The original Dhrystone benchmark was coded in Ada.

Dhrystone, though popular at the time, is not used much in today’s world, as it suffers from several flaws that can cause misrepresented benchmarks.

Disadvantages of Dhrystone CPU benchmarking:

  • String sorting is done with aligned strings
  • No floating point math tests, so the FPU is left untested
  • The entire application is small and will fit into the cache of a modern CPU’s execution core

The above shortcomings mean that benchmarks do not show results from real-world operations and that architecture deficiencies are not exposed (as they were in the previous Intel to AMD architecture benchmark comparison). System characteristics such as memory speed and front-side bus architecture are also not represented in bench tests.

But benchmarking desktops and servers is not what Dhrystone is used for any longer. Its main purpose in today’s world is testing embedded CPUs like those found in machinery and other equipment.

Advantages of Dhrystone CPU benchmarking:

  • Benchmarking embedded CPUs in small devices
  • Giving a great comparison of compiler performance
  • Performing synthetic benchmarks to compare different system architectures (for example Intel x86 vs. ARM)

Dhrystone is unique in that its binary footprint is small enough to benchmark the embedded CPUs found powering smaller devices. Also, the entire program is self-contained, meaning it will compile without system-specific libraries.
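
Dhrystone results are commonly reported as DMIPS, which divides the measured Dhrystones per second by 1757, the score of the VAX 11/780 reference machine, and often further as DMIPS per MHz so that CPUs at different clock speeds can be compared. The helper below is a small sketch of that conversion; the function names and sample figures are hypothetical and do not come from the Dhrystone source.

```python
# Dhrystones per second scored by the VAX 11/780, the conventional 1 DMIPS reference.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757.0


def dmips(dhrystones_per_second: float) -> float:
    """Convert a raw Dhrystone result into DMIPS."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Normalize DMIPS by clock speed to compare architectures."""
    return dmips(dhrystones_per_second) / clock_mhz


# Made-up result for a 500 MHz embedded CPU.
print(dmips(1_757_000.0), dmips_per_mhz(1_757_000.0, 500.0))
```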

Dhrystone running on a modern Ubuntu Linux Workstation

To even start using Dhrystone, one must be somewhat savvy with researching and editing source code. To get it running, the author needed a little help and made about 5 modifications to the source code, following instructions found online.

PassMark CPU Benchmarking Review

PassMark can be considered the current reigning champion of CPU benchmarking for tech-savvy system builders. These are people who build high-performance desktop computers and overclock their CPUs. Overclocking means making a stock CPU exceed the clock speed limits defined by the manufacturer.

PassMark does an excellent job of rating a CPU in several areas as noted in “Understanding CPU Benchmarks”. PassMark then calculates a score that is widely known as a standard in the CPU overclocking community.

It is common to boast about one’s PassMark score to tech-savvy friends and in online enthusiast forums (a good PassMark score with screenshots will get instant street cred in some circles).

Advantages of PassMark CPU Benchmarking Software:

  • Standard score that is known amongst skilled system builders
  • Will show system ranking as compared to all other rankings
  • Low cost of ownership: $27.00 USD
  • Baseline Management features
  • Granular control of CPU Benchmarking tests

Disadvantages of PassMark CPU benchmarking software:

  • Only supports the Windows operating system
  • Only gives CPU performance testing, with no simulated real-world or application testing
  • Does not depict testing servers in real-world scenarios in terms of weeks, months and years
  • Does not test power efficiency, only pure processing speed

PassMark CPU Mark is the accumulated rating of combined individual CPU tests

PassMark features a comprehensive CPU Mark database searchable by CPU Type

In closing, PassMark is the current leader in benchmarking CPU performance on Windows-based laptops, desktops and servers.

Unless one needs to benchmark specific applications, compare Apple machines to Windows machines, or benchmark embedded microprocessors, PassMark is the clear leader of the bunch for testing high-performance CPUs and systems.


GeekBench4 CPU Benchmarking Review

The first thing that caught my eye about GeekBench was its support for running the same tests on multiple operating systems. The second was the simplicity of the graphical user interface: to run a GeekBench CPU benchmark, the user simply clicks a single button.

The GeekBench main form screen is both simple and elegant
GeekBench Mac CPU test
GeekBench Windows Result

GeekBench is definitely a great benchmarking tool for those with minimal technical knowledge or those who prefer alternatives to Windows. GeekBench also has a benchmarking app for both iOS and Android. The paid version will benchmark NVIDIA’s CUDA hardware acceleration.

Sadly, GeekBench4 for Linux would not work out of the box. Downloading GeekBench4 for Linux, then launching the executable, resulted in an error. The application was not as simple to use as the Mac or Windows versions, but this is to be expected on Linux.

Perhaps this is because I do not use Ubuntu and GeekBench4 was tailored for it. I’m sure with a little tweaking and a few posts on the GeekBench4 forums this could be ironed out.

GeekBench4 is a great application for those who want simple but rounded and comparable benchmarking scores across a range of operating systems.

Like the application user interface, the GeekBench Chart website containing user-contributed benchmarks is both simple and elegant in function.

GeekBench4 may fall a little short of the granularity offered in PassMark, and its real-world benchmarking simulations are not as complete as those in PCMark.

It is quick, simple and easy to use. In the architecture comparison earlier in this guide, we used GeekBench4 to quickly spot a design difference between Intel and AMD CPU architectures that resulted in Intel’s great single-core performance.

However, GeekBench4 may not be as good a choice as PassMark for benchmarking servers or modified desktop systems. It simply doesn’t offer the granularity or end-user base of PassMark.


PCMark 8 CPU Benchmarking Review

PCMark is a little different from the aforementioned CPU benchmarking tools, in that PCMark tests an entire system and then formulates a benchmark based on scores for combined component performance.

PCMark offers system benchmark testing in five categories of use. This makes it an excellent tool for those who need to figure out performance based on specific usage, or who would like to rate a laptop to see its performance in each category.

PCMark Offers benchmarking in specified categories of intended use

Judging from the benchmarking process, PCMark will probably excel in testing real-world desktop system usage.

PCMark has put a lot of attention into detail and excels at offering a polished and professional look and feel. Even the website of contributed benchmarks has gone the extra mile to look polished.

The final tallied PCMark benchmark is cumulative of overall system performance, but it does not really reflect how the CPU benchmarks as a component.


CPU Benchmarks and Dedicated Servers

CPU benchmarking is important for dedicated servers. With consumer benchmarking tools, however, it only gives a brief overview of comparative performance in certain areas.

The basic, limited actions performed in the PassMark CPU benchmarks will give great approximations of how the CPU architecture performs.

All of these components factor into CPU performance:

  • Clock speeds
  • Cache pipelines
  • Front-side Bus efficiency/speed
  • Memory speed
  • Disk speed

However, these do not account for operations being performed at the Hypervisor level, or these tasks being performed simultaneously by more than one user at a time.

This is what separates business-class CPU architecture from consumer-level CPU architecture.

Put simply: architectures like the Xeon are made for high-demand, distributed workloads with reliability, while consumer-level CPUs are designed for burst processing that relies mostly on sheer clock cycles per second.

This usually equates to server-class CPU designs having more cache, being better at distributing loads (called parallel processing), and offering superior performance at the hypervisor level.

With this comes more cost in development and materials. But that cost is returned in money saved by being able to serve more customers with greater reliability.

With a consumer CPU, the architecture usually leans on sheer clock cycles to gain less expensive (in terms of design, R&D, and materials) performance increases. This design sacrifices longevity and reliability, and will equate to greater cost down the road for server use.

Conclusion

There are several great CPU benchmarking tools that are well respected in their ability to evaluate and compare CPU performance of dedicated servers.

These utilities are important to assist consumers in:

  • Getting the right computer for intended use
  • Keeping CPU manufacturer claims honest
  • Comparing CPU hardware across operating systems
  • Gauging performance of different CPU architectures
  • Comparing systems amongst contributed benchmarks

All these are important to both end-users and server architects. Gauging server CPU performance is a little more difficult in real-world terms: hundreds of variables over months of processing. With tools like PassMark, we get some tangible indicators of CPU performance.