Executive Overview

Intel's silicon design engineers need significant increases in computing capacity—both on their workstations and on data center servers—to deliver each new generation of silicon chips. To meet those requirements, Intel IT conducts ongoing throughput performance tests, using the Intel® silicon design workloads, to analyze the benefits of introducing compute servers based on new, more powerful processors in the field of electronic design automation (EDA).

We recently tested a dual-socket server based on the latest Intel® Xeon® Platinum 8168 processor running single-threaded and multi-threaded EDA applications operating on more than 200 hours of Intel design workloads. By utilizing all available cores, the server completed the workloads up to 1.37x faster than a server based on the previous-generation Intel Xeon processor E5-2699 v4. Historically, it is up to 86x faster than a server based on a single-core 64-bit Intel Xeon processor (3.6 GHz).

Based on our performance assessment and our refresh cycle, we plan to deploy servers based on the new Intel Xeon processor Scalable family, completing our replacement of servers based on the 8-core Intel Xeon processor E5-2600 series that are more than four years old. By doing so we expect to significantly increase EDA throughput while realizing savings, because we can avoid data center construction and reduce additional power consumption.

For more complete information about performance and benchmark results, visit intel.com/benchmarks. Performance results based on testing details and system configuration. See the full disclaimer and system configurations on page 6.

Silicon chip design engineers at Intel face ongoing challenges: integrating more features into ever-shrinking silicon chips, bringing products to market faster, and keeping design engineering and manufacturing costs low.

As design complexity increases, the requirements for compute capacity also increase, so refreshing servers and workstations with higher performing systems is cost-effective and offers a competitive advantage by enabling faster chip design. Refreshing older servers also enables us to realize data center cost savings. By taking advantage of the performance and power-efficiency improvements in new server generations, we can increase computing capacity within the same data center footprint, avoiding expensive data center construction and achieving operational cost savings due to reduced power consumption.

Intel IT conducts ongoing performance tests, based on the latest Intel® silicon design data, to analyze the potential performance and data center benefits of introducing servers based on new processors into our electronic design automation (EDA) computing environment. Table 1 illustrates some of the architectural enhancements.

**Table 1. A Comparison of Dual-Socket Servers Based on Intel® Xeon® Processors**

<table>
<thead>
<tr>
<th>Year</th>
<th>Introduction</th>
<th>Intel® Chipset</th>
<th>Process Technology</th>
<th>Cores per Socket</th>
<th>Cache</th>
<th>Interconnect Speed</th>
<th>DIMMs</th>
<th>Memory Type</th>
<th>Memory Bandwidth</th>
<th>Maximum Memory</th>
</tr>
</thead>
<tbody>
<tr>
<td>2004-2005</td>
<td>2004-2005</td>
<td>E7520</td>
<td>90nm</td>
<td>1</td>
<td>1 MB or 2 MB¹</td>
<td>6.4 GB/s</td>
<td>Up to 8</td>
<td>DDR2-400 MHz</td>
<td>64 GB or 128 GB²</td>
<td>16 GB</td>
</tr>
<tr>
<td>2006-2008</td>
<td>2006-2008</td>
<td>5400</td>
<td>65nm and 45nm</td>
<td>2 or 4</td>
<td>4 MB or 6 MB²</td>
<td>21-25 GB/s</td>
<td>Up to 16</td>
<td>FB-DIMM/DDR2-667 MHz or FB-DIMM/DDR2-800 MHz</td>
<td>21-25 GB/s</td>
<td>Up to 64 GB</td>
</tr>
<tr>
<td>2009-2011</td>
<td>2009-2011</td>
<td>5520</td>
<td>45nm and 32nm</td>
<td>4 or 6</td>
<td>8 MB or 12 MB²</td>
<td>25.6 GB/s per Intel® QuickPath Interconnect</td>
<td>Up to 18</td>
<td>DDR3-800/1066/1333 MHz</td>
<td>Up to 32 GB/s</td>
<td>Up to 144 GB</td>
</tr>
<tr>
<td>2012-2016</td>
<td>2012</td>
<td>C600</td>
<td>32nm</td>
<td>8</td>
<td>20 MB shared</td>
<td>32 GB/s per Intel® QuickPath Interconnect</td>
<td>Up to 24</td>
<td>DDR3-1333/1600 MHz</td>
<td>Up to 51.2 GB/s</td>
<td>Up to 288 GB</td>
</tr>
<tr>
<td>2013</td>
<td>2013</td>
<td></td>
<td></td>
<td>10</td>
<td></td>
<td></td>
<td></td>
<td>DDR3-1333/1600 MHz</td>
<td>Up to 59.7 GB/s</td>
<td>Up to 288 GB</td>
</tr>
<tr>
<td>2014</td>
<td>2014</td>
<td></td>
<td></td>
<td>14</td>
<td></td>
<td></td>
<td></td>
<td>DDR4-1600/1866/2133 MHz</td>
<td>Up to 68 GB/s</td>
<td>Up to 768 GB</td>
</tr>
<tr>
<td>2016</td>
<td>2016</td>
<td></td>
<td></td>
<td>22</td>
<td></td>
<td></td>
<td></td>
<td>DDR4-2400 MHz</td>
<td>Up to 1536 GB⁵</td>
<td>Up to 3072 GB⁶</td>
</tr>
<tr>
<td>2017</td>
<td>2017</td>
<td></td>
<td></td>
<td>28</td>
<td></td>
<td></td>
<td></td>
<td>DDR4-2666 MHz</td>
<td>Up to 3072 GB⁶</td>
<td></td>
</tr>
</tbody>
</table>

¹ Data provided only for 1 MB cache. ² 128 GB support with Intel® 5400 Chipset introduced in 2007. ³ 144 GB assumes 18 memory slots populated with 8-GB DIMMs; 288 GB assumes 18 memory slots populated with 16-GB DIMMs, and validated only with Intel® Xeon® processor 5600 series. ⁴ 768 GB assumes 24 memory slots populated with 32-GB DIMMs. ⁵ 1536 GB assumes 24 memory slots populated with 64-GB DIMMs. ⁶ 3072 GB assumes 24 memory slots populated with 128-GB DIMMs.

Faster Servers Process More EDA Jobs in Less Time

The architectural enhancements shown in Table 1 illustrate how the Intel® Xeon® processor has evolved over the last few years. We have found that refreshing data center servers to use the latest processor technology substantially improves EDA throughput.

While our assessments focus on EDA applications, throughput improvements may also be achieved with other applications used in high-performance computing environments where simulation and verification are large parts of the workflow, including:

- Computational fluid dynamics and simulation in the aeronautical and automobile industries
- Synthesis and simulation applications in the life sciences industry
- Simulation in the oil and gas industries

Test Methodology

We ran tests on dual-socket servers based on the Intel® Xeon® Platinum 8168 processor. This processor includes new features designed to increase throughput compared with previous processor generations, including 14 nm process technology, 24 cores, and 33 MB L3 cache.

We ran several tests using industry-leading single-threaded and multi-threaded EDA applications operating on Intel Xeon processor and chipset design workloads.

Our goal was to assess throughput improvement by measuring the time taken to complete a specific number of design workloads. To maximize throughput, we configured each application to utilize all available cores, resulting in one job or process per core. The test configuration is shown in Table 2. We then compared our results with previous tests conducted using the same approach on servers based on earlier processor generations.
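As a rough illustration of the one-job-per-core setup, the sketch below estimates how many jobs a dual-socket server runs at once and how many back-to-back batches it needs to finish a fixed workload. The function names are hypothetical, and rounding up partial batches is an assumption; the 9,240-job total and the job counts used in the example come from Table 3.

```python
import math

TOTAL_JOBS = 9240  # total physical-verification jobs per workload (Table 3)


def job_slots(sockets: int, cores_per_socket: int, threads_per_job: int) -> int:
    """Jobs that can run at once when every core is busy (hypothetical helper)."""
    return (sockets * cores_per_socket) // threads_per_job


def batches_needed(total_jobs: int, simultaneous_jobs: int) -> int:
    """Back-to-back batches required to finish all jobs (assumes rounding up)."""
    return math.ceil(total_jobs / simultaneous_jobs)


# Dual-socket server with 22 cores per socket running 2-threaded jobs
slots = job_slots(sockets=2, cores_per_socket=22, threads_per_job=2)
print(slots, batches_needed(TOTAL_JOBS, slots))  # 22 420

# Dual-socket server with 24 cores per socket running single-threaded jobs
print(job_slots(sockets=2, cores_per_socket=24, threads_per_job=1))  # 48
```

The results line up with the "Simultaneous 2-Threaded Jobs" and "Total Number of Iterations" rows of Table 3, where the product of the two values is always 9,240.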

**Table 2. Test Configuration for Dual-Socket Servers**

<table>
<thead>
<tr>
<th>Cores</th>
<th>Frequency</th>
<th>Cache</th>
<th>Interconnect</th>
<th>RAM</th>
<th>Memory Type</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>3.6 GHz</td>
<td>1 MB</td>
<td>800 MHz Shared FSB</td>
<td>16 GB</td>
<td>DDR2-400 MHz</td>
</tr>
<tr>
<td>2</td>
<td>3.0 GHz</td>
<td>4 MB</td>
<td>1333 MHz Dual Independent FSB</td>
<td>16 GB</td>
<td>DDR2-667 MHz</td>
</tr>
<tr>
<td>4</td>
<td>3.0 GHz</td>
<td>8 MB</td>
<td>1333 MHz Dual Independent FSB</td>
<td>32 GB</td>
<td>DDR2-667 MHz</td>
</tr>
<tr>
<td>4</td>
<td>3.16 GHz</td>
<td>12 MB</td>
<td>1333 MHz Dual Independent FSB</td>
<td>32 GB</td>
<td>DDR2-667 MHz</td>
</tr>
<tr>
<td>4</td>
<td>2.93 GHz</td>
<td>8 MB</td>
<td>25.6 GB/s per Intel® QPI link</td>
<td>48 GB</td>
<td>DDR3-1333 MHz</td>
</tr>
<tr>
<td>4</td>
<td>3.06 GHz</td>
<td>12 MB</td>
<td>25.6 GB/s per Intel® QPI link</td>
<td>48 GB</td>
<td>DDR3-1333 MHz</td>
</tr>
<tr>
<td>6</td>
<td>2.7 GHz</td>
<td>20 MB</td>
<td>32.0 GB/s per Intel® QPI link</td>
<td>96 GB</td>
<td>DDR3-1333 MHz</td>
</tr>
<tr>
<td>8</td>
<td>2.8 GHz</td>
<td>25 MB</td>
<td>32.0 GB/s per Intel® QPI link</td>
<td>96 GB</td>
<td>DDR3-1333 MHz</td>
</tr>
<tr>
<td>10</td>
<td>2.6 GHz</td>
<td>35 MB</td>
<td>38.4 GB/s per Intel® QPI link</td>
<td>128 GB</td>
<td>DDR3-1600 MHz</td>
</tr>
<tr>
<td>14</td>
<td>2.2 GHz</td>
<td>55 MB</td>
<td>38.4 GB/s per Intel® QPI link</td>
<td>256 GB</td>
<td>DDR4-2133 MHz</td>
</tr>
<tr>
<td>22</td>
<td>2.7 GHz</td>
<td>33 MB</td>
<td>41.6 GB/s per Intel® UPI link</td>
<td>256 GB</td>
<td>DDR4-2400 MHz</td>
</tr>
</tbody>
</table>

Maximizing Throughput with Intel® HT Technology

With Intel® Hyper-Threading Technology (Intel® HT Technology) enabled, the Intel® Xeon® Platinum 8168 processor supports up to 96 concurrent software threads in a single two-socket platform and delivers higher throughput than with Intel HT Technology disabled. Enabling Intel HT Technology increased performance by up to 1.25x when completing the same number of jobs, using twice as many application licenses.

Simulation Jobs Comparison

| Intel® HT Technology | Time Needed to Complete 113 Jobs on Intel® Xeon® Platinum 8168 Processor |
|----------------------|----------------------------|
| Enabled  | 0:44:27 (1.25x increased performance) |
| Disabled | 0:55:26 |
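
The 1.25x figure can be checked directly from the two runtimes in the comparison above. The following is a minimal sketch; the helper name `to_seconds` is hypothetical, and the speedup is assumed to be the ratio of the two elapsed times.

```python
def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


ht_enabled = to_seconds("0:44:27")   # runtime with Intel HT Technology enabled
ht_disabled = to_seconds("0:55:26")  # runtime with Intel HT Technology disabled

# Speedup = time without HT / time with HT for the same 113 jobs
speedup = ht_disabled / ht_enabled
print(f"{speedup:.2f}x")  # 1.25x
```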

DDR – double data rate; FB-DIMM – fully buffered dual in-line memory module; FSB – front side bus; Intel® QPI – Intel® QuickPath Interconnect; Intel® UPI – Intel® UltraPath Interconnect

⁷ DDR3-1333 RAM running at 1066 MHz. ⁸ DDR4-2133 RAM running at 1866 MHz.

Results

Results are shown in Figure 1; actual runtimes are listed in Table 3. The Intel Xeon Platinum 8168 processor-based server completed the tests up to 1.37x faster than a previous-generation Intel Xeon processor E5-2699 v4-based server. For historical purposes, we also show that the latest processor-based server is up to 24x faster than a server based on the Intel Xeon processor 5160 and up to 86x faster than a server based on a single-core 64-bit Intel Xeon processor.

Figure 1. Electronic Design Automation (EDA) summary test results showing relative throughput of 64-bit Intel® Xeon® processors.

Note: Same application binary used across all the platforms.

Conclusion

The new Intel Xeon processor Scalable family delivers significant improvements in throughput performance for Intel design workloads across a range of EDA applications in the data center.

Using a weighted performance measure of end-to-end EDA applications based on Intel silicon design tests, we found that the effective refresh ratio to replace servers based on the 8-core Intel Xeon processor E5-2600 series with servers based on the Intel Xeon processor Scalable family is approximately 3.2:1. Based on our performance assessment and our refresh cycle, we plan to deploy servers based on the new Intel Xeon processor Scalable family, which will enable us to achieve greater throughput while realizing operational benefits such as cost avoidance of data center construction and reduced power consumption.
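
To make the refresh ratio concrete, the sketch below estimates how many new servers would be needed to match the throughput of a fleet of older ones. The fleet size of 1,000 servers is purely hypothetical, as is the function name; only the 3.2:1 ratio comes from the assessment above.

```python
import math

REFRESH_RATIO = 3.2  # older servers replaced by one new server (from this paper)


def new_servers_needed(old_servers: int, ratio: float = REFRESH_RATIO) -> int:
    """New servers required to match the old fleet's throughput (rounded up)."""
    return math.ceil(old_servers / ratio)


# Hypothetical fleet of 1,000 older 8-core servers
print(new_servers_needed(1000))  # 313
```

Rounding up is a conservative choice: a fractional server cannot be deployed, so the estimate never falls short of the original capacity.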

Our test results suggest that other technical applications with large memory requirements — such as simulation and verification applications in the auto, aeronautical, oil and gas, and life sciences industries — could see similar throughput improvements, depending on their workload characteristics.

For more information on Intel IT best practices, visit [www.intel.com/IT](http://www.intel.com/IT).

**Table 3. Electronic Design Automation (EDA) Test Results Showing Runtimes and Workload Configurations**

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>SIMULATION</strong> (113 CPU MODEL TESTS)</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Number of Simultaneous Jobs</td>
<td>2</td>
<td>4</td>
<td>8</td>
<td>8</td>
<td>8</td>
<td>12</td>
<td>16</td>
<td>28</td>
<td>44</td>
<td>48</td>
</tr>
<tr>
<td>Relative Throughput</td>
<td>1.00</td>
<td>3.58</td>
<td>5.65</td>
<td>5.91</td>
<td>12.98</td>
<td>18.63</td>
<td>25.87</td>
<td>32.01</td>
<td>46.28</td>
<td>63.14</td>
</tr>
<tr>
<td><strong>PHYSICAL VERIFICATION</strong> (DESIGN RULE CHECK [DRC])</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Simultaneous 2-Threaded Jobs</td>
<td>1</td>
<td>2</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>6</td>
<td>8</td>
<td>10</td>
<td>14</td>
<td>22</td>
</tr>
<tr>
<td>Total Number of Iterations</td>
<td>9240</td>
<td>4620</td>
<td>2310</td>
<td>2310</td>
<td>2310</td>
<td>1540</td>
<td>1155</td>
<td>924</td>
<td>660</td>
<td>420</td>
</tr>
<tr>
<td>Total Number of Jobs</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
</tr>
<tr>
<td>Total Runtime (hh:mm:ss)</td>
<td>60052:18:00</td>
<td>14151:19:00</td>
<td>7308:35:00</td>
<td>6443:37:00</td>
<td>6070:10:00</td>
<td>4008:16:40</td>
<td>2900:58:30</td>
<td>2321:48:24</td>
<td>1538:21:00</td>
<td>1137:37:00</td>
</tr>
<tr>
<td>Relative Throughput</td>
<td>1.00</td>
<td>4.24</td>
<td>8.22</td>
<td>8.92</td>
<td>14.98</td>
<td>20.70</td>
<td>25.86</td>
<td>39.04</td>
<td>52.79</td>
<td>57.18</td>
</tr>
<tr>
<td><strong>PHYSICAL VERIFICATION</strong> (NODE ANTENNA CHECK [NAC])</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Simultaneous 2-Threaded Jobs</td>
<td>1</td>
<td>2</td>
<td>4</td>
<td>4</td>
<td>4</td>
<td>6</td>
<td>8</td>
<td>10</td>
<td>14</td>
<td>22</td>
</tr>
<tr>
<td>Total Number of Iterations</td>
<td>9240</td>
<td>4620</td>
<td>2310</td>
<td>2310</td>
<td>2310</td>
<td>1540</td>
<td>1155</td>
<td>924</td>
<td>660</td>
<td>420</td>
</tr>
<tr>
<td>Total Number of Jobs</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
<td>9240</td>
</tr>
<tr>
<td>Total Runtime (hh:mm:ss)</td>
<td>16390:44:00</td>
<td>4500:39:00</td>
<td>2520:28:00</td>
<td>2186:09:30</td>
<td>1853:46:30</td>
<td>1302:09:20</td>
<td>833:31:30</td>
<td>668:33:48</td>
<td>448:26:00</td>
<td>286:39:00</td>
</tr>
<tr>
<td>Relative Throughput</td>
<td>1.00</td>
<td>3.64</td>
<td>6.50</td>
<td>7.50</td>
<td>8.84</td>
<td>12.59</td>
<td>19.66</td>
<td>24.59</td>
<td>36.55</td>
<td>57.18</td>
</tr>
</tbody>
</table>
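
The relative throughput values can be reproduced from the runtimes. The sketch below assumes that relative throughput is the baseline runtime divided by each configuration's runtime, and uses the first three DRC runtimes listed in Table 3; the helper name `to_seconds` is hypothetical.

```python
def to_seconds(hms: str) -> int:
    """Convert an hh:mm:ss string (hours may exceed two digits) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# First three DRC "Total Runtime" entries from Table 3; the first is the baseline
drc_runtimes = ["60052:18:00", "14151:19:00", "7308:35:00"]

baseline = to_seconds(drc_runtimes[0])
relative = [round(baseline / to_seconds(r), 2) for r in drc_runtimes]
print(relative)  # [1.0, 4.24, 8.22]
```

These values agree with the first three entries in the DRC "Relative Throughput" row.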

IT@Intel

We connect IT professionals with their IT peers inside Intel. Our IT department solves some of today’s most demanding and complex technology issues, and we want to share these lessons directly with our fellow IT professionals in an open peer-to-peer forum.

Our goal is simple: improve efficiency throughout the organization and enhance the business value of IT investments.


Visit us today at [intel.com/IT](http://intel.com/IT) or contact your local Intel representative if you would like to learn more.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to intel.com/benchmarks. Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as “Spectre” and “Meltdown”. Implementation of these updates may make these results inapplicable to your device or system.

The following system configurations and performance tests are discussed in this paper. For more information go to intel.com/performance.

Intel® Xeon® Platinum 8168 processor improves throughput up to 1.37x compared to a previous-generation Intel Xeon processor E5-2699 v4-based server. Intel Xeon Platinum 8168 Processor (24 cores, 2.7 GHz, 33 MB cache, 768 GB RAM, DDR4-2666 MHz) vs. Intel® Xeon® Processor E5-2699 v4 (2.2 GHz, 55 MB cache, 256 GB RAM, DDR4-2400 MHz).

Intel Xeon Platinum 8168 processor completed the workloads up to 86x faster than a server based on a 64-bit Intel Xeon processor. Intel Xeon Platinum 8168 Processor (24 cores, 2.7 GHz, 33 MB cache, 768 GB RAM, DDR4-2666 MHz) vs. 64-bit Intel® Xeon® Processor with 1 MB L2 cache (1 core, 3.6 GHz, 16 GB RAM, DDR2-400 MHz).

Intel Xeon Platinum 8168 processor-based server was up to 24x faster than a server based on the Intel Xeon processor 5160. Intel Xeon Platinum 8168 Processor (24 cores, 2.7 GHz, 33 MB cache, 768 GB RAM, DDR4-2666 MHz) vs. Intel® Xeon® Processor 5160 (2 cores, 3.0 GHz, 4 MB cache, 16 GB RAM, FB-DIMM/DDR2-667 MHz).

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Learn About Intel® Processor Numbers. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

The information provided in this paper is intended to be general in nature and is not specific guidance. Recommendations (including potential cost savings) are based upon Intel's experience and are estimates only. Intel does not guarantee or warrant others will obtain similar results.

Information in this document is provided “as is” and does not mean to be a commitment on the part of Intel. Intel does not warrant the accuracy or completeness of the information herein and reserves the right to change the information or data described herein at any time without notice. Intel disclaims all express or implied warranties, including without limitation any express or implied warranty of merchantability, fitness for a particular purpose, non-infringement of intellectual property or any other common law or statutory rights or remedies.

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.