High Performance Computing Trends: The Lagging Performance of Legacy Codes
In this post I will discuss the second of two software trends I have observed in HPC: the large and growing performance gap in legacy science and engineering software. By performance gap I mean the difference between the speed an application could achieve on a modern computing architecture and the speed it actually achieves.
To understand why this gap exists and is widening, we must go back to the halcyon days of the 1980s, when many of today’s legacy codes originated. Developers then were primarily writing Fortran codes for serial processors, focusing on compute-bound bottlenecks. There were no multi-core chips, and even multi-node computing with message passing was not yet common. Over time, new paths to higher performance were introduced through different types of parallelism. In the 1990s, MPI emerged for processing across the nodes of large clusters. In the 2000s, threading models such as OpenMP came into wide use for managing threads within and across the processors of a single node. Over roughly the same period, SIMD parallelism became available, first with MMX, then SSE and now AVX. At each step the original Fortran code was shoehorned into the new parallel world, with functionality grafted on. It’s no surprise that performance is lagging and that codes don’t scale well as the number of cores increases. As I discussed in a previous post, adding cores is the principal way hardware developers will deliver increased performance for the foreseeable future.
This incremental approach has been pursued by many organizations because it appears to be the least risky. The thinking is that while you may not reap the full performance benefit of a new hardware generation, at least you know the code will work and produce correct results. Over time, however, the accumulated compromises of this low-risk approach have produced very inefficient codes that significantly underperform. Many codes run well below the peak capability of the hardware and don't scale well past about four to six cores. The result is a large and growing cost. Organizations routinely overprovision their hardware to make up for inadequate software: buying a $100M cluster and using it as if it were a $10M cluster leaves $90M on the table. The other cost, which I believe is even more insidious, is time. Apart from actual dollars, “engineer time” is probably an organization’s most valuable resource. Getting a result in a few hours rather than a few days lets engineers make far better use of their expertise.
Businesses require information to operate efficiently and make informed decisions, and generating more information more quickly has inherent value. If I can run twenty times as many model realizations with a modern optimized code, I can generate more data and reduce the uncertainty in decisions that often involve investments of hundreds of millions of dollars. Fast forward modeling also has a powerful impact on optimization cycles such as history matching, where many more iterations can be completed in the same time. Modern codes that scale also make it possible to simulate higher-fidelity models, ten to one hundred times larger than previously considered, that capture finer detail.
The bottom line is that many of the codes we use today for business-critical decisions were written for computers that no longer exist, and there is a large and growing gap between the performance that can be achieved and what is being achieved. That gap translates into inefficient use of resources and real dollars lost. How big is the gap? In my observation it is currently between one and two orders of magnitude, and I expect it to grow with the next generation of hardware. Engineers, managers and technologists should consider what they could do if their simulation codes ran ten to one hundred times faster.
This post is one of several based on themes I presented in a keynote talk delivered in mid-September at the 2nd annual EAGE workshop on High Performance Computing in the Upstream in Dubai.