High Performance Computing Trends: The Emergence of the Computational Scientist
In previous posts I’ve discussed both hardware and software trends in HPC. Today I will present a more human side of HPC, what may for symmetry be called peopleware. I argued in a recent post that parallel coding is hard and is getting harder. Gone are the days when an organization could ask a domain specialist, e.g. a geoscientist, chemist, or reservoir engineer, to write a modeling and simulation code and expect the result to have any appreciable performance. I like to do a thought experiment to illustrate my point. If, in the 1980s, giant corporation X had put its 100 best engineers in a room and asked each of them to develop their own specialized science-based or engineering-based code, the performance of the resulting programs would have had a distribution, but the variance would not have been that significant. A few handy maxims about how to properly stride multi-dimensional arrays would likely have fixed most of the problems. If that experiment were done today, in a world of C/C++, multi-level parallelism, and accelerators, the spread in application performance would be much greater. It is not hard to imagine the best and worst code differing by a factor of 100x or more, and the reasons go beyond array striding into code architecture, parallel implementation, cache use, and latency hiding. There are simply many more places to make performance and scaling mistakes today.
Writing world-class parallel scientific code requires a mix of interdisciplinary skills: domain expertise, applied math, computer science, and computer engineering. It's not easy to find all of these in one individual; if you do, hire them immediately! The peopleware trend that I observe is the emergence of the computational scientist as a distinct profession. The characteristics of such people are an ability to rapidly assimilate into a new scientific or engineering domain, a firm grasp of applied math (e.g. the solution of PDEs, linear algebra), great programming skills, and a fanatical devotion to performance rooted in knowledge of the underlying hardware.
I believe that large organizations have recognized this issue and have dealt with it in one of several ways. First, only the largest organizations continue to develop their own codes. In the energy industry, this elite group includes the supermajors and the large national oil companies. Smaller organizations have difficulty recruiting the type of individual described above; they may not even be fully aware of the required skills, and they also offer a limited career path. Second, organizations are adopting a library approach where it makes sense. The idea here is to let experts develop high performance libraries that you can just plug into your existing codes. This can work quite well, but in most cases only a portion of a full application can be turned into a library. If 50% of your code's runtime is spent in a library function, then the very best performance boost you can get is 2x, and that is the limiting case in which the library is sped up to infinity. A third approach that has been successfully adopted is a partnership between a company that has domain expertise, business-critical problems, and financial resources and a company that has HPC and computational expertise. Examples abound in the world of reservoir simulation: (CMG – Shell/Petrobras), (Halliburton – BP), (Schlumberger – Total/Chevron), and our company, (Stone Ridge Technology – Marathon Oil). These partnerships can work very well, with the computational company building a high performance engine or framework and the partner company bringing domain expertise, models for testing, and direction for feature development. The final approach is purchasing off-the-shelf software; however, many of these products lag in performance because of their legacy origins, as discussed in a previous post, and perhaps more importantly they provide no advantage over competitors.
Great computational scientists can come from almost any discipline or background; most are from physics, applied math, engineering, or computer science. They often feel like the outsider in a group of domain specialists because they apply their skills to many different areas of science and engineering. In my own career as a physicist I started in electronic structure but over time have contributed to work in molecular dynamics, surface catalysis, combustion simulation, fluid dynamics, financial engineering, bioinformatics, seismic processing, and reservoir simulation. I believe the challenge of understanding the core principles behind new fields and applying one's knowledge of numerical modeling effectively to solve problems is one of the attractions of the profession. Looking toward the future, I continue to see great opportunities for individuals who choose careers in computational science. The complexity of both hardware and software is increasing, and algorithm choice and implementation are also quite complex and specialized. These trends compound the challenge of developing robust, performant, business-critical applications and will swell demand for skilled computational scientists.
This post is the last of several based on themes I presented in a keynote talk delivered in mid-September at the 2nd annual EAGE workshop on High Performance Computing in the Upstream in Dubai.