Interest in visual computing, or GPU computing, has grown significantly in the three years since NVIDIA released its CUDA development environment and Tesla hardware, and AMD's FireStream products became widely available. The technology has moved from evaluation in pilot projects to production deployment in a number of industries, including energy, finance and medical imaging. Adoption has been relatively rapid because i) the technology works, ii) it's affordable, and iii) it's easy to get started.
Stone Ridge Technology has been one of NVIDIA's key US-based partners since 2008. We've ported and developed tens of thousands of lines of code for NVIDIA hardware, providing significant performance acceleration to our clients. The applications we've worked on include Reverse Time Migration (RTM), Kirchhoff Time Migration (KTM), Sparse Linear Algebra (SLA), Computational Electromagnetics and Options Valuation. Our clients include the oil and gas majors as well as the large service companies. For more information, please give us a call; our code evaluation services are free of charge. Also visit our toolshed, where you will find more resources related to GPU computing.
How will my code perform?
A firm answer is difficult without actually doing the GPU implementation, but some rules of thumb can give you a rough idea. The NVIDIA Tesla has 240 cores running at about 1 GHz, while the Intel Nehalem has 4 cores running at about 3 GHz.
If your problem is compute bound, meaning that most of its time on the CPU is spent doing computation rather than waiting on memory, a back-of-the-envelope calculation shows that you can expect at best about a 20x improvement on the GPU: 240 cores × 1 GHz against 4 cores × 3 GHz is 240 against 12. If the CPU code is highly optimized and makes effective use of SSE instructions, which operate on four single-precision values at once, reduce the expected improvement to about 5x.
If your problem is memory bound, meaning that most of the time is spent waiting for memory accesses, then the speedup will be at least the ratio of main memory bandwidths, which is about 4x, and probably much better. We say probably much better because the GPU hides memory latency very effectively. It switches away from threads that are waiting on memory and computes on threads that have their data ready. For these problems it's important to give the GPU a large pool of threads to manage. Call or email us and let our experts give you a free evaluation of your code.
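The arithmetic behind these rules of thumb can be sketched in a few lines of Python. This is a rough estimate, not a performance model. The core counts and clock speeds are the figures quoted above. SSE_WIDTH assumes 4-wide single-precision SSE. The two memory bandwidths are hypothetical round numbers, chosen only to reproduce the roughly 4x ratio mentioned in the text.

```python
"""Back-of-the-envelope GPU vs. CPU speedup estimate (rule of thumb only)."""

# Core counts and clocks as quoted in the text.
GPU_CORES, GPU_CLOCK_GHZ = 240, 1.0   # NVIDIA Tesla
CPU_CORES, CPU_CLOCK_GHZ = 4, 3.0     # Intel Nehalem
SSE_WIDTH = 4                         # single-precision floats per 128-bit SSE register

# Hypothetical main-memory bandwidths (GB/s), picked to give the ~4x ratio.
GPU_BW_GBS, CPU_BW_GBS = 100.0, 25.0


def compute_bound_speedup(cpu_uses_sse: bool) -> float:
    """Best-case ratio of aggregate (cores x clock) throughput."""
    gpu = GPU_CORES * GPU_CLOCK_GHZ
    cpu = CPU_CORES * CPU_CLOCK_GHZ
    if cpu_uses_sse:
        # Well-vectorized CPU code processes SSE_WIDTH values per instruction.
        cpu *= SSE_WIDTH
    return gpu / cpu


def memory_bound_speedup_floor() -> float:
    """Rough lower bound for memory-bound code: ratio of memory bandwidths."""
    return GPU_BW_GBS / CPU_BW_GBS


if __name__ == "__main__":
    print(f"compute bound, scalar CPU code: ~{compute_bound_speedup(False):.0f}x")
    print(f"compute bound, SSE CPU code:    ~{compute_bound_speedup(True):.0f}x")
    print(f"memory bound (floor):           ~{memory_bound_speedup_floor():.0f}x")
```

Running it prints roughly 20x, 5x and 4x. Multiplying cores by clock ignores instruction mix, memory effects and how many threads the GPU can keep busy. Treat the compute-bound figures as ceilings rather than predictions.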