How will my code perform on GPUs?

Although a firm answer is difficult without actually doing the GPU implementation, or at least seeing the code, some rules of thumb can give you a rough idea. If your problem is compute-bound, meaning that most of its time on the CPU is spent computing rather than waiting on memory, a back-of-the-envelope calculation shows that you can expect at best about a 20x improvement on the GPU. If the CPU code is highly optimized and uses SSE instructions effectively, reduce the expected improvement to about 5x.
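The compute-bound rule of thumb above can be turned into a quick estimate. The following is only a minimal sketch: the function name and constants are hypothetical, and the 20x and 5x figures are the rough best-case numbers quoted in the text, not measurements.

```python
# Best-case speedups quoted in the text; rough rules of thumb, not measurements.
MAX_COMPUTE_BOUND_SPEEDUP = 20.0  # versus CPU code that is not heavily optimized
MAX_SPEEDUP_VS_SSE_CPU = 5.0      # versus highly optimized CPU code using SSE


def estimate_compute_bound_gpu_time(cpu_seconds, cpu_uses_sse):
    """Return a best-case GPU run time in seconds (hypothetical helper).

    The result is a lower bound on run time: a real port may be slower.
    """
    if cpu_seconds < 0:
        raise ValueError("cpu_seconds must be non-negative")
    speedup = MAX_SPEEDUP_VS_SSE_CPU if cpu_uses_sse else MAX_COMPUTE_BOUND_SPEEDUP
    return cpu_seconds / speedup
```

For example, a compute-bound job that takes 100 seconds on the CPU would take no less than about 5 seconds on the GPU under this estimate, or about 20 seconds if the CPU version already makes good use of SSE.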

If your problem is memory-bound, meaning that most of the time is spent waiting for memory accesses, then the speedup will be at least the ratio of GPU to CPU main memory bandwidth, which is about 4x, and probably much better. We say probably much better because the GPU hides memory latency very effectively by swapping out threads that are waiting on memory and computing on threads whose data is ready. For these problems it's important to provide the GPU with a large pool of threads that it can manage. Call or email us and let our experts give you a free evaluation of your code.
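The memory-bound estimate above is just a ratio of bandwidths, which can be sketched as follows. The function name is hypothetical, and the bandwidth figures in the usage note are placeholders chosen to reproduce the roughly 4x ratio mentioned in the text; substitute the numbers for your own hardware.

```python
def memory_bound_speedup_floor(gpu_bandwidth_gb_s, cpu_bandwidth_gb_s):
    """Return the bandwidth-ratio speedup estimate (hypothetical helper).

    Both arguments are main memory bandwidths in GB/s. The text treats this
    ratio as a floor, since latency hiding on the GPU often does better.
    """
    if gpu_bandwidth_gb_s <= 0 or cpu_bandwidth_gb_s <= 0:
        raise ValueError("bandwidths must be positive")
    return gpu_bandwidth_gb_s / cpu_bandwidth_gb_s
```

With placeholder values of 100 GB/s for the GPU and 25 GB/s for the CPU, the function returns 4.0, matching the ratio quoted above. It says nothing about the extra gain from latency hiding, which depends on how many threads the code can keep in flight.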