Bioinformatics Solutions

Stone Ridge Technology has worked with Bioinformatics algorithms for over three years on CPU, GPU and FPGA platforms. As the data produced by sequencing hardware has experienced a "Moore's Law" like growth for over a decade, the processing step has become increasingly challenging. The most popular approach to sequencing genetic data results in billions of short 25-45 base pair reads which need to be aligned to a reference genome. There are three broad classes of algorithms that accomplish this: Dynamic Programming (e.g. Smith-Waterman), hash-tables (e.g. BLAST, ZOOM) and suffix trees (e.g. Bowtie).  

Bioinformatics algorithms are a particularly good fit for reconfigurable computing on FPGA's because the data type is best represented in a non-standard 2 bit form and the operations are typically comparative in nature (i.e. no floating point required). Many of them can be cast into a streaming format which takes advantage of the FPGA's spatial and temporal parallelism.  Stone Ridge Technology has ported both the Smith Waterman and the BLAST algorithm to FPGA hardware. The details of these efforts can be found in papers in the HPC Toolshed. 

Our most recent work in this area was supported by an NSF grant which allowed us to study efficient FPGA implementations of Bioinformtics algorithms. Our principle finding in this work was that FPGAs are 5x less expensive in capital costs, 20x less expensive in performance per watt and 15x denser in footprint. The work accomplished for NSF was completed using our RDX-11 hardware. 

 

FPGA PERFORMANCE

Compared to CPU performance, Stone Ridge Technology's FPGA-based solutions to bioinformatics problems can be up to:

  • 5X less expensive
  • 20X more power-efficient
  • 15X denser than CPU solutions

 

Smith-Waterman

Smith-Waterman solves the problem of finding the locally optimal match of a small sequence to a larger reference. It can be formulated to allow gaps and extensions and is a classic example of a dynamic programming. In many ways Smith Waterman is a perfect match for reconfigurable hardware and many groups have implemented it in hardware with excellent results. SRT has implemented Smith Waterman in an Altera in-socket FPGA module and in a Xilinx chip on our own RDX-11 hardware. Our implementation on the RDX-11 with an LX50 chip achieved 42 GCups (giga cell updates per second).

BLAST

The Basic Local Alignment Search Tool (BLAST) is a popular hash table-based algorithm which is significantly faster than Smith-Waterman while sacrificing some accuracy. It uses hash tables to identify likely areas of the database for optimal local alignment where it focuses more expensive dynamic programming algorithms. A classic hash table approach, BLAST has been largely superseded by newer approaches that use non-contiguous and sliding seeds to improve hit probability and sensitivity. SRT has implemented BLAST in an Altera in-socket FPGA module. Project results were compiled into a white paper that is available in the HPC Toolshed.