Management, search, query processing, and analysis of Big Data have become critical across the many stages of exploration, discovery, and refinement of innovations in computational science and engineering. Problems that can be represented as networks are particularly important, since graph algorithms are a core part of many workloads in broad areas including social intelligence, bioinformatics, information networks, and machine learning. While demand for computational resources keeps growing with the need to query and analyze Big Data at ever larger volumes and rates, the semiconductor industry has reached its physical scaling limits. These limits lead to the dark silicon challenge, in which a growing fraction of a chip cannot be powered at the same time; as a result, hardware acceleration through specialization has received renewed interest. Cray and IBM have begun incorporating specialized hardware accelerators, such as Field Programmable Gate Arrays (FPGAs), into their next-generation heterogeneous systems. Microsoft Catapult integrates reconfigurable FPGA fabric into data centers and improves performance by offloading part of the workload to the collocated FPGAs. The Quadro Plex cluster consists of 16 nodes, each with dual-core CPUs, four GPUs, and an FPGA. The Axel system is a collection of heterogeneous nodes, each composed of multi-core CPUs, GPUs, and FPGAs, and uses a Map-Reduce framework.

Heterogeneous architectures are particularly appealing for high-performance, low-cost Big Data management, processing, and analysis for two reasons. First, processing units optimized for fast sequential processing (i.e., CPUs) coexist with units optimized for massive parallelism (i.e., accelerators), which makes it possible to handle the heterogeneous structure of current and emerging workloads. Such workloads require varying amounts of parallelism, either across the execution phases of an algorithm or across the workload itself (e.g., scale-free graphs, whose skewed degree distributions mix a few very high-degree vertices with many low-degree ones). Second, access to massively parallel hardware accelerators allows compute-intensive kernels to be offloaded to hardware for significant performance improvements.

At the same time, the heterogeneity of processing elements introduces several challenges: accelerators implement a different parallel processing model than CPUs and have much less memory. In existing architectures, accelerators often starve for data because data must travel through the CPU, which adds latency. However, emerging high-bandwidth, low-latency interconnect technologies provide coherent shared memory access between general-purpose processors and accelerators. This matters most for data with a complex hierarchical structure, which relies on many pointers. With separate memories, such pointers cannot simply be shared, and translating or copying the data they reference adds latency overhead; fine-grained coherent access to shared memory makes sharing this data between CPUs and accelerators nearly trivial. Such technological advances provide abundant opportunities to develop highly innovative techniques for extreme acceleration of core Big Data problems.
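To make the pointer-sharing overhead more concrete, the following Python sketch (an illustration only, not this project's method) flattens a pointer-linked tree into index-based, CSR-style arrays, the kind of translation a host typically has to perform before copying such data into a separate accelerator memory. The `Node` class and the `flatten` and `subtree_sum` functions are hypothetical names introduced here, and the sketch assumes a tree (every node has exactly one parent). With coherent shared memory, an accelerator could instead follow the original references directly, skipping this step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a pointer-linked tree (hypothetical example structure)."""
    value: int
    children: list[Node] = field(default_factory=list)


def flatten(root: Node) -> tuple[list[int], list[int], list[int]]:
    """Replace pointers with integer indices so the data can be copied to a separate memory.

    Assumes a tree: each node is reachable from the root exactly once.
    """
    # Assign each node an index in breadth-first order; these indices stand in
    # for CPU-side pointers, which would be meaningless in another address space.
    order = [root]
    index = {id(root): 0}
    i = 0
    while i < len(order):
        for child in order[i].children:
            index[id(child)] = len(order)
            order.append(child)
        i += 1
    # CSR-style layout: the children of node k are targets[offsets[k]:offsets[k + 1]].
    values = [node.value for node in order]
    offsets = [0]
    targets: list[int] = []
    for node in order:
        targets.extend(index[id(child)] for child in node.children)
        offsets.append(len(targets))
    return values, offsets, targets


def subtree_sum(values: list[int], offsets: list[int], targets: list[int], k: int = 0) -> int:
    """Traverse the flattened tree using indices only, as device-side code would."""
    return values[k] + sum(
        subtree_sum(values, offsets, targets, j)
        for j in targets[offsets[k]:offsets[k + 1]]
    )


tree = Node(1, [Node(2, [Node(4)]), Node(3)])
values, offsets, targets = flatten(tree)
print(values, offsets, targets)  # [1, 2, 3, 4] [0, 2, 3, 3, 3] [1, 2, 3]
print(subtree_sum(values, offsets, targets))  # 10
```

The flattening pass and the subsequent transfer of the three arrays are extra work that exists only because the memories are separate; a deeper or more irregular structure makes this cost grow accordingly.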

This project leverages emerging heterogeneous platforms to accelerate the querying and analysis of large structured and graph datasets for Big Data applications. Matching the characteristics and compute requirements of modern Big Data workloads and computation kernels to the underlying hardware is paramount to maximizing the utilization of heterogeneous compute platforms. This makes it possible to execute data-intensive algorithms as a composition of tightly coupled kernels, each designed to exploit the unique features of a compute component and its associated memory access capabilities.

  • Publications
  1. Ajitesh Srivastava, Ren Chen, Charalampos Chelmis, and Viktor K. Prasanna, A Hybrid Design for High Performance Large-scale Sorting on FPGA, IEEE International Conference on ReConFigurable Computing and FPGAs (ReConFig '15), December 2015.
  2. Shijie Zhou, Charalampos Chelmis, and Viktor K. Prasanna, Optimizing Memory Performance for FPGA Implementation of PageRank, IEEE International Conference on ReConFigurable Computing and FPGAs (ReConFig '15), December 2015.