June 17-21, 2012

Hamburg, Germany

Contribution Details

Name: Architectures
(1) BLAS Performance with Low Power Embedded Multi-Core Digital Signal Processor from Texas Instruments
Time: Monday, June 18, 2012
3:00 PM - 8:30 PM
Room:   Hall H, #911
CCH - Congress Center Hamburg
Speakers:   Murtaza Ali, Texas Instruments
Abstract:   Several heterogeneous systems have recently attracted the attention of the high performance computing (HPC) community because they promise lower power consumption than traditional CPU-based approaches. In this poster, we present results for some dense linear algebra operations on a multi-core digital signal processor (DSP) known by its part number, TMS320C6678. DSPs are widely used in embedded systems, including wireless base stations, industrial and medical imaging, and test and measurement systems, and are designed to maintain low power. As the compute needs of these embedded systems have increased, DSPs have responded by providing multi-core devices, just like their CPU counterparts. In addition, DSPs have added floating point capabilities without an additional power penalty. The result is a low power embedded processor capable of handling HPC needs.

The TMS320C6678 is an 8-core device with each core running at 1 GHz. The device dissipates only 10 W of power and provides a peak performance of 128 single precision GFLOPS and 48 double precision GFLOPS. Peak performance, however, does not guarantee good performance on HPC applications. We have implemented and analyzed the performance of various Level 3 BLAS functions on this multi-core device. We currently achieve 8 GFLOPS per watt for single precision general matrix multiplication (sgemm) and 2.2 GFLOPS per watt for its double precision equivalent (dgemm). The performance of the remaining Level 3 BLAS functions is within 10% of their gemm counterparts. To our knowledge, this is the best performance per watt for such operations among currently available compute acceleration platforms, making multi-core DSPs an attractive alternative for power-efficient heterogeneous systems.
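
The figures quoted in the abstract can be related to one another with some simple arithmetic. The Python sketch below is illustrative only and is not part of the poster. It takes the core count, clock rate, power, peak GFLOPS, and GFLOPS-per-watt values from the abstract, and it assumes (the abstract does not say so) that the 10 W figure also applies while the gemm kernels are running. The function names and the conventional 2*m*n*k flop count for gemm are choices made for this example.

```python
# Figures quoted in the abstract for the TMS320C6678.
CORES = 8
CLOCK_GHZ = 1.0
POWER_W = 10.0
PEAK_GFLOPS = {"single": 128.0, "double": 48.0}
GEMM_GFLOPS_PER_W = {"single": 8.0, "double": 2.2}


def gemm_flops(m, n, k):
    """Conventional flop count for C = A * B, with A of size m x k and B of size k x n."""
    return 2 * m * n * k


def gflops_rate(m, n, k, seconds):
    """Convert a measured gemm run time into a GFLOPS rate."""
    return gemm_flops(m, n, k) / seconds / 1e9


def summarize(precision):
    """Derive per-core and per-watt figures from the quoted numbers."""
    peak = PEAK_GFLOPS[precision]
    # Assumption: the 10 W figure also holds during the gemm measurement.
    implied_gemm = GEMM_GFLOPS_PER_W[precision] * POWER_W
    return {
        "flops_per_cycle_per_core": peak / (CORES * CLOCK_GHZ),
        "peak_gflops_per_watt": peak / POWER_W,
        "implied_gemm_gflops": implied_gemm,
        "fraction_of_peak": implied_gemm / peak,
    }


if __name__ == "__main__":
    for precision in ("single", "double"):
        print(precision, summarize(precision))
```

Under that power assumption, the quoted efficiencies would correspond to roughly 80 GFLOPS for sgemm and 22 GFLOPS for dgemm, or about 62% and 46% of the respective peaks. If the measured power differs from 10 W, these derived values change accordingly.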
