

Agile, eXtensible, fast I/O Module for the cyber-physical era

SmartCPS Concertation Event Brussels, 30<sup>th</sup> January 2017

#### **Brief introduction to the AXIOM project**

Paolo Gai, Evidence Srl, Italy pj@evidence.eu.com

















#### The AXIOM Project in one slide

- We are designing a small embedded board that bridges High Performance Computing (HPC) ... and Cyber Physical Systems (CPS)
- We connect a set of boards together using the high-speed transceivers of the Xilinx Zynq Ultrascale+ ... RDMA for fast transfers!
- We develop a common programming paradigm
  - OmpSs@Cluster → OpenMP on the cluster, on top of GASNet
  - OmpSs@FPGA → transparent FPGA acceleration
- We use it for
  - Video and audio processing
  - Smart surveillance, speech recognition











## **AXIOM Board:** characteristics

- Small form factor (160 mm x 109 mm)
- Xilinx Zynq Ultrascale+ ZU9EG
- SO-DIMM DDR4 socket for the PS RAM
- 1Gb DDR4 for the PL RAM
- 8 to 32 GB of eMMC
- Boot from QSPI, eMMC, uSD card, JTAG
- Standard connections (USB, Ethernet, Video output)
- Camera input
- Trace port for software tracing
- Power management measurement possible

### **AXIOM Board: AXIOM Link**

USB Type C connector

Used to get a high-speed connection between boards

Standard connector with special care for signal integrity



### Innovation Radar Finalists

Award categories: Industrial & **Enabling Tech**, Excellent Science, ICT for Society, Horizon 2020 ICT innovator

Finalist logos: Cybertronica, Centre for Research & Technology Hellas, Brainstorm, Institute for Artificial, Marlo AS, IHP, Net7 SRL, mHealth Technologies, Realeyes, **SECO**, Avanzati

#InnovationRadar



# AXIOM @ Maker Faire 2016

Thousands of people visited our booth

Demo Herta

**Demo Cluster OmpSs**

https://www.periscope.tv/w/1vOxwewoQmoGB

https://www.facebook.com/pg/theaxiomproject/videos/?ref=page_internal

#### Easy programmability via OmpSs

Only 3 lines of code to:

- accelerate code on FPGAs
- distribute code across several AXIOM boards

Lines of code needed per version:

| Application | Seq - DMA version | pthread version | OmpSs version |
|-------------|-------------------|-----------------|---------------|
| Cholesky    | 71                | 26              | 3             |
| Covariance  | 94                | 29              | 3             |
| 64x64       | 95                | 39              | 3             |
| 32x32       | 95                | 39              | 3             |

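```
/* Run as a task on the FPGA or an SMP core; copy dependences to the device. */
#pragma omp target device(fpga, smp) copy_deps
#pragma omp task in(a[0:64*64-1], b[0:64*64-1]) out(out[0:64*64-1])
void matrix_multiply(float a[64][64],
                     float b[64][64],
                     float out[64][64])
{
    for (int ia = 0; ia < 64; ++ia)
        for (int ib = 0; ib < 64; ++ib) {
            float sum = 0;
            for (int id = 0; id < 64; ++id)
                sum += a[ia][id] * b[id][ib];
            out[ia][ib] = sum;
        }
}

/* Input and result matrices (initialization not shown on the slide). */
static float A[64][64], B[64][64], C1[64][64], C2[64][64], D[64][64];

int main(void) {
    /* Each call spawns a task; the third depends on C1 from the first. */
    matrix_multiply(A, B, C1);
    matrix_multiply(A, B, C2);
    matrix_multiply(C1, B, D);
    #pragma omp taskwait  /* wait for all tasks to complete */
    return 0;
}
```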
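A fixed-size 64x64 kernel like the one above is usually applied to larger matrices by tiling them into blocks. The following is a minimal plain-C sketch of that idea, not code from the AXIOM project: the names `BS`, `block_multiply_acc` and `blocked_multiply` are hypothetical, the kernel accumulates into its output (unlike the slide's kernel, which overwrites it), and the OmpSs pragmas are only indicated in a comment so the block compiles with any C compiler.

```c
#include <assert.h>
#include <stddef.h>

#define BS 64  /* block size, chosen to match the 64x64 kernel on the slide */

/* Accumulating block kernel: c += a * b for one BS x BS block.
   a, b and c point to the top-left element of a block inside row-major
   matrices whose rows are ld elements long.
   Under OmpSs, this is the function that would carry the
   "#pragma omp target device(fpga, smp)" and "#pragma omp task" lines
   (assumption: the real project may organize its data differently). */
static void block_multiply_acc(const float *a, const float *b,
                               float *c, size_t ld)
{
    for (size_t i = 0; i < BS; ++i)
        for (size_t j = 0; j < BS; ++j) {
            float sum = c[i * ld + j];
            for (size_t k = 0; k < BS; ++k)
                sum += a[i * ld + k] * b[k * ld + j];
            c[i * ld + j] = sum;
        }
}

/* C += A * B for n x n row-major matrices, where n is a multiple of BS.
   Each call to the block kernel is one unit of work that a task-based
   runtime could schedule on an accelerator or a CPU core. */
static void blocked_multiply(const float *A, const float *B,
                             float *C, size_t n)
{
    assert(n % BS == 0);
    for (size_t bi = 0; bi < n; bi += BS)
        for (size_t bj = 0; bj < n; bj += BS)
            for (size_t bk = 0; bk < n; bk += BS)
                block_multiply_acc(&A[bi * n + bk], &B[bk * n + bj],
                                   &C[bi * n + bj], n);
}
```

The result matrix must be zero-initialized before calling `blocked_multiply`, since every block call adds its partial product to `C`.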



- 1.7 Gflops using 64x64 blocks
- 4.0 Gflops using 128x128 blocks
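To put these rates in perspective, the sketch below converts them into time per block multiply. It assumes the usual count of 2n³ floating-point operations for a dense n x n multiply and that the quoted rate is sustained for one block; the slides do not state how the figures were measured. The function names `matmul_flops` and `block_seconds` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Floating-point operations in a dense n x n matrix multiply:
   n*n output elements, each needing n multiplies and n additions. */
static uint64_t matmul_flops(uint64_t n)
{
    return 2u * n * n * n;
}

/* Seconds needed for one n x n block multiply at a sustained rate
   given in Gflops (1 Gflops = 1e9 operations per second). */
static double block_seconds(uint64_t n, double gflops)
{
    assert(gflops > 0.0);
    return (double)matmul_flops(n) / (gflops * 1e9);
}
```

Under these assumptions, a 64x64 block (524,288 operations) at 1.7 Gflops takes roughly 0.31 ms, and a 128x128 block (4,194,304 operations) at 4.0 Gflops takes roughly 1.05 ms.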



#### First complete software stack now available!

- QEMU Zynq Ultrascale+ Emulation
- AXIOM-Link software specs available
- Device drivers
- Memory allocator
- Utility apps
- GASNet Spawner
- OmpSs@Cluster





