GAP8 Software Development Kit
GAP8 Manual

Introduction to the GAP8 IoT application processor

GAP8 is an IoT application processor based on the open-source RISC-V and PULP (Parallel Ultra-Low-Power Processing Platform) projects. It enables cost-effective development, deployment and autonomous operation of intelligent devices that capture, analyze, classify and act on the fusion of rich data sources such as images, sounds or vibrations. In particular, GAP8 is uniquely optimized to execute a wide range of image and audio algorithms, including convolutional neural network inference, with extreme energy efficiency. This allows industrial and consumer product manufacturers to integrate artificial intelligence and advanced classification into new classes of wireless edge devices for IoT applications, including image recognition, people and object counting, machine health monitoring, home security, speech recognition, consumer robotics and smart toys.

GAP8 Micro-architecture

GAP8's hierarchical, demand-driven architecture enables ultra-low-power operation by combining:

  • A series of highly autonomous smart I/O peripherals for connection to cameras, microphones and other capture and control devices.
  • A fabric controller core for control, communications and security functions.
  • A cluster of 8 cores with an architecture optimized for the execution of vectorized and parallelized algorithms, combined with a specialized Convolutional Neural Network accelerator (HWCE).

All cores and peripherals can be power-switched, and their voltage and frequency can be adjusted on demand. DC/DC regulators and clock generators with ultra-fast reconfiguration times are integrated, which allows GAP8 to adapt very quickly to the processing and energy requirements of a running application. All elements share access to an L2 memory area. The cluster cores and the HWCE share access to an L1 memory area and an instruction cache. Multiple DMA units allow autonomous, fast, low-power transfers between memory areas. A memory protection unit is included to allow secure execution of applications on GAP8.

All 9 cores share the same extended RISC-V instruction set architecture. The I (integer), C (compressed instructions) and M (multiplication and division) extensions and a portion of the supervisor ISA subset are supported. These are extended with specific instructions that optimize the algorithms GAP8 targets: zero-overhead hardware loops, memory accesses with pointer pre/post-modification, instructions that combine control flow with computation (min, max, etc.), multiply/subtract-and-accumulate, vector operations, fixed-point operations, bit manipulation and dot product. The compiler can use all of these extensions automatically, and they can also be used by hand.
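As an illustration of the kind of loop these extensions target, the sketch below is a plain, portable C dot product of two 8-bit vectors with a 32-bit accumulator. On GAP8 such a loop is a candidate for a zero-overhead hardware loop, post-incremented loads and multiply-accumulate or SIMD dot-product instructions, but whether the compiler actually emits them depends on the toolchain version and optimization flags; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product of two signed 8-bit vectors, accumulated on 32 bits.
 * Each iteration is one load from each vector plus one multiply-accumulate,
 * the pattern the GAP8 ISA extensions are designed to speed up. */
int32_t dot_product_i8(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];  /* multiply-accumulate */
    }
    return acc;
}
```

For example, the vectors {1, 2, 3, 4} and {5, 6, 7, 8} give 70.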

GAP8 Features

  • 1+8 high performance cores based on an extended RISC-V ISA:
    • 1 high performance micro-controller core (the Fabric Controller, FC)
    • 8 cores that execute in parallel for compute-intensive tasks (the cluster)
  • A hardware convolution engine (HWCE) for applications based on convolutional neural networks.
  • A level 2 memory (512 KB) shared by all the cores
  • A level 1 memory (64 KB) shared by all the cores in the cluster
  • A level 1 memory (8 KB) owned by the FC
  • A smart, lightweight and completely autonomous DMA (micro-DMA) capable of handling complex I/O schemes.
  • A multi-channel 1D/2D cluster-DMA that controls the transactions between the L2 memory and the L1 memory.
  • A rich set of peripheral interfaces
  • 2 programmable clocks
  • Memory Protection Unit
Figure: GAP8 Layout

As shown in the GAP8 block diagram above, GAP8 has a rich set of I/O interfaces, which include:

Interface Count Description
LVDS 1 A 128 Mb/s interface for RF
ORCA 1 A low-data-rate interface for RF
I2C 2 Standard I2C interfaces
I2S 2 Standard I2S interfaces for connecting digital audio devices
CPI 1 A parallel interface for connecting a camera
HyperBus 1 A high-speed memory bus interface
SPI-M 2 A quad SPI master and an additional SPI master
SPI-S 1 An SPI slave
UART 1 A standard UART interface
GPIOs 32 General Purpose Input/Output pins

Other peripherals:

Peripheral Count Description
RTC 1 32 kHz real-time clock
PWM 4 PWMs, 12 output channels

The Fabric Controller and the Cluster

As mentioned in the previous section, GAP8 has 8+1 high performance cores, which play two different roles. The cluster contains 8 cores that can execute in parallel and provide high-performance computation for image processing, audio processing, signal modulation, etc. The single remaining core, referred to as the “Fabric Controller” or "FC", is used as a micro-controller. It is in charge of controlling all the operations of GAP8, such as programming the micro-DMA to capture an image from the CPI interface, starting up the cluster and dispatching jobs to it. You can think of the cluster as a 'peripheral' of the FC.

Each core is identified by two numbers: the cluster ID, which identifies the group of cores that the core belongs to, and the core ID, which identifies the core within that group. The IDs can be used to start a particular task on a core.

ENTITY CLUSTER ID CORE ID
CORE0 0x00 0x00
CORE1 0x00 0x01
CORE2 0x00 0x02
CORE3 0x00 0x03
CORE4 0x00 0x04
CORE5 0x00 0x05
CORE6 0x00 0x06
CORE7 0x00 0x07
FC 0x20 0x00

The table above shows:

  • Cores 0-7 share the same cluster ID (0x00), but each has a different core ID.
  • The FC has the same core ID as cluster core 0, but its cluster ID is 0x20.
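The ID scheme in the table can be captured in a few lines of C. This is a minimal sketch: the macro and function names are hypothetical, and the IDs are plain parameters so that it compiles on a host. On the target, PMSIS exposes the running core's IDs through functions such as pi_cluster_id() and pi_core_id().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Values from the table above; the names are illustrative only. */
#define CLUSTER0_ID      0x00u  /* the 8-core cluster            */
#define FC_CLUSTER_ID    0x20u  /* the Fabric Controller (= 32)  */
#define CLUSTER_NB_CORES 8u

/* True when the (cluster ID, core ID) pair designates the Fabric Controller. */
static bool is_fabric_controller(uint32_t cluster_id, uint32_t core_id)
{
    return cluster_id == FC_CLUSTER_ID && core_id == 0u;
}

/* True when the pair designates one of the 8 cluster cores. */
static bool is_cluster_core(uint32_t cluster_id, uint32_t core_id)
{
    return cluster_id == CLUSTER0_ID && core_id < CLUSTER_NB_CORES;
}

/* Cluster core 0 acts as the "master" that talks to the FC. */
static bool is_cluster_master(uint32_t cluster_id, uint32_t core_id)
{
    return is_cluster_core(cluster_id, core_id) && core_id == 0u;
}

/* Writes a readable role name such as "FC" or "cluster core 3". */
static void describe_core(uint32_t cluster_id, uint32_t core_id,
                          char *buf, size_t len)
{
    if (is_fabric_controller(cluster_id, core_id))
        snprintf(buf, len, "FC");
    else if (is_cluster_core(cluster_id, core_id))
        snprintf(buf, len, "cluster core %u", (unsigned)core_id);
    else
        snprintf(buf, len, "unknown");
}
```

With these definitions, the "[32 0]" prefix printed by the helloworld example decodes as the FC (32 = 0x20, core 0), and "[0 5]" as cluster core 5.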

By default, the cluster is powered down and cannot be used; it must first be powered up by the FC. Once the cluster is awake, its core 0 plays the role of "master". Core 0 is in charge of the following jobs:

  • The communication with the Fabric Controller. For example, getting a task from the FC, sending a data request to the FC, getting synchronized with the FC, etc.
  • Dispatching tasks/applications to other cores.

Until they receive a task from core 0, the other cores wait at a dispatch barrier that clock-gates them (i.e. they are stopped and draw only leakage current).
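Once core 0 has dispatched the same function to all cluster cores (in PMSIS this is typically done with a fork call such as pi_cl_team_fork()), each core usually derives its own share of the data from its core ID. The sketch below shows one common way to split the work; the helper core_work_range() and its types are hypothetical and not part of the SDK.

```c
#include <stddef.h>

#define NB_CLUSTER_CORES 8u  /* number of cluster cores on GAP8 */

/* Half-open range [first, last) of items assigned to one core. */
typedef struct {
    size_t first;
    size_t last;
} work_range_t;

/* Splits n_items as evenly as possible across nb_cores: the first
 * (n_items % nb_cores) cores take one extra item each. Every core runs the
 * same code and needs only its own core ID to find its slice. */
static work_range_t core_work_range(size_t n_items, unsigned core_id,
                                    unsigned nb_cores)
{
    size_t base  = n_items / nb_cores;
    size_t extra = n_items % nb_cores;
    size_t first = core_id * base + (core_id < extra ? core_id : extra);
    size_t count = base + (core_id < extra ? 1u : 0u);
    work_range_t r = { first, first + count };
    return r;
}
```

For 10 items, cores 0 and 1 get two items each and cores 2 to 7 one item each; cores whose range is empty simply do nothing.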

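When a task/application is finished on the cluster, the cluster should be shut down to save power. IMPORTANT: move any data you still need from the shared L1 memory to the L2 memory before you shut down the cluster, because the contents of L1 are not retained when the cluster is powered off.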

Memory areas

There are 2 different levels of memory internal to GAP8: a larger level 2 area of 512KB, which is accessible by all processors and DMA units, and two smaller level 1 areas, one for the FC (16KB) and one shared by all the cluster cores (64KB). The shared level 1 memory is banked, and cluster cores can usually access it in a single cycle. GAP8 can also access external memory over the HyperBus (Flash or RAM) or quad-SPI (Flash) peripherals. We refer to RAM accessed over the HyperBus interface as level 3 memory. Because accessing external RAM over the HyperBus costs much more energy and time than accessing internal memory, it should be avoided as much as possible. Code is generally located in the L2 memory area. The instruction caches of the FC (4KB) and of the cluster (16KB) automatically cache instructions as needed. The cluster instruction cache is shared between all the cores in the cluster: the cluster cores generally execute the same code on different data, so sharing the cache reduces the memory accesses needed to load instructions.
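Because data processed by the cluster must fit in the 64KB shared L1, large inputs are processed in tiles moved in from L2. The arithmetic below is a minimal sketch of that sizing, assuming row-based tiles and a fixed number of buffers (for example two input and two output buffers for double buffering); the helper names are hypothetical. This is the kind of bookkeeping the GAP8 auto-tiler automates.

```c
#include <stddef.h>

#define L1_CLUSTER_BYTES (64u * 1024u)  /* shared cluster L1 size from the text */

/* Largest number of image rows per tile such that `nb_buffers` copies of
 * the tile fit in `budget` bytes. Returns 0 if even one row is too large. */
static size_t max_rows_per_tile(size_t width, size_t bytes_per_pixel,
                                size_t nb_buffers, size_t budget)
{
    size_t row_bytes = width * bytes_per_pixel * nb_buffers;
    if (row_bytes == 0)
        return 0;
    return budget / row_bytes;
}

/* Number of tiles needed to cover `height` rows, rounded up. */
static size_t tiles_needed(size_t height, size_t rows_per_tile)
{
    if (rows_per_tile == 0)
        return 0;  /* the image cannot be tiled within this budget */
    return (height + rows_per_tile - 1) / rows_per_tile;
}
```

For a 320-pixel-wide 8-bit image with four buffers, each row costs 1280 bytes, so at most 51 rows fit in the full 64KB, and a 240-row image needs 5 tiles. In practice part of L1 is needed for other data, so a smaller budget would be passed.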

Micro-DMA and cluster-DMA

To reduce power consumption, GAP8 does not include data caches in its memory hierarchy. Instead, GAP8 uses autonomous DMA units to transfer data to and from peripherals and between internal memory areas.

Good management of memory is absolutely crucial to extracting the most energy efficiency from GAP8. GreenWaves supplies a tool, the GAP8 auto-tiler, which can significantly aid in managing memory transfers between the different memory areas.

The micro-DMA unit is used to transfer data to and from peripherals, including level 3 memory. At the end of a transaction the FC can be woken up to queue a new task. To let the micro-DMA keep working when a transaction ends, up to 2 transfers can be queued for each peripheral. The micro-DMA schedules active transfers in a round-robin fashion, based on signals from the peripherals. The micro-DMA is generally not used directly by the programmer; it is used by the driver of each peripheral.
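To make the queueing behaviour concrete, here is a small host-side model in C. It is an illustration only, not the driver interface: the names, the number of peripherals and the simplification that every peripheral with a pending transfer is ready to be served are assumptions of this sketch. It keeps the two properties stated above: at most 2 queued transfers per peripheral, and round-robin service across peripherals.

```c
#include <stdbool.h>
#include <stddef.h>

#define UDMA_NB_PERIPHS  3u  /* hypothetical number of peripheral channels */
#define UDMA_QUEUE_DEPTH 2u  /* up to 2 queued transfers per peripheral    */

typedef struct {
    int ids[UDMA_QUEUE_DEPTH];  /* pending transfer IDs, oldest first */
    size_t count;               /* number of pending transfers        */
} udma_queue_t;

typedef struct {
    udma_queue_t queue[UDMA_NB_PERIPHS];
    size_t next;                /* where the round-robin scan starts */
} udma_model_t;

/* Queues a transfer; fails when the peripheral already has 2 pending. */
static bool udma_enqueue(udma_model_t *m, size_t periph, int transfer_id)
{
    udma_queue_t *q = &m->queue[periph];
    if (q->count == UDMA_QUEUE_DEPTH)
        return false;  /* both slots busy: the caller must wait */
    q->ids[q->count++] = transfer_id;
    return true;
}

/* Serves the oldest transfer of the next non-empty peripheral, scanning
 * round-robin from the one after the last served. Returns -1 if idle. */
static int udma_service_next(udma_model_t *m)
{
    for (size_t i = 0; i < UDMA_NB_PERIPHS; i++) {
        size_t p = (m->next + i) % UDMA_NB_PERIPHS;
        udma_queue_t *q = &m->queue[p];
        if (q->count > 0) {
            int id = q->ids[0];
            for (size_t j = 1; j < q->count; j++)
                q->ids[j - 1] = q->ids[j];  /* next queued transfer moves up */
            q->count--;
            m->next = (p + 1) % UDMA_NB_PERIPHS;
            return id;
        }
    }
    return -1;  /* no transfer pending */
}
```

With transfers 10 and 11 queued on peripheral 0 and transfer 20 on peripheral 2, a third request on peripheral 0 is refused, and the service order is 10, 20, 11.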

Like the micro-DMA, the cluster-DMA is a smart, lightweight and completely autonomous unit. It is used to transfer data between the L2 and L1 memory areas. It supports both 1D and 2D transfers and can queue up to 16 requests. The commands for the cluster-DMA unit are extremely short, which minimizes software overhead and avoids instruction cache pollution.
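A 2D transfer copies several fixed-size chunks whose source addresses are separated by a stride, typically a rectangular window of an image stored in L2. The function below is a software model of that addressing, written only to show what a single 2D command does; the cluster-DMA performs it in hardware while the cores keep computing. The function and parameter names are assumptions and do not correspond to the SDK API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies `rows` chunks of `row_bytes` bytes. The source advances by
 * `src_stride` bytes between chunks (e.g. the image width in L2), and the
 * chunks are packed contiguously in the destination (e.g. an L1 buffer). */
static void dma2d_model_copy(uint8_t *dst, const uint8_t *src,
                             size_t rows, size_t row_bytes, size_t src_stride)
{
    for (size_t r = 0; r < rows; r++) {
        memcpy(dst + r * row_bytes, src + r * src_stride, row_bytes);
    }
}
```

For example, copying a 4x3 window that starts at row 1, column 2 of an 8-byte-wide image uses rows = 3, row_bytes = 4 and src_stride = 8; a 1D transfer is the special case rows = 1.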

Introduction to the GAP SDK

The GAP8 SDK allows you to compile and execute applications on the GAP8 IoT Application Processor. This SDK is an extract of the necessary elements from the pulp-sdk (https://github.com/pulp-platform/pulp-sdk) produced by the PULP project, to provide a development environment for the GAP8 series processors.

We provide a set of tools and two operating systems for GAP8, together with a common system layer (PMSIS):

  • Tools
    • GAP8 RISCV GNU toolchain: a pre-compiled toolchain inherited from the RISC-V project, with support for our extensions to the RISC-V Instruction Set Architecture.
      • Program / control GAP8
      • Debug your application using GDB
      • Program the GAPuino flash memory with applications
    • NNTOOL: a set of Python-based tools that help port NN graphs from various NN training packages to GAP8
    • Autotiler: a code generator for GAP8, which generates user algorithms (CNN, MatrixAdd, MatrixMult, FFT, MFCC, etc.) with optimized memory management.
    • gapy: a set of Python-based tools for building the flash image, creating partitions and file systems, running OpenOCD, etc.
  • Operating Systems
    • PULP OS - The open source embedded RTOS produced by the PULP project
    • FreeRTOS - FreeRTOS is an open source real time operating system. GreenWaves Technologies has ported it to GAP8.
    • PMSIS - PMSIS is an open-source system layer which any operating system can implement to provide a common API to applications. We currently provide it for PULP OS and FreeRTOS, and our applications use it so that they are portable between them.


Installing the GAP SDK

Ubuntu 18.04

These instructions were developed using a fresh Ubuntu 18.04 Bionic Beaver 64-Bit virtual machine from https://www.osboxes.org/ubuntu/#ubuntu-1804-info

The following packages need to be installed:

sudo apt-get install -y build-essential git libftdi-dev libftdi1 doxygen python3-pip libsdl2-dev curl cmake libusb-1.0-0-dev scons gtkwave libsndfile1-dev rsync autoconf automake texinfo libtool pkg-config libsdl2-ttf-dev

Clone, build and install the GAP8 version of OpenOCD:

git clone https://github.com/GreenWaves-Technologies/gap8_openocd.git
cd gap8_openocd
./bootstrap
./configure --program-prefix=gap8- --prefix=/usr --datarootdir=/usr/share/gap8-openocd
make -j
sudo make -j install
#Finally, copy openocd udev rules and reload udev rules
sudo cp /usr/share/gap8-openocd/openocd/contrib/60-openocd.rules /etc/udev/rules.d
sudo udevadm control --reload-rules && sudo udevadm trigger

Now, add your user to the dialout group.

sudo usermod -a -G dialout <username>
# This will require a logout / login to take effect

Finally, log out of your session and log back in.

If you are using a Virtual Machine make sure that you give control of the FTDI device to your virtual machine. Plug the GAPuino into your USB port and then allow the virtual machine to access it. For example, for VirtualBox go to Devices->USB and select the device.

Please also make sure that your Virtual Machine USB emulation matches your PC USB version. A mismatch causes the USB interface to be very slow.

The following instructions assume that you install the GAP SDK into your home directory. If you want to put it somewhere else then please modify them accordingly.

Ubuntu 16.04

You can follow the steps for Ubuntu 18.04 except for the following instructions.

After you have installed the system packages with apt-get, you need to also create this symbolic link:

sudo ln -s /usr/bin/libftdi-config /usr/bin/libftdi1-config

Also, you may need to install git lfs

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install

Download and install the toolchain:

Now clone the GAP8/RISC-V toolchain:

git clone https://github.com/GreenWaves-Technologies/gap_riscv_toolchain_ubuntu_18.git

In case you use an old git version, you may need to use these commands instead:

git lfs clone https://github.com/GreenWaves-Technologies/gap_riscv_toolchain_ubuntu_18.git

Install the toolchain (you may need to run the script with sudo):

cd ~/gap_riscv_toolchain_ubuntu_18
./install.sh

Finally, clone the SDK (adapt the gap_sdk path to your needs):

git clone https://github.com/GreenWaves-Technologies/gap_sdk.git
cd ~/gap_sdk

Configure the SDK:

You can either source sourceme.sh in the root sdk folder and then select the right board from the list, or directly source the board config.

source sourceme.sh

or

# replace gapuino_v2.sh with the config file of your board
source configs/gapuino_v2.sh

If you directly source a board config, make sure it is the config file for the board you have. The SDK supports 2 boards (gapuino and gapoc), and each of them can use version 1 or version 2 of the GAP8 chip. Boards bought before 10/2019 contain GAP8 version 1 and use a USB B plug for JTAG, while those bought after contain version 2 and use a USB micro B plug for JTAG.

The following table summarizes the available boards and their configuration files.

Board Chip Config file
Gapuino GAP8 v1 configs/gapuino.sh
Gapuino GAP8 v2 configs/gapuino_v2.sh
Gapoc GAP8 v1 configs/gapoc_a.sh
Gapoc GAP8 v2 configs/gapoc_a_v2.sh

Once the proper config file is sourced, you can proceed with the SDK build.

Note that after the SDK has been built, you can source another board config file to change the board configuration, in case you want to use a different board. In this case the SDK will have to be built again. As soon as the SDK has been built once for a board configuration, it does not need to be built again for this configuration, unless the SDK is cleaned.

Minimal install (FreeRTOS only, no neural network tools)

We will first make a minimal install to check whether the previous steps were successful. If you are only doing board bring-up or peripheral testing, this install will also be sufficient.

Python requirements

Our modules (the gapy runner) require a few additional Python packages, which you can install with this command from the GAP SDK root folder:

pip3 install -r requirements.txt

SDK install

Initialize and download all the sub-projects required to run pmsis_examples on a board (freertos, pmsis_api, gapy and bsp):

First, use the following command to configure the shell environment correctly for the GAP SDK. It must be done in each terminal session:

cd ~/gap_sdk
# Choose which board
source sourceme.sh

Tip: You can add an "alias" command as follows in your .bashrc file:

alias GAP_SDK='cd ~/gap_sdk && source sourceme.sh'

Typing GAP_SDK will now change to the gap_sdk directory and execute the source command.

Then, compile the minimal set of dependencies to run examples:

make minimal_sdk

Helloworld

Finally, try a test project. First connect your GAPuino to your PC's USB port. You should now be able to run your first helloworld on the board.

cd examples/pmsis/helloworld
make clean && make PMSIS_OS=freertos platform=board io=host all -j && make platform=board io=host run

In detail: PMSIS_OS selects the OS (freertos/pulpos), platform selects the runner (board/gvsoc) and io selects the default output for printf (host/uart).

After the build you should see an output resembling:

*** PMSIS HelloWorld ***
Entering main controller
[32 0] Hello World!
Cluster master core entry
[0 7] Hello World!
[0 0] Hello World!
[0 4] Hello World!
[0 5] Hello World!
[0 3] Hello World!
[0 1] Hello World!
[0 2] Hello World!
[0 6] Hello World!
Cluster master core exit
Test success !
Detected end of application, exiting with status: 0
Loop exited
commands completed

If this fails, ensure that you followed previous steps correctly (openocd install, udev rules). If libusb fails with a permission error, you might need to reboot to apply all changes.

If you need the Gap tools for neural networks (nntool) or the Autotiler, please follow the next section.

If you also want access to PULP OS, type:

# checkout all needed submodules
make pulpos.all
# compile pulp-os and its libraries
make pulp-os

Then replace PMSIS_OS=freertos with PMSIS_OS=pulpos in your run command line.

Full Install

Python requirements

In order to use the Gap tools for neural networks (nntool), we strongly encourage you to install the Anaconda distribution (Python 3). You can find more information here: https://www.anaconda.com/.

Note that this is needed only if you want to use nntool; you can skip this step otherwise. Once Anaconda is installed, activate it and install the Python modules for this tool with these commands:

pip install -r tools/nntool/requirements.txt
pip install -r requirements.txt

Pull and compile the full tool suite

Finally, we install the full tool suite of the sdk (including nntool and autotiler).

git submodule update --init --recursive
make sdk

Note that if you only need autotiler (and not nntool) you can instead use:

git submodule update --init --recursive
make all && make autotiler

OpenOCD

OpenOCD for GAP8 is now used instead of plpbridge. A few applications require OpenOCD, because they use OpenOCD semi-hosting to transfer files to and from the workstation.

You have to install the system dependencies required by OpenOCD that you can find here: http://openocd.org/doc-release/README

Each board has its own default cable configuration. If you want to use a different cable, define this environment variable:

export GAPY_OPENOCD_CABLE=interface/ftdi/olimex-arm-usb-ocd-h.cfg

Using the virtual platform

If you only followed the minimal installation process, begin by compiling gvsoc:

make gvsoc

You can also run this example on the Gap virtual platform with this command:

# use PMSIS_OS=freertos or PMSIS_OS=pulpos
make clean all run platform=gvsoc PMSIS_OS=freertos

You can also generate VCD traces to see more details about the execution:

make clean all run platform=gvsoc runner_args=--vcd

You should see a message from the platform telling how to open the profiler.

Using the flasher (Hyperflash)

As soon as at least one file for a file-system is specified, the command "make all" will also build a flash image containing the file systems and upload it to the flash.

For example, you can include files for the readfs file-system with these flags in your Makefile:

READFS_FILES += <file1> <file2> <file3> ......

If you don't have any files but still want to upload the flash image, for example to boot from flash, run this command after compiling your application:

make flash

In case you specified files, the command "make all" will not only build the application but also build the flash image and upload it to the flash. In case you just want to build your application, you can do:

make build

Then after that if you want to produce the flash image and upload it, you can do:

make image flash

Boot from flash

The board is configured by default to boot through JTAG. To boot from flash, you first need to program a few efuses that tell the ROM to boot from flash. Be careful: this is a permanent operation. It will still be possible to boot through JTAG, but the board will always boot from flash when you power it up or reset it. To program the efuses, run the following command and follow the instructions:

# if using hyperflash:
openocd-fuser-hyperflash
# if using spiflash:
openocd-fuser-spiflash

If you boot your application from flash and want to view the output of the printf calls in your code, first compile your application with printf redirected to the UART:

make clean all platform=board PMSIS_OS=your_os io=uart

Then use a terminal program, such as "cutecom", to view the output:

sudo apt-get install -y cutecom
cutecom&

Then please configure your terminal program to use /dev/ttyUSB1 with a 115200 baud rate, 8 data bits and 1 stop bit.
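If you prefer to read the UART output from your own host-side program instead of a terminal program, the same settings can be applied with the POSIX termios API. This is a minimal sketch, assuming a Linux host and the /dev/ttyUSB1 device named above; no parity and raw (non-canonical, no echo) mode are assumptions beyond the settings stated in the text, and the function names are hypothetical.

```c
#define _DEFAULT_SOURCE  /* glibc: expose B115200 and related flags */
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Sets 115200 baud, 8 data bits, no parity, 1 stop bit (8N1), raw input. */
static int configure_gap8_uart(struct termios *tio)
{
    if (cfsetispeed(tio, B115200) != 0 || cfsetospeed(tio, B115200) != 0)
        return -1;
    tio->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);  /* clear size, parity, 2 stop bits */
    tio->c_cflag |= CS8 | CREAD | CLOCAL;                  /* 8 data bits, enable receiver */
    tio->c_lflag &= ~(tcflag_t)(ICANON | ECHO | ISIG);     /* raw mode, no echo, no signals */
    tio->c_iflag &= ~(tcflag_t)(IXON | IXOFF | ICRNL);     /* no flow control or CR mapping */
    tio->c_oflag &= ~(tcflag_t)OPOST;                      /* no output processing */
    tio->c_cc[VMIN] = 1;   /* read() blocks until at least 1 byte arrives */
    tio->c_cc[VTIME] = 0;
    return 0;
}

/* Opens the serial device read-only and applies the settings above.
 * Returns the file descriptor, or -1 on error. */
static int open_gap8_uart(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOCTTY);
    if (fd < 0)
        return -1;
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0 || configure_gap8_uart(&tio) != 0 ||
        tcsetattr(fd, TCSANOW, &tio) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A program would call open_gap8_uart("/dev/ttyUSB1") and then read() from the returned descriptor; the user must belong to the dialout group, as set up during installation.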

Documentation

Build the documentation:

cd gap_sdk
make docs

If you haven't downloaded and installed the autotiler, you will probably get some warnings when you build the docs. All the documentation is also available on our website: https://greenwaves-technologies.com/en/sdk/

You can read the documentation by opening gap_doc.html in the docs folder in your browser:

firefox docs/gap_doc.html

If you would like PDF versions of the reference manuals you can do:

cd docs
make pdf

Upgrading/Downgrading the SDK

If you want to upgrade/downgrade your SDK to a new/old version:

cd gap_sdk
git checkout master && git pull
git checkout <release tag name>
git submodule sync --recursive
# For minimal install
make clean minimal_sdk
# for full install
git submodule update --init --recursive
make clean sdk

Check our release tags to find the version you want: https://github.com/GreenWaves-Technologies/gap_sdk/releases

What is in the gap_sdk folder?

This folder contains all the files of the GAP8 SDK. The following table lists the key files and folders:

Name Description
docs Runtime API, auto-tiler and example application documentation
pulp-os a simple, PULP Runtime based, open source operating system for GAP8.
sourceme.sh A script for configuring the GAP SDK environment
examples Examples of runtime API usage
tools All the tools necessary for supporting the GAP8 usage