TensorFlow to GAP8 Bridge Manual

Introduction

This manual describes the TF2GAP8 tool that we have developed for automatically translating TensorFlow CNN applications into source code for GAP8. This tool allows you to experiment with the whole work-flow: training your application with TensorFlow, visualizing the graph through TensorBoard, generating the corresponding GAP8 source code via the TF2GAP8 flow and simulating this code with the GAP8 SDK.

This demonstration version of TF2GAP8 uses the TensorFlow r1.4 branch; refer to the TensorFlow r1.4 documentation for details.

You should also refer to the GAP8 SDK documentation when reviewing this manual.

Preparing the TF2GAP8 environment

To install TF2GAP8, please execute the following commands:

cd ~/gap_sdk/tf2gap8
make install

The script will prompt for the root password.

The TF2GAP8 work-flow

TF2GAP8 is a tool that we have developed to let you automatically generate GAP8 processor source code from a Convolutional Neural Network (CNN) application described using the TensorFlow r1.4 API.

Figure 1 shows the entire development work-flow for an application from its TensorFlow description through to its representation in GAP8 C/C++ source code using the GAP8 CNN and operators library.

Figure 1: TF2GAP8 flow

The steps are as follows:

  • Training of the TensorFlow CNN application written with TF API and Python
  • Freezing the graph and weights of the TensorFlow application.
  • Using the TensorFlow Graph Transform Tool to transform the graph to prepare it for the inference phase on the GAP8 processor. The main tasks of this transformation will be to:
    1. Remove the parts of the graph that are not needed for inference
    2. Turn certain sub-expressions into single nodes
    3. Order nodes in processing order for inference
    4. Re-factor some nodes to fit the GAP8 CNN functions Library
  • Inference code generation from the previously transformed application graph, using the GAP8 auto-tiler CNN and operators generator library.

Training the application

For both sample applications we have included training data in the GAP8 SDK; however, you can retrain the examples yourself if you wish.

The application needs to be written in Python using the TF r1.4 Python API. As an example, please refer to the gap_sdk/tf2gap8/examples/cifar10/cifar10.py or gap_sdk/tf2gap8/examples/mnist/mnist.py files in the Virtual Machine.

When describing your application, note that TF2GAP8 supports the following major TF r1.4 Convolutional Neural Network operations (a minimal sketch using only these operations is shown after the list):

  • Add
  • Conv2D
  • Softmax
  • Matmul
  • Reshape
  • Relu
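As an illustration only (this is not one of the SDK examples), a minimal network built purely from these operations could look as follows in TF r1.4 Python; the node names x_input and y_output are placeholders chosen to mirror the examples:

# Hypothetical sketch: a tiny CNN built only from the operations supported by
# TF2GAP8 (Conv2D, Add, Relu, Reshape, Matmul, Softmax). Not an SDK example.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 1], name="x_input")

# Conv2D + Add (bias) + Relu
w_conv = tf.Variable(tf.truncated_normal([5, 5, 1, 8], stddev=0.1))
b_conv = tf.Variable(tf.zeros([8]))
conv = tf.nn.relu(tf.add(tf.nn.conv2d(x, w_conv, strides=[1, 1, 1, 1],
                                      padding="SAME"), b_conv))

# Reshape + Matmul + Add + Softmax (dense output layer)
flat = tf.reshape(conv, [-1, 32 * 32 * 8])
w_fc = tf.Variable(tf.truncated_normal([32 * 32 * 8, 10], stddev=0.1))
b_fc = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.add(tf.matmul(flat, w_fc), b_fc), name="y_output")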

During the training process, two pieces of information about the TensorFlow graph need to be generated for later use by the TF2GAP8 flow:

  1. A textual protobuf representation of the application graph (file extension .pbtxt)
    1. cifar10.pbtxt for the CIFAR10 example
    2. mnist.pbtxt for the MNIST example
  2. The model variables from the last training step (called a checkpoint). A checkpoint is composed of several files, for example: checkpoint, model.ckpt.index, model.ckpt.meta and model.ckpt.data-00000-of-00001

To generate the protobuf representation of a graph, you must add the following line after the training of your model:

tf.train.write_graph(<your session>.graph.as_graph_def(add_shapes=True),
 '<your data directory>', '<your application name>.pbtxt', as_text=True)

For example, for the MNIST application, as we decided to store all data generated in the data directory and the name of our application is "mnist", we added the following line at the end of the MNIST application training:

tf.train.write_graph(session.graph.as_graph_def(add_shapes=True),
 './data/', 'mnist.pbtxt', as_text=True)

To generate model variables (checkpoint), please refer to the TF Saving and Restoring documentation.

In the MNIST example, we saved the last step variables (weights and biases) of the trained model by using the following commands after the training loop:

First, create the "saver" variable:

saver = tf.train.Saver(sharded=True)

Then save the last step:

saver.save(session, './data/model.ckpt')

You can start the training of the CIFAR10 example by running the following command in the gap_sdk/tf2gap8/examples/cifar10 directory:

$ python3 cifar10.py

You can start the training in the MNIST example by running the following command in the gap_sdk/tf2gap8/examples/mnist directory:

$ python3 mnist.py

Running TF2GAP8 flow

After the application has been trained and the necessary graph protobuf file and last checkpoint data have been saved, the build process can be started with the command:

make clean all run

NOTE: In this release of the SDK it may be necessary to run this command twice due to a build issue. This will be corrected in a subsequent release.

The Makefile rules to run the different tools used to transform the TensorFlow graph, extract its structure and weights and generate GAP8 code are contained in the file gap_sdk/gen/tf2gap8.mk. This file defines a macro which is called from the main project Makefile, e.g.:

$(eval $(call tf2gap8_rules,$(CURDIR)/data,mnist.pbtxt, \
  model.ckpt,x_inter,y_output,$(CURDIR)/$(T2G_BUILD_DIR),mnist))

The parameters to this macro are as follows:

  • $1 = Absolute path to the directory with input TensorFlow graph
  • $2 = Input graph filename e.g. mnist.pbtxt
  • $3 = Input graph checkpoint filename prefix e.g. model.ckpt
  • $4 = Input node name e.g. x_inter
  • $5 = Output node name e.g. y_output
  • $6 = Absolute path to the build directory e.g. /tfbuild
  • $7 = Make phony target for generation e.g. mnist

Executing this macro will add a PHONY rule mnist which triggers the code generation process.

Note also that the tf2gap8 directory contains the "tf2gap8.py" script, which runs the tf2gap8 bridge flow with the appropriate parameters. For example, the shell scripts tf2gap8_mnist.sh and tf2gap8_cifar10.sh run tf2gap8.py for the MNIST and CIFAR10 examples respectively.

The process implemented by the macro is as follows:

Step 1: Freeze_graph Tool

Before running the Graph Transform Tool (GTT) on our graph, the graph needs to contain the weights and biases of the last training step. These are not stored inside the protobuf file; instead, they are held in separate checkpoint files, and there are Variable ops in the graph that load the latest values when they are initialized.

The GTT requires the information to be contained in one file, so the freeze_graph.py script is used to take a graph definition and a set of checkpoints and freeze them together into a single file. What this does is to load the GraphDef, pull in the values for all the variables from the latest checkpoint file, and then replace each Variable op with a Const that has the numerical data for the weights stored in its attributes. It then strips away all the extraneous nodes that are not used for forward inference, and saves the resulting GraphDef into an output file.

The freeze_graph tool usually generates a file with a ".pb" suffix.
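If you prefer to perform the freezing from Python rather than through the freeze_graph.py script, the same result can be obtained with the standard TF r1.4 graph_util API, as in the sketch below. The paths and the y_output node name are placeholders taken from the MNIST example; adapt them to your own application.

# Illustrative sketch: freeze a trained graph from Python instead of running
# the freeze_graph.py script. Paths and the output node name are placeholders.
import tensorflow as tf

saver = tf.train.import_meta_graph('./data/model.ckpt.meta')
with tf.Session() as sess:
    saver.restore(sess, './data/model.ckpt')
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['y_output'])

with tf.gfile.GFile('./data/mnist_frozen.pb', 'wb') as f:
    f.write(frozen.SerializeToString())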

Running this step manually

Before using the freeze_graph tool, you need to build the package from the tensorflow directory (note that this step has already been done on the VM) and then run freeze_graph:

$ bazel build tensorflow/python/tools:freeze_graph
$ bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=<graph>.pbtxt \
  --input_checkpoint=<checkpoint>.ckpt --output_graph=<output graph>.pb \
  --output_node_names=<output node>

For the cifar10 example in the VM, the following command must be executed from the tensorflow directory to freeze the graph stored in the ./tf2gap8/examples/cifar10/data/cifar10.pbtxt file with the last checkpoint generated and stored in the model.ckpt.* files:

$ cd ~/tensorflow
$ bazel-bin/tensorflow/python/tools/freeze_graph \
  --input_graph=./tf2gap8/examples/cifar10/data/cifar10.pbtxt \
  --input_checkpoint=./tf2gap8/examples/cifar10/data/model.ckpt \
  --output_graph=./tf2gap8/examples/cifar10/data/cifar10_frozen.pb \
  --output_node_names=prediction/y_output

For the mnist example in the VM, the following command must be executed from the tensorflow directory to freeze the graph stored in the ./tf2gap8/examples/mnist/data/mnist.pbtxt file with the last checkpoint generated and stored in the model.ckpt.* files:

$ cd ~/tensorflow
$ bazel-bin/tensorflow/python/tools/freeze_graph \
  --input_graph=./tf2gap8/examples/mnist/data/mnist.pbtxt \
  --input_checkpoint=./tf2gap8/examples/mnist/data/model.ckpt \
  --output_graph=./tf2gap8/examples/mnist/data/mnist_frozen.pb \
  --output_node_names=y_output

Step 2 : Graph Transform Tool (GTT)

The GTT is a TensorFlow tool that can modify a graph so it runs better in its final environment. As we are targeting running the application on GAP8, we might want to shrink the file size by quantizing the weights, or optimize away batch normalization or other training-only features. The Graph Transform framework offers a suite of tools for modifying computational graphs, and a framework that makes it easy to write your own modifications.
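For quick experimentation, the GTT can also be driven from Python through the TransformGraph wrapper, as sketched below. Note that this sketch only applies stock built-in transforms; the GAP8-specific fuse_* transforms described in the next section are only available in the GTT binary built as part of the TF2GAP8 flow. File paths and node names are placeholders.

# Illustrative sketch: apply built-in GTT transforms from Python (TF r1.4).
# The GAP8-specific fuse_* transforms require the GTT built with TF2GAP8.
import tensorflow as tf
from tensorflow.tools.graph_transforms import TransformGraph

graph_def = tf.GraphDef()
with tf.gfile.GFile('./data/mnist_frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

optimized = TransformGraph(graph_def,
                           ['x_inter'],    # input node names
                           ['y_output'],   # output node names
                           ['strip_unused_nodes', 'remove_nodes(op=Identity)'])

with tf.gfile.GFile('./data/mnist_optimized.pb', 'wb') as f:
    f.write(optimized.SerializeToString())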

GTT GAP8 Specific Transformations

In this first version of the TF2GAP8 tool, GreenWaves Technologies (GWT) supports the following CNN operations of TensorFlow:

  • Add
  • Conv2D
  • Softmax
  • Matmul
  • Reshape
  • Relu

In order to match the operators of the GAP8 CNN Library, we have added some node fusions to the GTT, available as additional options of the --transforms command. During this processing, the following transformations can occur:

TensorFlow operator combination → corresponding GAP8 operator:

  • Conv2D + Add + Relu + Maxpool → GAP8_Conv2D
  • Conv2D + Add + Maxpool → GAP8_Conv2D
  • Conv2D + Add + Relu → GAP8_Conv2D
  • Conv2D + Add → GAP8_Conv2D
  • Reshape + Matmul + Add + Relu + Softmax → GAP8_DenseLayer
  • Reshape + Matmul + Add + Softmax → GAP8_DenseLayer
  • Reshape + Matmul + Add → GAP8_DenseLayer
  • DepthwiseConv2D + Add → GAP8_Depthwise_Conv2D

The transform_graph tool options corresponding to these fusions are the following:

  • fuse_conv2d_add_relu_maxpool
  • fuse_conv2d_add_maxpool
  • fuse_conv2d_add_relu
  • fuse_conv2D_add
  • fuse_reshape_matmul_add_relu_softmax
  • fuse_reshape_matmul_add_softmax
  • fuse_reshape_matmul_add
  • fuse_depthwiseconv2d_add

The TF2GAP8 bridge also applies the following GTT built-in transformations:

  • strip_unused_nodes
  • remove_nodes(op=Identity)

Running this step manually

The full command to run the GTT tool on the frozen graph is:

$ ~/tensorflow/bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=<frozen graph>.pb \
  --out_graph=<optimized graph>.pb \
  --inputs=<inputNode> \
  --outputs=<outputNode> \
  --transforms="strip_unused_nodes remove_nodes(op=Identity) \
    fuse_conv2d_add_relu_maxpool \
    fuse_conv2d_add_relu \
    fuse_conv2d_add_maxpool \
    fuse_GAP8_conv2d_maxpool \
    fuse_reshape_matmul_add_relu_softmax \
    fuse_reshape_matmul_add_softmax \
    fuse_reshape_matmul_add_relu \
    fuse_reshape_matmul_add \
    fuse_matmul_add_relu \
    fuse_matmul_add \
    fuse_depthwiseconv2d_add"

For the CIFAR10 example, this command is set up as:

$ cd ~/tensorflow
$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=./tf2gap8/examples/cifar10/data/cifar10_frozen.pb \
  --out_graph=./tf2gap8/examples/cifar10/data/cifar10_optimized.pb \
  --inputs=input/x_input \
  --outputs=prediction/y_output \
  --transforms="strip_unused_nodes remove_nodes(op=Identity) \
    fuse_conv2d_add_relu_maxpool \
    fuse_conv2d_add_relu \
    fuse_conv2d_add_maxpool \
    fuse_GAP8_conv2d_maxpool \
    fuse_reshape_matmul_add_relu_softmax \
    fuse_reshape_matmul_add_softmax \
    fuse_reshape_matmul_add_relu \
    fuse_reshape_matmul_add \
    fuse_matmul_add_relu \
    fuse_matmul_add \
    fuse_depthwiseconv2d_add"

For the MNIST example, this command is set up as:

$ cd ~/tensorflow
$ bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=./tf2gap8/examples/mnist/data/mnist_frozen.pb \
  --out_graph=./tf2gap8/examples/mnist/data/mnist_optimized.pb \
  --inputs=x_inter \
  --outputs=y_output \
  --transforms="strip_unused_nodes remove_nodes(op=Identity) \
    fuse_conv2d_add_relu_maxpool \
    fuse_conv2d_add_relu \
    fuse_conv2d_add_maxpool \
    fuse_GAP8_conv2d_maxpool \
    fuse_reshape_matmul_add_relu_softmax \
    fuse_reshape_matmul_add_softmax \
    fuse_reshape_matmul_add_relu \
    fuse_reshape_matmul_add \
    fuse_matmul_add_relu \
    fuse_matmul_add \
    fuse_depthwiseconv2d_add"

Step 3 : GAP8 Source Code Generation

During this phase, the TF2GAP8 tool will read the optimized graph description resulting from the GTT tool transformations and generate the GAP8 source code to be simulated with the GAP8 SDK.

The result of the source code generation is put in the ./tfbuild/GAP8Code directory within the example's main directory. The GAP8Code directory will contain the following files:

  • Weights_bias.h : data structures for storing the weights and biases
  • Weights_bias.c : the weight and bias values of the graph
  • Network_process.c : the network_process() function, the main function of the inference process
  • Define.h : some constant definitions
  • Param_layers.h : data structures storing the parameters of the main NN layers

Running this step manually

To build this tool, run the following command from the tensorflow directory:

$ bazel build tf2gap8:tf2gap8

Note that if you are using the TF2GAP8 VM delivered by GWT, you do not need to perform this step as the tool has already been built.

Run the following command to start the TF2GAP8 source code generation phase:

$ ~/tensorflow/bazel-bin/tf2gap8/tf2gap8 \
  <optimized graph path>.pb \
  <optimized graph directory path> \
  ~/tensorflow/tf2gap8

For the cifar10 example, enter the command:

$ ~/tensorflow/bazel-bin/tf2gap8/tf2gap8 \
  ~/tensorflow/tf2gap8/examples/cifar10/data/cifar10_optimized.pb \
  ~/tensorflow/tf2gap8/examples/cifar10/data \
  ~/tensorflow/tf2gap8

And for the MNIST example, enter the command:

$ ~/tensorflow/bazel-bin/tf2gap8/tf2gap8 \
  ~/tensorflow/tf2gap8/examples/mnist/data/mnist_optimized.pb \
  ~/tensorflow/tf2gap8/examples/mnist/data \
  ~/tensorflow/tf2gap8

Quantization

As GAP8 has limited internal memory (512 KB), it is very important to reduce the size of the inference application. Since most of the memory is used for weights and biases, the TF2GAP8 tool allows 16-bit quantization of the weights and biases during the GAP8 source code generation phase. Future versions of the TF2GAP8 tool will provide more flexible quantization options. Please do not hesitate to contact GreenWaves Technologies support if this is a limitation for you.
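As an illustration of what 16-bit quantization amounts to, the sketch below converts floating-point weights to signed 16-bit fixed-point values with numpy. It is only a sketch: the exact scaling and rounding scheme used by the TF2GAP8 code generator is not detailed here and may differ.

# Illustrative sketch: symmetric fixed-point quantization of weights to 16 bits.
# The actual scheme used by the TF2GAP8 generator may differ (per-layer Q
# formats, rounding, saturation policy).
import numpy as np

def quantize_int16(weights, frac_bits=13):
    """Map float weights to signed 16-bit values in Q(15-frac_bits).frac_bits format."""
    scale = 1 << frac_bits
    q = np.round(weights * scale)
    return np.clip(q, -32768, 32767).astype(np.int16)

def dequantize_int16(q_weights, frac_bits=13):
    """Recover approximate float values from the 16-bit representation."""
    return q_weights.astype(np.float32) / (1 << frac_bits)

w = np.array([0.75, -1.2, 0.003], dtype=np.float32)
print(dequantize_int16(quantize_int16(w)))  # values close to the originals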

Simulation on GAP8 SDK

Once the GAP8 source code has been generated, it is run using the command:

make run

Examples

In the VM we provide two example applications that have been processed through the TF2GAP8 flow. They are located in the gap_sdk/tf2gap8/examples directory.

CIFAR10

The first one, CIFAR10, is an image classification application based on the CIFAR10 reference database of 60,000 images, each belonging to one of ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

All the images are 32 x 32 pixels, coded in RGB. The CNN structure of this application is shown below:

Figure: CIFAR10 dataflow

As can be seen in the figure above, the network starts with a convolutional layer with a filter size of 5 and 8 feature maps, followed by a 2 x 2 max-pooling. This is followed by another convolutional layer, this time with 12 outputs, another 2 x 2 max-pooling, and finally a fully connected layer that ends the network.

Note that the third dimension is not specified. CIFAR10 is actually in RGB, so the input should be 3 x 32 x 32 and all the outputs would have an additional dimension of 3; however, as our test camera only works with relative luminance levels, all the RGB images have been converted to relative luminance images. Thus, the network can be viewed as a 3-dimensional tensor flow, although it is actually a 4-dimensional one.
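For reference, the sketch below shows one common way to convert an RGB image to a single relative-luminance channel, using the Rec. 709 weights; the coefficients actually used for the CIFAR10 example are not documented here, so treat this as an assumption.

# Illustrative sketch: RGB (H x W x 3) to relative luminance (H x W) using the
# Rec. 709 weights. The CIFAR10 example may use different coefficients.
import numpy as np

def rgb_to_luminance(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

img = np.random.randint(0, 256, (32, 32, 3)).astype(np.float32)
lum = rgb_to_luminance(img)  # single-channel 32 x 32 image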

It has been implemented in the three files:

  • cifar10.py : the main file with training and evaluation functions
  • cnn.py : creation of the different layers of the CNN
  • data.py : contains the functions for loading the training and test data.

Training

To retrain the network, run the following command:

python3 cifar10.py

For the training, if you didn't change the training parameters in the cifar10.py file, you should obtain a success rate of about 64%. Please note that training will take some time.

CNN Graph Visualization

To visualize the graph after the training, run TensorBoard with the full path of the logs directory generated during the training:

tensorboard --logdir=/home/gwt/gap_sdk/tf2gap8/examples/cifar10/logs

Then, in your browser, open the URL localhost:6006 and you will be able to see the graph of the cifar10 application.
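The logs directory visualized here is produced during training by a summary writer. If you write your own training script and want its graph to appear in TensorBoard, a call along the following lines is sufficient (the session and logs path below are placeholders):

# Illustrative sketch: write a session graph to a logs directory for TensorBoard.
import tensorflow as tf

with tf.Session() as session:
    # ... build and train your model here ...
    writer = tf.summary.FileWriter('./logs', session.graph)
    writer.close()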

Generate the code

To run TF2GAP8 and generate the GAP8 source code from the TF representation, enter:

make all

You will find the source code in the GAP8Code directory.

Run the GAP8 simulation

To run the GAP8 simulation for the CIFAR10 example, please use the make run command.

make run

Results

For this inference run, we chose to test the recognition of the first image of the CIFAR10 dataset. Its feature type is 6 (a frog). It corresponds to the image stored in the cifar10/test_images/test_0_6.c file. By visualizing the corresponding test_0_6.pgm file, you can see that it is a frog.

At the end of the inference run, you should see the following results. A score is displayed in front of each feature type of the CIFAR10 data set (car, truck, etc.). The feature type with the highest score determines the category of the test image. Here, a frog (feature 6) has been successfully recognized.

============> cycles 191138

 feat 0: 116
 feat 1: 3655
 feat 2: 8151
 feat 3: 13142
 feat 4: 5713
 feat 5: 12971
 feat 6: 20426
 feat 7: 4801
 feat 8: 896
 feat 9: 0
 found 6
Detected end of application, exiting with status: 0

Note that an image may occasionally not be correctly recognized, since the success rate obtained after the training phase is only 64%.

By copying other .c files from the test_images directory over the cifar10/l2_x_cifar10.c file, you can test other images. The "y" part of a test_x_y.c file name represents the class number of the image (the CIFAR10 dataset has 10 classes). The corresponding .pgm file contains the image in PGM format.

MNIST

MNIST is another reference test. This time, the goal is to recognize handwritten digits. The MNIST network is similar to the CIFAR10 one.

First, there is a convolutional layer of 32 5 x 5 filters, followed by a 2 x 2 pooling layer. The network then expands with 64 5 x 5 filters in a second convolutional layer, followed by a 2 x 2 max-pooling and a dense layer with 10 outputs for the prediction.
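A sketch of that structure using the TF r1.4 layers API is shown below. It is only illustrative; the actual implementation is in mnist.py and cnn.py and may differ in details such as padding, variable initialization and activation placement. The node names x_inter and y_output are taken from the example.

# Illustrative sketch of the MNIST network described above (TF r1.4 layers API).
# See mnist.py / cnn.py for the real implementation.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="x_inter")

c1 = tf.layers.conv2d(x, filters=32, kernel_size=5, padding="same",
                      activation=tf.nn.relu)              # 32 filters of 5 x 5
p1 = tf.layers.max_pooling2d(c1, pool_size=2, strides=2)  # 2 x 2 max-pooling
c2 = tf.layers.conv2d(p1, filters=64, kernel_size=5, padding="same",
                      activation=tf.nn.relu)              # 64 filters of 5 x 5
p2 = tf.layers.max_pooling2d(c2, pool_size=2, strides=2)  # 2 x 2 max-pooling
flat = tf.reshape(p2, [-1, 7 * 7 * 64])
logits = tf.layers.dense(flat, 10)                        # dense layer, 10 outputs
y = tf.nn.softmax(logits, name="y_output")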

The example has been implemented in the following python files:

  • mnist.py: main training source code
  • cnn.py: source code for creating the different layers

Training

For training, run the following command:

python3 mnist.py

For the training, if you didn't change the training parameters in the mnist.py file, you should obtain a success rate of about 98%.

CNN Graph Visualization

To visualize the graph after the training, run TensorBoard with the full path of the logs directory generated during the training:

tensorboard --logdir=/home/gwt/gap_sdk/tf2gap8/examples/mnist/logs

This visualization will show the following application graph. Clicking on a node gives you detailed information about it.

Figure: MNIST graph

Run TF2GAP8

To run TF2GAP8, generate the GAP8 source code from the TF representation and run the simulation, enter:

make clean all run

You can find the source code in the 'tfbuild/GAP8Code' directory.

Results

For this inference run, we chose to test the recognition of the first handwritten digit of the MNIST data set, which happens to be a 5.

At the end of the inference run, you should see the following results. A score is displayed in front of each digit of the MNIST data set (0, 1, 2, etc.). The digit with the highest score determines the category of the test image; in the run shown below, the digit 7 obtained the highest score.

============> cycles 1626122

 feat 0: -16003
 feat 1: -15541
 feat 2: -7852
 feat 3: -7584
 feat 4: -24881
 feat 5: -9739
 feat 6: -31061
 feat 7: 7861
 feat 8: -14895
 feat 9: -8855
 found (7)
Detected end of application, exiting with status: 0

Note that, as the success rate obtained from the corresponding training is 98%, it can happen that some handwritten digits are not correctly recognized. This is a normal result.