MNIST Multi-Model Example

This example shows how to run multiple models on GAP within the same application.

The models are trained with the training.py script (training.ipynb is the notebook version) and exported to ONNX format.

nntool_test.py tests the trained models with NNTool. The script quantizes the graphs and tests their accuracy on a subset of the original test dataset. It also automatically generates the code for a template project, which is run through NNTool’s Python APIs to collect on-device performance.

The same code generation procedure was used to generate the gap_project folder. That template was then extended to automate the Autotiler model code generation (nntool_script.py and CMakeLists.txt), and to read images from files and check the predicted class (main.c).

How to Run

A pretrained version of the models is already provided in the model folder. To train them yourself, run the training.py script: it downloads the MNIST dataset, trains the models and saves them in the model folder.

To run the application on GVSOC or on the board:

cd gap_project
mkdir build && cd build && cmake ../
make run -j

The application generates the code for the two models (mnist_large and mnist_small) and runs them alternately on GAP several times.

It supports both a standard and a reentrant mode.

Reentrant Mode

In the default (“reentrant”) mode, the application runs the mnist_small model asynchronously on every iteration (5 iterations in total). When a trigger condition occurs (here, a simple check on the iteration count), the small model is paused, the large model is executed to completion, and the small model is then resumed. This mode emulates a use case in which a high-priority task (the large model) must run at a specific time.

The output should be something like this:

    *** NNTOOL mnistCNN Example ***

FC Frequency = 370000000 Hz CL Frequency = 370000000 Hz PERIPH Frequency = 370000000 Hz
Voltage: 800mV
Constructor of Large Model
Reading image from /home/marco/GWT/gap_sdk/examples/gap9/nn/nntool/multimodel/gap_project/test_00000_7.ppm
Image /home/marco/GWT/gap_sdk/examples/gap9/nn/nntool/multimodel/gap_project/test_00000_7.ppm:  [W: 28, H: 28] Bytes per pixel 1, HeaderSize: 13
Image /home/marco/GWT/gap_sdk/examples/gap9/nn/nntool/multimodel/gap_project/test_00000_7.ppm, [W: 28, H: 28], Bytes per pixel 1, Size: 784 bytes, Loaded successfully
pi_cl_l1_scratch_alloc 0 0 8192
Predicted class:    7
With confidence:    199 / 255

            S3__conv1_Conv_fusion_mnist_large: Cycles:        56386, Cyc%:   5.9%, Operations:      1254400, Op%:   2.1%, Operations/Cycle: 22.246656
            S6__conv2_Conv_fusion_mnist_large: Cycles:       263156, Cyc%:  27.4%, Operations:     14450688, Op%:  24.4%, Operations/Cycle: 54.913010
            S9__conv3_Conv_fusion_mnist_large: Cycles:       268607, Cyc%:  28.0%, Operations:     28901376, Op%:  48.8%, Operations/Cycle: 107.597252
           S12__conv4_Conv_fusion_mnist_large: Cycles:       347689, Cyc%:  36.2%, Operations:     14450688, Op%:  24.4%, Operations/Cycle: 41.562107
                    S15__fc7_Gemm_mnist_large: Cycles:        24528, Cyc%:   2.6%, Operations:       125440, Op%:   0.2%, Operations/Cycle: 5.114155
                                      IO_Wait: Cycles:            0, Cyc%:   0.0%, Operations:            0, Op%:   0.0%, Operations/Cycle: nan

                                        Total: Cycles:       960366, Cyc%: 100.0%, Operations:     59182592, Op%: 100.0%, Operations/Cycle: 61.625038

Destructing Large Model
Constructor of Small Model
Reading image from /home/marco/GWT/gap_sdk/examples/gap9/nn/nntool/multimodel/gap_project/test_00000_7.ppm
Image /home/marco/GWT/gap_sdk/examples/gap9/nn/nntool/multimodel/gap_project/test_00000_7.ppm:  [W: 28, H: 28] Bytes per pixel 1, HeaderSize: 13
Image /home/marco/GWT/gap_sdk/examples/gap9/nn/nntool/multimodel/gap_project/test_00000_7.ppm, [W: 28, H: 28], Bytes per pixel 1, Size: 784 bytes, Loaded successfully
pi_cl_l1_scratch_alloc 0 0 8192
Predicted class:    7
With confidence:    212 / 255

            S3__conv1_Conv_fusion_mnist_small: Cycles:        56110, Cyc%:  15.9%, Operations:      1254400, Op%:  15.3%, Operations/Cycle: 22.356087
          S6__conv2_0_Conv_fusion_mnist_small: Cycles:        51117, Cyc%:  14.5%, Operations:       112896, Op%:   1.4%, Operations/Cycle: 2.208580
          S9__conv2_1_Conv_fusion_mnist_small: Cycles:        36018, Cyc%:  10.2%, Operations:      1605632, Op%:  19.6%, Operations/Cycle: 44.578598
         S12__conv3_0_Conv_fusion_mnist_small: Cycles:        52503, Cyc%:  14.9%, Operations:       225792, Op%:   2.8%, Operations/Cycle: 4.300554
         S15__conv3_1_Conv_fusion_mnist_small: Cycles:        61503, Cyc%:  17.4%, Operations:      3211264, Op%:  39.2%, Operations/Cycle: 52.213127
         S18__conv4_0_Conv_fusion_mnist_small: Cycles:        31925, Cyc%:   9.0%, Operations:        56448, Op%:   0.7%, Operations/Cycle: 1.768144
         S21__conv4_1_Conv_fusion_mnist_small: Cycles:        39491, Cyc%:  11.2%, Operations:      1605632, Op%:  19.6%, Operations/Cycle: 40.658176
                    S24__fc7_Gemm_mnist_small: Cycles:        24428, Cyc%:   6.9%, Operations:       125440, Op%:   1.5%, Operations/Cycle: 5.135091
                                      IO_Wait: Cycles:            0, Cyc%:   0.0%, Operations:            0, Op%:   0.0%, Operations/Cycle: nan

                                        Total: Cycles:       353095, Cyc%: 100.0%, Operations:      8197504, Op%: 100.0%, Operations/Cycle: 23.216143
...

Standard Mode

In the non-reentrant (standard) mode, instead, both models are simply run to completion, one after the other. This mode shows a simple application with multiple models and, in particular, the need to share the L1 TCDM buffer between them, which is why the nntool script uses the graph_warm_construct option. With this option the models are constructed without allocating their L1 memory; the application code allocates it directly, using a single buffer sized for the larger of the two models.