Senior Embedded Software Engineer – Optimization of Computation of Audio Algorithm on GAP Architecture
Greenwaves Technologies is a 6-year-old fabless semiconductor startup established in Grenoble, France. Our first product, GAP8, is the world’s first IoT Application Processor, with 8+1 RISC-V based cores and a high-performance HW convolution engine. It is a simple yet very sophisticated and unique processor architecture, which delivers an energy efficiency 20x better than the state of the art, opening up a large range of battery-powered applications. Examples of applications are people counting, keyword spotting combined with beamforming, object recognition, face detection and vibration analysis. GAP8 is especially effective on machine learning inference algorithms (CNN, SVM, Bayesian, Boosting, Cepstral analysis). Yet GAP8 is by and large programmed just like a regular MCU.
As a growing and highly multicultural team with sharp personalities, Greenwaves Technologies is very proud of its specific collaborative management style. The company is and will be what each of us makes of it, as we experience every day. We are looking for talented, enthusiastic, curious and committed people who are ready to bring their energy and skills to make a significant contribution to the success of the company’s project.
As a member of the Audio team, you will help port and optimize calculations and processing kernels onto the different chips of the GAP family. For this purpose you will benefit from, and possibly improve, a number of tools and software that Greenwaves Technologies has developed: a GCC port for the extended cores embedded in GAP chips (32-bit RISC-V cores with DSP extensions); low-level routines for memory allocation, inter-core synchronization and data movement across the memory hierarchy; an in-house tool, called Autotiler, which helps to efficiently parallelize and distribute an application graph, based on optimized basic kernels, onto the architecture, managing and optimizing resource allocation; and a tool suite called NNtool, which helps application programmers tailor a neural network to optimize its performance on the GAP architecture. This set of tools is complemented by a simulation platform that accurately simulates software execution on the chips and is equipped with a profiling tool that enables identification of performance bottlenecks in large applications.
Your tasks will include:
- Optimization of computation kernels, based on trade-offs between execution speed and numerical precision: data encoding (floating point, fixed point, number of bits), analysis of error propagation and impact, use of vector instructions, parallelism, etc.;
- Adaptation of audio algorithms in preparation for optimized porting;
- Parallel implementation of audio algorithms, and use of the HW accelerator;
- Proposal and implementation of improvements to the various algorithm and programming tools;
- Associated documentation.
- Good knowledge of audio processing and signal processing;
- Good knowledge of computer arithmetic: fixed point, floating point, quantization, error bounding;
- Familiarity with signal processing (IIR/FIR, polynomial filters, Horner structure, Farrow structure…);
- Graph optimization and mapping techniques, multi-criteria optimization, constrained optimization;
- Code performance analysis and optimization on hardware target;
- Proficient in C/C++ programming;
- Familiarity with versioning/revision control systems.
- Good level of spoken and written English, to be used daily to communicate with colleagues and international partners;
- Organizational skills;
- Strong team spirit and communication abilities;
- Ability to work autonomously and proactively on assigned tasks.
- Knowledge of parallel architectures and DSP processors;
- Application parallelization;
- Git proficiency;
- Knowledge of compiler internals (front-end and back-end), especially GCC;
- Knowledge of AI applications and of porting them to constrained embedded architectures.
- Master's degree or higher in computer science or applied mathematics, with a solid background in computer arithmetic;
- Significant experience in optimizing computation on embedded processors is required, preferably using parallelism, vectorization and fixed-point arithmetic;
- Expertise in compilation is a plus.