Cluster

FC/cluster synchronization

group FcClusterSync

This set of functions provides support for controlling clusters from the fabric-controller side.

Unnamed Group

void pi_cluster_conf_init(struct pi_cluster_conf *conf)

Initialize a cluster configuration with default values.

This function can be called to get default values for all parameters before setting some of them. The structure containing the configuration must be kept alive until the cluster device is opened.

Parameters:

conf – A pointer to the cluster configuration.

int32_t pi_cluster_open(pi_device_t *device)

Open and power-up the cluster.

This function must be called before the cluster device can be used. It will do all the needed configuration to make it usable and initialize the handle used to refer to this opened device when calling other functions. By default the cluster is powered down and cannot be used. Calling this function will power it up. At the end of the call, the cluster is ready to execute a task. The caller is blocked until the operation is finished.

Parameters:

device – A pointer to the device structure of the device to open. This structure is allocated by the caller and must be kept alive until the device is closed.

Returns:

0 if the operation is successful, -1 if there was an error.

int32_t pi_cluster_close(pi_device_t *device)

Close an opened cluster device.

This function can be called to close an opened cluster device once it is not needed anymore, in order to free all allocated resources. Once this function is called, the device is not accessible anymore and must be opened again before being used. This will power-down the cluster. The caller is blocked until the operation is finished.

Parameters:

device – A pointer to the structure describing the device.

static inline struct pi_cluster_task *pi_cluster_task(struct pi_cluster_task *task, void (*entry)(void*), void *arg)

Prepare a cluster task for execution.

This initializes a cluster task before it can be sent to the cluster side for execution. If the same task is reused for several executions, it must be reinitialized every time by calling this function.

Parameters:

task – A pointer to the structure describing the task. This structure is allocated by the caller and must be kept alive until the task has finished execution.
entry – The task entry point that the cluster controller will execute.
arg – The argument to the entry point.

static inline pi_cluster_task_t *pi_cluster_task_stacks(pi_cluster_task_t *task, void *stacks, int stack_size)

Specify cluster task stack information.

This can be called to configure the size of the stacks and to specify stacks allocated by the caller.

Parameters:

task – The task for which the stack information is being specified.
stacks – Pointer to the memory which should be used for the stacks.
stack_size – Size of the memory which should be used for the stacks.

static inline void pi_cluster_task_priority(pi_cluster_task_t *task, uint8_t priority)

Specify the cluster task priority.

This sets the priority of the specified cluster task. Only priorities 0 and 1 are currently supported. On the cluster side, the cluster driver provides a cooperative priority-based scheduler: the currently running task is in charge of periodically checking whether it must release the cluster to let a higher-priority task execute.

Parameters:

task – The cluster task
priority – The cluster task priority, can be 0 or 1.

static inline int pi_cluster_send_task(struct pi_device *device, struct pi_cluster_task *task)

Enqueue a task for execution on the cluster.

This will enqueue the task at the end of the queue of tasks, ready to be executed by the specified cluster. Once the task gets scheduled, the cluster-controller core is woken up and starts executing the task entry point. This function is intended to be used for coarse-grain job delegation to the cluster side, and thus the stack used to execute the entry point can be specified. When the entry point starts executing on the cluster, the other cores of the cluster are also available for parallel computation. Thus the stacks for the other cores (called slave cores) can also be specified, as well as the number of cores which can be used by the entry point on the cluster (including the cluster controller). The caller is blocked until the task has finished execution. This function only supports tasks with priority 0; pi_cluster_enqueue_task should be used instead for priority 1.

Note that this enqueues a function execution. To allow cluster executions to be pipelined, several tasks can be enqueued at the same time. If more than two tasks are enqueued, as soon as the first is finished, the cluster-controller core immediately continues with the next one, while the fabric controller receives the termination notification and can enqueue a new execution, in order to keep the cluster busy.

Parameters:

device – A pointer to the structure describing the device.
task – Cluster task structure containing task and its parameters.
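The open, prepare, send, close sequence described above can be sketched as follows. This is a host-runnable illustration only: the types and function bodies in the first half are simplified stand-ins written for this example, not the PMSIS implementation, and `pi_open_from_conf` is assumed here to be the call that binds a configuration to a device (it is not documented in this section). The names `cluster_entry` and `run_cluster_once` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- Host-side stand-ins, NOT the real PMSIS definitions ---- */
struct pi_cluster_conf { int id; };
typedef struct pi_device { void *config; } pi_device_t;
struct pi_cluster_task { void (*entry)(void *); void *arg; };

static void pi_cluster_conf_init(struct pi_cluster_conf *conf) { conf->id = 0; }

/* Assumed helper: attaches the configuration to the device structure. */
static void pi_open_from_conf(pi_device_t *device, void *conf) { device->config = conf; }

static int32_t pi_cluster_open(pi_device_t *device) { return device->config ? 0 : -1; }

static int32_t pi_cluster_close(pi_device_t *device) { device->config = NULL; return 0; }

static struct pi_cluster_task *pi_cluster_task(struct pi_cluster_task *task,
                                               void (*entry)(void *), void *arg)
{
    task->entry = entry;
    task->arg = arg;
    return task;
}

/* Stand-in: runs the entry point immediately, so the caller is "blocked"
 * until the task has finished, as the blocking variant is described. */
static int pi_cluster_send_task(struct pi_device *device, struct pi_cluster_task *task)
{
    (void)device;
    task->entry(task->arg);
    return 0;
}

/* ---- Call sequence from the fabric-controller side ---- */

/* Hypothetical entry point: increments the integer passed as argument. */
static void cluster_entry(void *arg) { *(int *)arg += 1; }

int run_cluster_once(int *counter)
{
    struct pi_cluster_conf conf;   /* must stay alive until the device is opened */
    pi_device_t cluster;           /* must stay alive until the device is closed */
    struct pi_cluster_task task;   /* must stay alive until the task has finished */

    pi_cluster_conf_init(&conf);               /* start from default values */
    pi_open_from_conf(&cluster, &conf);
    if (pi_cluster_open(&cluster) != 0)        /* powers up the cluster */
        return -1;

    pi_cluster_task(&task, cluster_entry, counter);  /* re-run before every reuse */
    int err = pi_cluster_send_task(&cluster, &task); /* returns once the task is done */

    pi_cluster_close(&cluster);                /* powers down the cluster */
    return err;
}
```

On a real target the stand-in block would presumably be replaced by the SDK headers, leaving only the entry point and the call sequence.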

static inline int pi_cluster_send_task_async(struct pi_device *device, struct pi_cluster_task *cluster_task, pi_evt_t *task)

Enqueue asynchronously a task for execution on the cluster.

This will enqueue the task at the end of the queue of tasks, ready to be executed by the specified cluster. Once the task gets scheduled, the cluster-controller core is woken up and starts executing the task entry point. This function is intended to be used for coarse-grain job delegation to the cluster side, and thus the stack used to execute the entry point can be specified. When the entry point starts executing on the cluster, the other cores of the cluster are also available for parallel computation. Thus the stacks for the other cores (called slave cores) can also be specified, as well as the number of cores which can be used by the entry point on the cluster (including the cluster controller). The task is just enqueued and the caller continues execution. An event task must be provided to specify how the caller is notified when the cluster task has finished execution. This function only supports tasks with priority 0; pi_cluster_enqueue_task_async should be used instead for priority 1.

Note that this enqueues a function execution. To allow cluster executions to be pipelined, several tasks can be enqueued at the same time. If more than two tasks are enqueued, as soon as the first is finished, the cluster-controller core immediately continues with the next one, while the fabric controller receives the termination notification and can enqueue a new execution, in order to keep the cluster busy.

Parameters:

device – A pointer to the structure describing the device.
cluster_task – Cluster task structure containing task and its parameters.
task – The task used to notify the end of execution.

int pi_cluster_enqueue_task(struct pi_device *device, struct pi_cluster_task *task)

Enqueue a task for execution on the cluster.

This function is similar to pi_cluster_send_task but supports priorities 0 and 1 and does not support automatic stack allocation: stacks must always be allocated by the caller. This will enqueue the task at the end of the queue of tasks, ready to be executed by the specified cluster. Once the task gets scheduled, the cluster-controller core is woken up and starts executing the task entry point. This function is intended to be used for coarse-grain job delegation to the cluster side, and thus the stack used to execute the entry point can be specified. When the entry point starts executing on the cluster, the other cores of the cluster are also available for parallel computation. Thus the stacks for the other cores (called slave cores) can also be specified, as well as the number of cores which can be used by the entry point on the cluster (including the cluster controller). The caller is blocked until the task has finished execution.

Note that this enqueues a function execution. To allow cluster executions to be pipelined, several tasks can be enqueued at the same time. If more than two tasks are enqueued, as soon as the first is finished, the cluster-controller core immediately continues with the next one, while the fabric controller receives the termination notification and can enqueue a new execution, in order to keep the cluster busy.

Parameters:

device – A pointer to the structure describing the device.
task – Cluster task structure containing task and its parameters.

int pi_cluster_enqueue_task_async(struct pi_device *device, struct pi_cluster_task *cluster_task, pi_evt_t *task)

Enqueue asynchronously a task for execution on the cluster.

This function is similar to pi_cluster_send_task_async but supports priorities 0 and 1 and does not support automatic stack allocation: stacks must always be allocated by the caller. This will enqueue the task at the end of the queue of tasks, ready to be executed by the specified cluster. Once the task gets scheduled, the cluster-controller core is woken up and starts executing the task entry point. This function is intended to be used for coarse-grain job delegation to the cluster side, and thus the stack used to execute the entry point can be specified. When the entry point starts executing on the cluster, the other cores of the cluster are also available for parallel computation. Thus the stacks for the other cores (called slave cores) can also be specified, as well as the number of cores which can be used by the entry point on the cluster (including the cluster controller). The task is just enqueued and the caller continues execution. An event task must be provided to specify how the caller is notified when the cluster task has finished execution.

Note that this enqueues a function execution. To allow cluster executions to be pipelined, several tasks can be enqueued at the same time. If more than two tasks are enqueued, as soon as the first is finished, the cluster-controller core immediately continues with the next one, while the fabric controller receives the termination notification and can enqueue a new execution, in order to keep the cluster busy.

Parameters:

device – A pointer to the structure describing the device.
cluster_task – Cluster task structure containing task and its parameters.
task – The task used to notify the end of execution.

static inline int pi_cl_task_yield()

Check if the current task should release the cluster.

Since there is no preemption on the cluster side, a low-priority running task must periodically check whether the cluster should be released by calling this function. If so, the task should return, and the entry point will be called again later on to resume execution. Note that if this function indicates that the task should return, this will force the entry point to be called again later on.

Returns:

1 if the current task should release the cluster or 0 if it can keep executing.
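Because the entry point is called again after a release, the task has to keep its progress somewhere that survives the return. The following host-runnable sketch illustrates that pattern. `fake_task_yield` is a hypothetical stand-in for pi_cl_task_yield that requests a release after every four calls, and `job_state_t`, `job_entry` and `run_job` are names invented for this example. The loop in `run_job` plays the role that the cluster scheduler would play on a real target.

```c
#include <stddef.h>

/* Progress that must survive between two calls of the entry point. */
typedef struct {
    const int *data;  /* input elements */
    size_t count;     /* number of elements */
    size_t next;      /* index of the next element to process */
    long sum;         /* running result */
} job_state_t;

/* Hypothetical stand-in for pi_cl_task_yield(): asks for a release
 * once every `yield_budget` calls so the pattern can run on a host. */
static size_t yield_budget = 4;
static size_t since_last_yield = 0;

static int fake_task_yield(void)
{
    if (++since_last_yield >= yield_budget) {
        since_last_yield = 0;
        return 1;
    }
    return 0;
}

/* Resumable entry point: it continues from job->next on every call. */
static void job_entry(void *arg)
{
    job_state_t *job = arg;

    while (job->next < job->count) {
        job->sum += job->data[job->next++];

        /* Only ask while work remains; a positive answer means
         * "return now, the entry point will be called again". */
        if (job->next < job->count && fake_task_yield())
            return;
    }
}

/* Host-side driver standing in for the scheduler: calls the entry
 * point until the job reports completion, counting the calls. */
long run_job(const int *data, size_t count, int *calls)
{
    job_state_t job = { .data = data, .count = count, .next = 0, .sum = 0 };

    since_last_yield = 0;
    *calls = 0;
    do {
        job_entry(&job);
        (*calls)++;
    } while (job.next < job.count);

    return job.sum;
}
```

The key design point is that nothing held in local variables of the entry point is assumed to survive a release; everything needed to resume lives in the structure passed as the argument.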

void *pi_cl_l1_scratch_alloc(pi_device_t *device, pi_cluster_task_t *task, int size)

Allocate L1 memory from the task scratch area.

The scratch area is an area within the L1 memory reserved during cluster configuration for data which does not need to be kept when a cluster task ends or is suspended to let another higher-priority task execute. Each task has its own linear scratch allocator so that this memory area can be reused from one task to another. Calling this function allocates scratch data for the specified task in a linear way, which means the allocated data must be freed in reverse order. Allocated data does not need to be freed when the task is over since the allocator is just a pointer being increased or decreased. The allocator is reset every time the task is reset, for example by calling pi_cluster_task. The amount of scratch data allocated for the specified task cannot exceed the size of the scratch area specified when the cluster was opened (0 by default).

Parameters:

device – The cluster device where the data is being allocated.
task – The cluster task on which the scratch data must be allocated.
size – Size of the allocated data.

Returns:

A pointer to the allocated area if the allocation succeeded, NULL if the allocation failed.

void pi_cl_l1_scratch_free(pi_device_t *device, pi_cluster_task_t *task, int size)

Free L1 memory from the task scratch area.

Calling this function will free the specified amount of data in the scratch allocator. No pointer to the data needs to be provided since the allocator works linearly.

Parameters:

device – The cluster device where the data is being freed.
task – The cluster task on which the scratch data must be freed.
size – Size of the freed data.
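The linear behaviour described for the scratch allocator (a single offset that moves up on allocation and down on free) can be modelled in a few lines. This is a host-side sketch of the principle, not the SDK implementation: `scratch_t`, `scratch_reset`, `scratch_alloc` and `scratch_free` are hypothetical names, and alignment handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of one task's linear scratch allocator. */
typedef struct {
    uint8_t *base;  /* start of the scratch area */
    size_t size;    /* size of the scratch area reserved at open time */
    size_t used;    /* current offset: everything below it is allocated */
} scratch_t;

/* Corresponds to the reset that happens when the task is reinitialized. */
void scratch_reset(scratch_t *s, uint8_t *base, size_t size)
{
    s->base = base;
    s->size = size;
    s->used = 0;
}

/* Allocation only moves the offset up; NULL if the area is exhausted. */
void *scratch_alloc(scratch_t *s, size_t size)
{
    if (size > s->size - s->used)
        return NULL;

    void *ptr = s->base + s->used;
    s->used += size;
    return ptr;
}

/* Free only moves the offset down, which is why no pointer is needed
 * and why blocks have to be released in reverse allocation order. */
void scratch_free(scratch_t *s, size_t size)
{
    s->used -= (size <= s->used) ? size : s->used;
}
```

With a scratch size of 0, which is the documented default, every allocation in this model fails, so the scratch size has to be set in the configuration before the cluster is opened.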

Typedefs

typedef struct pi_cluster_conf pi_cluster_conf_t

Enums

enum pi_cluster_flags_e

Cluster configuration flags.

Values:

enumerator PI_CLUSTER_FLAGS_FORK_BASED = (0 << 0): Start the cluster with a fork-based execution model.

enumerator PI_CLUSTER_FLAGS_TASK_BASED = (1 << 0): Start the cluster with a task-based execution model.

Functions

void pi_cl_send_task_to_fc(pi_evt_t *task)

Enqueue a task on the fabric-controller side.

This enqueues the specified task into the fabric-controller task scheduler for execution. The task must have been initialized from the fabric-controller side.

Parameters:

task – Pointer to the fabric-controller task to be enqueued.

static inline void pi_cl_send_callback_to_fc(pi_callback_t *callback)

Send a callback to Fabric Controller.

This function is used to send a simple callback to FC.

Note

This is an alternative to pi_cl_send_task_to_fc().

Parameters:

callback – Pointer to the callback (with function and argument).

Variables

pi_device_api_t cluster_api

struct pi_cluster_conf

#include <cl_pmsis_types.h>

Cluster configuration structure.

This structure is used to pass the desired cluster configuration to the runtime when opening a cluster.

Public Members

int id: Cluster ID, starting from 0.

uint32_t cc_stack_size: Cluster controller stack size (0x800)

uint32_t scratch_size: Size of the L1 reserved for scratch data.

pi_cluster_flags_e flags: Additional flags.

uint32_t icache_conf: Reserved for cluster icache configuration: b’0: icache enable b’1: master core icache enable b’2-10: prefetch enable, bit i => core[i]

Cluster DMA

group ClusterDMA

This set of functions provides support for controlling the cluster DMA. The cluster has its own local memory for fast access from the cluster cores while the other memories are relatively slow if accessed by the cluster. To keep all the cores available for computation each cluster contains a DMA unit whose role is to asynchronously transfer data between a remote memory and the cluster memory.

The DMA uses HW counters to track the termination of transfers. By default each transfer allocates one counter, which is freed when the wait function is called and returns. This limits the maximum number of transfers which can be in flight at the same time. See the chip-specific section for the number of HW counters.
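The effect of the counter limit can be illustrated with a small host-side model. This is only a sketch of the bookkeeping described above: the counter count of 4 is a made-up value (the real number is chip-specific), `dma_counters_t`, `dma_counter_alloc` and `dma_counter_free` are hypothetical names, and the model returns -1 when no counter is left, which may differ from what the hardware or driver does in that situation.

```c
#include <stdint.h>

#define MODEL_NB_COUNTERS 4  /* hypothetical value; the real one is chip-specific */

/* One bit per counter: set while a transfer owns the counter. */
typedef struct {
    uint32_t busy_mask;
} dma_counters_t;

/* Models the start of a transfer: takes the lowest free counter,
 * or returns -1 when every counter is already in use. */
int dma_counter_alloc(dma_counters_t *c)
{
    for (int id = 0; id < MODEL_NB_COUNTERS; id++) {
        if (!(c->busy_mask & (1u << id))) {
            c->busy_mask |= 1u << id;
            return id;
        }
    }
    return -1;
}

/* Models the return of the wait function: the counter is released. */
void dma_counter_free(dma_counters_t *c, int id)
{
    c->busy_mask &= ~(1u << id);
}
```

In this model, a transfer that is never waited on keeps its counter forever, which is the practical reason to pair every transfer with a wait.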

Typedefs

typedef struct pi_cl_dma_cmd_s pi_cl_dma_cmd_t

Structure for DMA commands.

This structure is used by the runtime to manage a DMA command. It must be instantiated once for each copy and must be kept alive until the copy is finished. It can be instantiated as a normal variable, for example as a global variable, a local one on the stack, or through the memory allocator.

typedef struct pi_cl_dma_copy_s pi_cl_dma_copy_t

Structure for 1D DMA copy.

This structure is used by the runtime to manage a 1D DMA copy. It must be instantiated once for each copy and must be kept alive until the copy is finished. It can be instantiated as a normal variable, for example as a global variable, a local one on the stack, or through the memory allocator.

typedef pi_cl_dma_copy_t pi_cl_dma_copy_2d_t

Structure for 2D DMA copy.

This structure is used by the runtime to manage a 2D DMA copy. It must be instantiated once for each copy and must be kept alive until the copy is finished. It can be instantiated as a normal variable, for example as a global variable, a local one on the stack, or through the memory allocator.

Enums

enum pi_cl_dma_dir_e

DMA transfer direction.

Describes the direction for a DMA transfer.

Values:

enumerator PI_CL_DMA_DIR_LOC2EXT = 0: Transfer from cluster memory to external memory.

enumerator PI_CL_DMA_DIR_EXT2LOC = 1: Transfer from external memory to cluster memory.

Functions

static inline void pi_cl_dma_cmd(uint32_t ext, uint32_t loc, uint32_t size, pi_cl_dma_dir_e dir, pi_cl_dma_cmd_t *cmd)

1D DMA memory transfer.

This enqueues a 1D DMA memory transfer (i.e. classic memory copy) with simple completion based on transfer identifier.

Parameters:

ext – Address in the external memory where to access the data.
loc – Address in the cluster memory where to access the data.
size – Number of bytes to be transferred.
dir – Direction of the transfer. If it is PI_CL_DMA_DIR_EXT2LOC, the transfer is loading data from external memory and storing to cluster memory. If it is PI_CL_DMA_DIR_LOC2EXT, it is the opposite.
cmd – A pointer to the structure for the copy. This can be used with pi_cl_dma_cmd_wait to wait for the completion of this transfer.

static inline void pi_cl_dma_cmd_2d(uint32_t ext, uint32_t loc, uint32_t size, uint32_t stride, uint32_t length, pi_cl_dma_dir_e dir, pi_cl_dma_cmd_t *cmd)

2D DMA memory transfer.

This enqueues a 2D DMA memory transfer (rectangle) with simple completion based on transfer identifier.

Parameters:

ext – Address in the external memory where to access the data.
loc – Address in the cluster memory where to access the data.
size – Number of bytes to be transferred.
stride – 2D stride, which is the number of bytes which are added to the beginning of the current line to switch to the next one.
length – 2D length, which is the number of transferred bytes after which the DMA will switch to the next line.
dir – Direction of the transfer. If it is PI_CL_DMA_DIR_EXT2LOC, the transfer is loading data from external memory and storing to cluster memory. If it is PI_CL_DMA_DIR_LOC2EXT, it is the opposite.
cmd – A pointer to the structure for the copy. This can be used with pi_cl_dma_cmd_wait to wait for the completion of this transfer.
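The relation between size, stride and length is easier to see in a software model of the copy. The sketch below is a host-side illustration, not the DMA driver: `model_dma_2d` and `model_dir_e` are hypothetical names, plain pointers replace the 32-bit addresses, and it assumes that lines are `stride` bytes apart in external memory while they are packed back to back in cluster memory. A 1D transfer corresponds to the case where length equals size.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the two documented directions (values as in pi_cl_dma_dir_e). */
typedef enum {
    MODEL_LOC2EXT = 0,  /* cluster memory -> external memory */
    MODEL_EXT2LOC = 1   /* external memory -> cluster memory */
} model_dir_e;

/* Software model of a 2D transfer of `size` bytes in total.
 * Every `length` bytes the external address jumps to the next line,
 * which starts `stride` bytes after the start of the current one.
 * Assumption: the cluster-side buffer is contiguous. */
void model_dma_2d(uint8_t *ext, uint8_t *loc, uint32_t size,
                  uint32_t stride, uint32_t length, model_dir_e dir)
{
    uint32_t done = 0;     /* bytes transferred so far */
    uint32_t ext_off = 0;  /* offset of the current line in external memory */

    if (length == 0)
        return;            /* nothing sensible to do in this model */

    while (done < size) {
        uint32_t remaining = size - done;
        uint32_t chunk = remaining < length ? remaining : length;

        if (dir == MODEL_EXT2LOC)
            memcpy(loc + done, ext + ext_off, chunk);
        else
            memcpy(ext + ext_off, loc + done, chunk);

        ext_off += stride;  /* beginning of the next line */
        done += chunk;
    }
}
```

For example, fetching a tile 3 bytes wide and 2 lines high out of an image whose rows are 8 bytes long would use size = 6, length = 3 and stride = 8, with ext pointing at the first byte of the tile.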

static inline void pi_cl_dma_cmd_wait(pi_cl_dma_cmd_t *cmd)

Simple DMA transfer completion wait.

This blocks the core until the specified transfer is finished. The transfer must be described through the identifier given to the copy function.

Parameters:

cmd – The copy structure (1D or 2D).

static inline void pi_cl_dma_flush()

Simple DMA transfer completion flush.

This blocks the core until the DMA has no pending transfers.

static inline void pi_cl_dma_memcpy(pi_cl_dma_copy_t *copy)

1D DMA memory transfer.

This enqueues a 1D DMA memory transfer (i.e. classic memory copy) with simple completion based on transfer identifier.

Parameters:

copy – A pointer to the structure describing the transfer. The same structure can be used with pi_cl_dma_wait to wait for the completion of this transfer.

static inline void pi_cl_dma_memcpy_2d(pi_cl_dma_copy_2d_t *copy)

2D DMA memory transfer.

This enqueues a 2D DMA memory transfer (rectangle area) with simple completion based on transfer identifier.

Parameters:

copy – A pointer to the structure describing the transfer. The same structure can be used with pi_cl_dma_wait to wait for the completion of this transfer.

static inline void pi_cl_dma_wait(void *copy)

Simple DMA transfer completion wait.

This blocks the core until the specified transfer is finished. The transfer must be described through the identifier given to the copy function.

Parameters:

copy – The copy structure (1D or 2D).