Efficiency is defined as the ratio of the useful work performed by a machine
to the total energy expended. Our job as engineers is to innovate and
solve problems in the most efficient way possible.
Flying as the sole passenger in an empty Jumbo Jet, for instance, is less
efficient than flying in a Lear Jet. The Jumbo Jet burns far more fuel and
most of its capacity is wasted, while the Lear Jet can get you to your
destination faster on less fuel, since you skip airport security and avoid
delays. The same can be said of processes running on different types of
cores. The ARM® Cortex®-A core is powerful enough to run smaller processes,
but doing so uses more power and adds latency. The better option may be to
run those smaller processes on a deterministic Cortex-M core.
More and more new-generation microprocessors incorporate a mixture of
powerful cores alongside smaller cores. The combination is not new; the two
types have been used together in heterogeneous computing configurations for
many years. However, as computing spreads to more applications and use
cases, the advantages of mixing core types have created a need for
additional configurations that best support those use cases.
It is helpful to quickly review the different multiprocessing configurations,
since many of the terms are used interchangeably.
Homogeneous vs. heterogeneous multicore
Homogeneous multicore systems have more than one core, and all of the cores
share the same architecture and microarchitecture. An example is the Arm
quad-core Cortex-A53 system, in which every core is identical.
Heterogeneous multicore systems have two or more cores that differ in
architecture or microarchitecture. An example is the combination of a
microprocessor core with a microcontroller-class core (for example, a mix of
Cortex-A, Cortex-M or DSP cores).
Symmetric vs. asymmetric multiprocessing
The terms symmetric and asymmetric usually refer to the software environment,
whereas “multicore” usually refers to the hardware. Many people nevertheless
mistake multiprocessing for multicore.
A symmetric multiprocessing (SMP) system (one kernel, multiple cores)
usually contains identical cores, or at least cores with the same
instruction set, and runs a single OS with shared memory. This environment
enables load balancing, allowing processes to run on various cores at
various times, as decided by the scheduler.
An asymmetric multiprocessing (AMP) system (multiple software environments,
multiple cores) contains multiple cores of either similar (homogeneous) or
differing (heterogeneous) architecture, with either separate or shared
memory. Usually more than one OS runs on the system, separated per core or
per core architecture. For example, the Cortex-A core may run a rich OS,
while the Cortex-M core may run simple code or an RTOS. Consider a gateway
control application: the rich GUI and multiple high-speed connectivity
options run on the Cortex-A core, while the control and monitoring
algorithms run separately on the Cortex-M core.
AMP is also used where a workload can take advantage of a core optimized for
a specific type of computing, for example, off-loading audio processing to a
low-power Cortex-M processor. Many of these workloads run on an RTOS and
require hard real-time operation. To meet these requirements, the
architecture developed around the Cortex-M core in a heterogeneous system
provides very fast single-cycle access to follower/memory. However, some
high-performance, low-latency RTOSes (for example, Nucleus® from Mentor) can
also provide real-time processing on the Cortex-A core.
The homogeneous multicore configuration running in SMP mode can be considered
the most popular way to scale processing. However, a heterogeneous multicore
configuration running in AMP mode may be the right fit for designs looking
for efficiency in processing and power consumption.
Three key reasons why a heterogeneous multicore processing configuration can
be beneficial to your design:
#1 Performance optimization
Tasks should be separated based on processing needs and determinism.
Applications running on an OS such as Linux or Android require a powerful
Cortex-A type of core along with the Memory Management Unit (MMU). Real-time
applications needing strict determinism and/or DSP capabilities can run
on the Cortex-M class core. Mixing these tasks on a single core is
inefficient and may cause unneeded complexity for both types of tasks.
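As a rough illustration of this kind of partitioning, the sketch below assigns
tasks to a core class from two attributes. The task names, the two attributes
and the assign_core helper are hypothetical, chosen only for this example; they
are not part of any vendor SDK or scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_mmu: bool        # runs under a rich OS such as Linux or Android
    hard_real_time: bool   # must meet strict, deterministic deadlines


def assign_core(task: Task) -> str:
    """Pick a core class for a task (illustrative rule, not a vendor policy)."""
    if task.needs_mmu and task.hard_real_time:
        # Conflicting needs: split the task rather than force it onto one core.
        raise ValueError(f"{task.name}: split into rich-OS and real-time parts")
    if task.needs_mmu:
        return "Cortex-A"
    # Real-time work and small non-critical work both go to the smaller core.
    return "Cortex-M"


# Hypothetical workload for a gateway-style design.
tasks = [
    Task("gui", needs_mmu=True, hard_real_time=False),
    Task("motor_control", needs_mmu=False, hard_real_time=True),
    Task("sensor_poll", needs_mmu=False, hard_real_time=False),
]
placement = {t.name: assign_core(t) for t in tasks}
```

In a real design the split is made at architecture time rather than computed
at run time; the point of the sketch is only that a task needing both a rich
OS and hard determinism is a candidate for being divided across the two cores.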
#2 Reduction of power consumption
Many processes that monitor sensors and control motors or actuators require
determinism and run efficiently on an RTOS on the smaller Cortex-M class
core. If the use case also calls for a rich OS running on the Cortex-A core,
the rich OS may spend much of its time waiting on user interaction or on
communication from the sensors being monitored by the RTOS on the Cortex-M
core. The system can take advantage of this idle time and power gate the
large Cortex-A core until either a predetermined wake-up time arrives or the
lower-power Cortex-M core generates an interrupt. Shutting down the large
core and its associated silicon reduces the power needed to run the system.
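The effect of power gating can be estimated with simple duty-cycle arithmetic.
The sketch below is a back-of-the-envelope model, assuming the Cortex-M domain
stays on continuously while the Cortex-A domain is awake only for a fraction of
the time. The function name and every power figure are placeholders invented
for the example, not measurements of any device.

```python
def average_power_mw(duty_cycle, a_active_mw, a_gated_mw, m_active_mw):
    """Average system power when the Cortex-A domain is awake for `duty_cycle`
    (0.0 to 1.0) of the time and power gated for the rest. The Cortex-M
    domain is assumed to stay on the whole time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    a_avg = duty_cycle * a_active_mw + (1.0 - duty_cycle) * a_gated_mw
    return a_avg + m_active_mw


# Placeholder figures chosen only to make the arithmetic easy to follow.
always_on = average_power_mw(1.0, a_active_mw=500.0, a_gated_mw=5.0, m_active_mw=50.0)
gated = average_power_mw(0.1, a_active_mw=500.0, a_gated_mw=5.0, m_active_mw=50.0)
```

With these placeholder numbers the always-on case averages 550 mW, while waking
the large core only 10% of the time averages roughly 104.5 mW. Real savings
depend on wake-up latency and the energy cost of each transition, which this
model ignores.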
#3 Improved system reliability and security
A natural benefit of distributing processes between the two cores is the
ability to create separation between the two asymmetric processing
environments. A system can now control or forbid access between the two
processing environments and in turn provide greater stability and security,
preventing processes that go awry from affecting the real-time processing
domain. Separating access to the peripherals and memory between the two
processing environments creates a firewall that improves both system
reliability and security.
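The access separation described above can be pictured as a table of which
processing domain is allowed to touch which resource. The sketch below models
that idea in software only; the resource names, domain names and may_access
helper are hypothetical. On real silicon this kind of check is enforced by
hardware, not by application code.

```python
# Hypothetical access table: which processing domain may touch which resource.
ACCESS = {
    "uart1": {"cortex_a"},
    "can1": {"cortex_m"},
    "shared_ram": {"cortex_a", "cortex_m"},
}


def may_access(domain: str, resource: str) -> bool:
    """Return True only if `domain` is explicitly granted `resource`.
    Resources missing from the table are denied by default."""
    return domain in ACCESS.get(resource, set())
```

A deny-by-default table means a runaway process in the rich OS domain cannot
reach a peripheral reserved for the real-time domain unless that access was
granted deliberately.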
Many SoCs can be clearly defined as either heterogeneous or homogeneous
architectures, within either an AMP or SMP system. However, SoCs such as the
i.MX 7Dual can be considered a mix. The i.MX 7Dual processor contains a
homogeneous multicore architecture, the dual Cortex-A7 cores sharing memory,
encapsulated in an overall heterogeneous architecture through the addition
of the Cortex-M4 processor. This allows either an SMP or AMP configuration
on the dual Cortex-A7 cores, as well as an AMP configuration when a separate
OS runs on the Cortex-M4.
Figure 1: AMP Configuration in a Mix Architecture
Figure 2: Mix Processing and Architecture
Heterogeneous multicore processors such as the i.MX 7Dual enable rich software
architecture configurations to address the requirements of complex computing
devices. The heterogeneous processing enabled through the addition of the
Cortex-M processor offers a significant number of benefits. However, issues
such as software configuration, booting, inter-process communication (IPC),
debugging and performance optimization also need to be considered. The
software community is addressing these complexities with solutions from
organizations such as the OpenAMP open source project, managed by The
Multicore Association®. Companies such as NXP and Mentor Graphics are
members of and contributors to the OpenAMP project.
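To make the IPC concern concrete, the sketch below models one common pattern:
a single-producer, single-consumer ring of fixed-size message slots in a
region of memory both cores can see. It is a simplified software model and not
the OpenAMP API; the RingChannel class and its methods are hypothetical, and a
bytearray stands in for shared RAM.

```python
class RingChannel:
    """Single-producer, single-consumer message ring over a shared buffer."""

    def __init__(self, slots: int, slot_size: int):
        self.slots = slots
        self.slot_size = slot_size
        self.mem = bytearray(slots * slot_size)  # stands in for shared RAM
        self.lengths = [0] * slots               # payload length per slot
        self.head = 0  # next slot to write; advanced only by the producer
        self.tail = 0  # next slot to read; advanced only by the consumer

    def send(self, payload: bytes) -> bool:
        """Copy a message into the ring. Returns False if the ring is full."""
        if len(payload) > self.slot_size:
            raise ValueError("payload larger than a slot")
        nxt = (self.head + 1) % self.slots
        if nxt == self.tail:
            return False  # one slot is kept empty to tell full from empty
        start = self.head * self.slot_size
        self.mem[start:start + len(payload)] = payload
        self.lengths[self.head] = len(payload)
        self.head = nxt  # publish only after the data is in place
        return True

    def receive(self):
        """Return the oldest message, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        start = self.tail * self.slot_size
        data = bytes(self.mem[start:start + self.lengths[self.tail]])
        self.tail = (self.tail + 1) % self.slots
        return data


# Example: the real-time side reports two readings to the rich-OS side.
ch = RingChannel(slots=4, slot_size=16)
ch.send(b"temp=21")
ch.send(b"rpm=900")
first = ch.receive()
```

On real hardware the same pattern also needs memory barriers, cache
management and an inter-core interrupt to signal that a message is waiting,
which is part of the complexity that shared frameworks aim to take off the
developer.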
These processors are supported by popular general-purpose and real-time OS
technologies, and are complemented by runtime technologies and tools such as
the newly released DS-MDK from Arm. DS-MDK is specifically designed for these
modern heterogeneous multicore processors and gives the user a rich and
powerful tool to debug both sides of the system simultaneously. The ability
to observe shared resources, and how messages and data are passed from one
side to the other, in a single unified GUI greatly accelerates the
development process.
Heterogeneous compute has come a long way. The new i.MX 7Dual is a great
example of an SoC built to enable embedded efficiency through heterogeneous
computing. It brings many advantages, including performance optimization,
reduction of power consumption and increased system reliability and
security. By taking advantage of these benefits, product developers can save
on cost and system power while avoiding the more expensive option of an
ASSP.
Special thanks to Lori Kate Smith and Phillip Burr of Arm and Warren
Kurisu of Mentor Graphics for their contributions and input.