What is bare metal?
Bare metal means running software directly on hardware, instead of running it on top of an Operating System (OS).
And why would you do that? Because sometimes an OS uses too much of the device's resources (RAM and Flash), especially when the target device is a low-cost microcontroller (uC). Or maybe the application is not very complex and an OS is overkill. Also, many OSes are designed to run on targets that have a Memory Management Unit (MMU), and your target device might not have one.
Now, using an OS is not a bad thing; quite the opposite, it has many advantages. It's highly probable that you are reading this blog on a device that's running an OS (PC, laptop, smartphone, tablet, etc.). Thanks to the OS, you can multitask on that device and easily add, remove, or run software on it. It's also (almost?) impossible to break or hang the device while developing or running applications, because the OS won't grant applications access to the core, low-level areas of the device.
If you have worked with 8-bit uCs from Microchip (a PIC16F877A was my first contact with the world of embedded systems) or AVR (e.g. Arduino), then you have already done bare metal development and have (hopefully) enjoyed having 100% control over the hardware of a microcontroller.
The ARM world
Today, we have access to more powerful uCs like the ARM-based microcontrollers. Many companies have developed microcontrollers based on the ARM Cortex-M architecture, so there are hundreds of different uCs, some with more peripherals, others with lower power consumption, but all of them are unified by the ARM architecture, which simplifies development on these devices.
Specifically, I'm working with the STM32F407VE uC, based on the Cortex-M4 core, which has an integrated single-precision FPU that makes floating-point arithmetic a snap. Multiplying floats on an 8-bit uC takes far more Flash and runs much slower; forget about doing lots of trigonometric operations or multiplying matrices.
Around 550 bytes are needed to multiply two floats on an 8-bit uC.
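Why does a float multiply cost so much on an 8-bit uC? Without an FPU, the compiler links in a routine that unpacks both numbers and does the work with integer arithmetic, and on an 8-bit core every 32-bit (or wider) integer operation expands into many 8-bit instructions. The sketch below shows the core of such a routine in C. It is a simplified illustration, not the library code an 8-bit compiler actually uses: `soft_fmul` is a hypothetical name, it assumes normal, finite, nonzero inputs whose product neither overflows nor underflows, and it truncates instead of rounding to nearest.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an integer (and back) without
 * breaking strict-aliasing rules. */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Simplified IEEE 754 single-precision multiply using integer ops only.
 * Assumption: inputs are normal, finite and nonzero, and the result neither
 * overflows nor underflows. The result is truncated, not rounded. */
float soft_fmul(float a, float b) {
    uint32_t ua = float_to_bits(a);
    uint32_t ub = float_to_bits(b);

    /* Sign of the product: XOR of the input signs. */
    uint32_t sign = (ua ^ ub) & 0x80000000u;

    /* Add the biased exponents and remove one bias (127). */
    int32_t exp = (int32_t)((ua >> 23) & 0xFFu)
                + (int32_t)((ub >> 23) & 0xFFu) - 127;

    /* 24-bit significands with the implicit leading 1 restored. */
    uint64_t ma = (ua & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (ub & 0x7FFFFFu) | 0x800000u;

    /* Both significands are in [1, 2), so the product is in [1, 4). */
    uint64_t prod = ma * mb;

    if (prod & (1ull << 47)) {
        prod >>= 24;   /* product in [2, 4): shift one extra bit */
        exp += 1;
    } else {
        prod >>= 23;   /* product in [1, 2) */
    }

    uint32_t mant = (uint32_t)prod & 0x7FFFFFu;   /* drop the implicit 1 */
    return bits_to_float(sign | ((uint32_t)exp << 23) | mant);
}
```

Handling zero, infinities, NaNs, denormals and correct rounding adds even more code on top of this, which is how such routines grow to hundreds of bytes on an 8-bit core.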
Bare metal development on ARM
If you have used IDEs like MPLABX or AVRStudio, then you have skipped the (fun) part of setting up the memory layout and gone straight to coding your application in the main function. After that, you clicked the build button to create a hex file (or similar), which you later flashed onto your uC to blink an LED, move a motor, etc.
Before jumping into the development, I want you to understand some basics. First, the ARM processor only understands binary (1011...), and in this case that binary is stored in an ELF file, which also records where in memory each part of the program must be loaded. To build this binary from (more human-readable) C/C++ source code, we need a toolchain and a linker script.
The toolchain contains tools like the compiler and the linker to build the binary, and also provides a debugger, which allows us to run the program on the uC step by step while observing the uC's internal variables and registers.
The linker script is basically a map that tells the linker how to lay out the program in the uC's memory (for example, which parts go to Flash and which go to RAM). You can take a glimpse of one in the following image.
A small part of a linker script.
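One thing the linker script enables is the startup code that runs before `main()`: initialized global variables must be copied from Flash to RAM, and zero-initialized ones must be cleared, using addresses that the linker script exports as symbols. The following is a minimal sketch of that step in C, assuming common GNU/ST-style symbol names (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`, `_estack`) and a vector-table section named `.isr_vector`; bareCortexM's linker script may use different names. The target-only part is guarded by a hypothetical `BARE_METAL_TARGET` macro so that the copy/zero logic in `init_memory` (also a hypothetical name) can be compiled and checked on a PC.

```c
#include <stdint.h>

/* Copy initialized data from its load address (Flash) to its run address
 * (RAM), then zero the .bss region. The bounds are passed in as pointers so
 * the logic can be exercised on a host; on the target they come from the
 * linker script. */
void init_memory(const uint32_t *data_load,
                 uint32_t *data_start, uint32_t *data_end,
                 uint32_t *bss_start, uint32_t *bss_end)
{
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}

#ifdef BARE_METAL_TARGET   /* hypothetical flag, defined only in the ARM build */
/* Assumed symbol names, exported by the linker script. */
extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss, _estack;
extern int main(void);

void Reset_Handler(void)
{
    init_memory(&_sidata, &_sdata, &_edata, &_sbss, &_ebss);
    main();
    for (;;) { }   /* there is nothing to return to on bare metal */
}

/* Minimal vector table: on reset the core loads the initial stack pointer
 * from word 0 and jumps to the address in word 1. The linker script must
 * place .isr_vector at the start of Flash. */
__attribute__((section(".isr_vector"), used))
void (*const vector_table[])(void) = {
    (void (*)(void))&_estack,
    Reset_Handler,
};
#endif
```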
Obviously, we also need some hardware to flash the ELF file into the uC and to debug the program running on it. In the case of ARM microcontrollers, this hardware connects to the uC through a JTAG or an SWD header. It can be parallel-port based or USB based; the latter is far more popular and is sometimes called a JTAG dongle.
For example, I started with ARM uCs using a Bus Blaster (a JTAG dongle) and an STM32F4Discovery; today I use a custom board for the F4 and a custom JTAG adapter for my projects (you can see a picture of my hardware in this post about KiCad).
I have created an Eclipse project template named bareCortexM that wraps all the needed tools. The source code is available in this repository; go there for hands-on experience with bare metal development. The project makes use of the following tools:
+ Eclipse, the IDE that wraps all the tools.
+ GNU Tools for ARM Embedded Processors, the toolchain.
+ openOCD, the glue between the JTAG dongle and the host PC.
In the following image you can see the full setup.
Debugging an STM32F407VE uC using bareCortexM.
Notice that multiplying two floats takes only 5 instructions (20 bytes), in contrast with the 500+ bytes of the software multiplication routine on the 8-bit uC.
Now that you know how bare metal development works, you can go ahead and build your own OS, or grab a peripheral library and start developing projects. I'll release a template peripheral library for the STM32 microcontrollers later this week. Check out my post about libstm32pp, a template peripheral library.
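One detail worth knowing: on the Cortex-M4 the FPU is disabled after reset, and executing a floating-point instruction before enabling it causes a fault. Startup code typically enables it by setting the CP10 and CP11 access bits (bits 20 to 23) in the CPACR register at 0xE000ED88. Below is a sketch of that step, assuming your startup code does not already do it; `fpu_enable`, `fmul` and the `BARE_METAL_TARGET` guard are hypothetical names, and the register pointer is a parameter so the bit manipulation can be tested on a PC.

```c
#include <stdint.h>

/* CP10 and CP11 fields of CPACR: 0b11 each = full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* CPACR lives in the System Control Block of the Cortex-M4. */
#define SCB_CPACR ((volatile uint32_t *)0xE000ED88u)

/* Grant full access to the FPU coprocessors. Taking the register address as
 * a parameter lets this be checked against a plain variable on a host. */
void fpu_enable(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
#ifdef BARE_METAL_TARGET   /* hypothetical flag, defined only in the ARM build */
    /* Ensure the write has completed before any FPU instruction executes. */
    __asm volatile ("dsb" ::: "memory");
    __asm volatile ("isb" ::: "memory");
#endif
}

/* With the FPU enabled and the right compiler flags, this multiply is done
 * in hardware instead of by a software routine. */
float fmul(float a, float b)
{
    return a * b;
}
```

On the target you would call `fpu_enable(SCB_CPACR);` early in the reset handler. Compiled for the Cortex-M4 with GCC's `-mfpu=fpv4-sp-d16` and a hard or softfp float ABI, `fmul` becomes a `VMUL.F32` instruction plus whatever register moves the chosen ABI requires, rather than a call to a software routine.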
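At the lowest level, what a peripheral library wraps is reads and writes to memory-mapped registers. As a sketch of doing it by hand, here is how an STM32F4 GPIO port can be driven through a struct laid out like its register block, using addresses from the STM32F4 reference manual (GPIOD at 0x40020C00, RCC_AHB1ENR at 0x40023830). This is not libstm32pp's API; `gpio_regs_t`, `gpio_make_output` and `gpio_toggle` are hypothetical names, and taking the port as a pointer lets the logic be exercised on a PC with a fake register block.

```c
#include <stdint.h>

/* STM32F4 GPIO register block, in reference-manual order. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00 mode: 00 in, 01 out, 10 AF, 11 analog */
    volatile uint32_t OTYPER;   /* 0x04 output type */
    volatile uint32_t OSPEEDR;  /* 0x08 output speed */
    volatile uint32_t PUPDR;    /* 0x0C pull-up/pull-down */
    volatile uint32_t IDR;      /* 0x10 input data */
    volatile uint32_t ODR;      /* 0x14 output data */
    volatile uint32_t BSRR;     /* 0x18 bit set/reset */
    volatile uint32_t LCKR;     /* 0x1C configuration lock */
    volatile uint32_t AFR[2];   /* 0x20, 0x24 alternate function */
} gpio_regs_t;

/* On the STM32F407: GPIOD base address and its clock-enable bit in RCC. */
#define GPIOD               ((gpio_regs_t *)0x40020C00u)
#define RCC_AHB1ENR         ((volatile uint32_t *)0x40023830u)
#define RCC_AHB1ENR_GPIODEN (1u << 3)

/* Configure one pin as a general-purpose output (MODER bits = 01). */
void gpio_make_output(gpio_regs_t *port, unsigned pin)
{
    uint32_t moder = port->MODER;
    moder &= ~(3u << (2u * pin));   /* clear the pin's two mode bits */
    moder |= (1u << (2u * pin));    /* 01 = general-purpose output */
    port->MODER = moder;
}

/* Flip the output level of one pin. */
void gpio_toggle(gpio_regs_t *port, unsigned pin)
{
    port->ODR ^= (1u << pin);
}

/* Usage on the target (e.g. the green LED on PD12 of the STM32F4DISCOVERY):
 *   *RCC_AHB1ENR |= RCC_AHB1ENR_GPIODEN;
 *   gpio_make_output(GPIOD, 12);
 *   gpio_toggle(GPIOD, 12);
 */
```

Note that the port's clock has to be enabled in RCC before its registers respond; forgetting that step is a classic bare metal bug.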
Hands on experience
If you have an STM32 microcontroller and a JTAG dongle, or a development board like the STM32F4DISCOVERY, go to this post for a detailed tutorial on how to set up the bareCortexM environment and the libstm32pp peripheral library, and start developing on ARM right away.