Embedded software doesn't run the way most people think it does.
If you've only written code for web servers, mobile apps, or desktop tools, you're used to programs that start, run, and stop on command. They sit in memory, waiting for a user to click something. Embedded systems? Different beast entirely.
The microcontroller in your coffee maker doesn't "wait for input" the way a browser waits for a click. It wakes up, does a job, and goes back to sleep, sometimes thousands of times per second. Sometimes it only runs during an interrupt. Sometimes only during a specific 2-millisecond window every 100 milliseconds.
Understanding when embedded code actually executes, and why, changes how you write it, debug it, and optimize it.
What Is Event-Driven Embedded Execution
Most embedded software isn't a loop that runs forever. It's a collection of responses.
The CPU sits in a low-power state. A timer expires. A pin changes state. A UART receives a byte. An ADC conversion finishes. Then the code runs. This is interrupt-driven, event-driven execution, and it's the default mode for anything battery-powered, real-time, or resource-constrained.
The interrupt context vs. thread context distinction
Here's where it gets messy. Code running inside an interrupt service routine (ISR) has different rules than code running in a task or main loop:
- No blocking allowed — you can't wait for a mutex, a semaphore, or a delay inside an ISR. The scheduler isn't running.
- Stack is tiny — often 128–512 bytes. Deep call chains or large local arrays will overflow it silently.
- Registers must be saved/restored — the compiler handles this, but it costs cycles. Every ISR entry/exit burns 12–50 instructions on ARM Cortex-M.
- Priority matters — a higher-priority interrupt can preempt your ISR. Nested interrupts are real, and they break assumptions about atomicity.
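One common way to catch the silent stack overflow mentioned above is to paint the stack with a known pattern at boot and later check how much of the paint survives. The following is a minimal host-runnable sketch, assuming a descending stack simulated as a plain byte array; `STACK_PAINT`, `stack_paint`, and `stack_high_water` are hypothetical names for illustration, not part of any vendor SDK.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5u  /* assumed fill pattern; any unlikely byte value works */

/* Fill the whole stack region with the pattern before it is used. */
void stack_paint(uint8_t *stack, size_t size) {
    for (size_t i = 0; i < size; i++) {
        stack[i] = STACK_PAINT;
    }
}

/* Return the peak number of bytes used, assuming a descending stack:
 * usage starts at the highest address and grows toward index 0, so the
 * untouched paint is a contiguous run at the low end of the region. */
size_t stack_high_water(const uint8_t *stack, size_t size) {
    size_t untouched = 0;
    while (untouched < size && stack[untouched] == STACK_PAINT) {
        untouched++;
    }
    return size - untouched;
}
```

If the reported high-water mark ever equals the region size, the safest reading is that the stack has already overflowed at least once.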
Code running in a thread (FreeRTOS task, Zephyr thread, bare-metal superloop) has more freedom — but it only runs when the scheduler says so. And the scheduler only runs when an interrupt hands control over.
Polling vs. interrupts: the false dichotomy
Beginners think it's either/or. It's not.
A well-designed system uses interrupts to signal events and threads to process them. The ISR does the bare minimum: acknowledge hardware, copy data to a ring buffer, signal a semaphore. The thread wakes up, processes the buffer, updates state, maybe sends a CAN frame.
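The ring buffer handoff described here can be sketched as a single-producer, single-consumer queue. This version assumes exactly one ISR pushes and exactly one thread pops, and that C11 atomics are available on the target; `ring_buffer_t`, `rb_push`, and `rb_pop` are illustrative names, not from any particular RTOS.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u                 /* assumed capacity; must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

typedef struct {
    uint8_t data[RB_SIZE];
    atomic_uint head;               /* written only by the producer (ISR) */
    atomic_uint tail;               /* written only by the consumer (thread) */
} ring_buffer_t;

/* Producer side: call from the ISR. Never blocks; rejects the byte if full. */
bool rb_push(ring_buffer_t *rb, uint8_t byte) {
    unsigned head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == RB_SIZE) {
        return false;               /* full: the caller can count overruns */
    }
    rb->data[head & RB_MASK] = byte;
    atomic_store_explicit(&rb->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: call from the task or main loop. */
bool rb_pop(ring_buffer_t *rb, uint8_t *out) {
    unsigned tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail) {
        return false;               /* empty */
    }
    *out = rb->data[tail & RB_MASK];
    atomic_store_explicit(&rb->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, neither side needs to disable interrupts or take a lock. A zero-initialized `ring_buffer_t` (for example, one with static storage) starts out empty.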
Why not do it all in the ISR? Because an ISR blocks everything at the same or lower priority. A 200μs ISR on a 48 MHz Cortex-M0+ means 9,600 cycles where nothing else runs. Do that every millisecond and you've eaten 20% of your CPU just handling interrupts.
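The arithmetic behind those numbers is worth keeping as a helper when budgeting interrupts. A small sketch, with hypothetical function names and integer math only:

```c
#include <stdint.h>

/* Cycles consumed by one ISR invocation: duration in us times clock in MHz. */
uint32_t isr_cycles(uint32_t isr_us, uint32_t clock_mhz) {
    return isr_us * clock_mhz;
}

/* Share of CPU time spent in the ISR, in percent, for a given firing period. */
uint32_t isr_load_percent(uint32_t isr_us, uint32_t period_us) {
    return (isr_us * 100u) / period_us;
}
```

With the figures from the text, `isr_cycles(200, 48)` gives 9,600 and `isr_load_percent(200, 1000)` gives 20. Halving the ISR to 100μs halves the load to 10%.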
Why It Matters: Timing, Power, and Determinism
If you don't grasp when your code runs, you'll ship bugs that only appear under load — or on the 10,000th unit in the field.
Real-time deadlines aren't suggestions
"Embedded software usually runs only during" what, exactly? During its allocated time window. Miss it, and the motor stalls. The brake pressure sensor isn't read. The BMS misses a cell overvoltage event.
In a hard real-time system, worst-case execution time (WCET) matters more than average. You need to know: what's the longest path through this ISR? Through this task? What happens when three interrupts fire simultaneously?
This is why automotive and medical firmware teams use timing-analysis tools (Rapita, AbsInt, LDRA) to prove timing bounds. Not "test and hope." *Prove.*
Power budget lives in the sleep states
A BLE sensor running on a coin cell spends 99.9% of its time in EM2/EM3/STOP mode — CPU off, RAM retained, GPIO interrupts armed. It wakes for 500μs every 100ms to read a temperature sensor, stuff the value into a buffer, and go back to sleep.
If your "quick" I2C read takes 3ms because you polled the busy flag instead of using DMA + interrupt, you just 6x'd your active time, and with it roughly your average current draw. Battery life drops from 18 months to 3.
The code that runs during that 500μs window? It better be tight. No printf. No malloc. No floating-point emulation unless the hardware has an FPU.
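To see why the active window dominates, it helps to compute the duty-cycled average current directly. This is a rough sketch: the 5 mA active and 2 µA sleep currents used in the note below are illustrative assumptions, not figures from any datasheet, and `avg_current_ua` is a hypothetical helper.

```c
/* Average current in microamps for a device that is active for active_us
 * out of every period_us and asleep for the rest of the period. */
double avg_current_ua(double active_us, double period_us,
                      double active_ua, double sleep_ua) {
    double sleep_us = period_us - active_us;
    return (active_us * active_ua + sleep_us * sleep_ua) / period_us;
}
```

Under those assumed currents, a 500μs window every 100ms averages about 27 µA, while a 3ms window averages about 152 µA. That is a factor of roughly 5.6, and it approaches the full 6x as the sleep current approaches zero.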
Determinism enables safety
ISO 26262 (automotive), IEC 61508 (industrial), DO-178C (avionics): they all require evidence that your software behaves predictably. Not "usually fast." *Always fast enough.*
That means:
- No dynamic memory allocation after init
- Bounded loops with compile-time known maxima
- Interrupt latency measured and documented
- Watchdog refreshes in every execution path
You can't provide this evidence if you don't know when your code runs.
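The first two rules in that list often show up together as a fixed-size block pool: all storage is reserved at build time, and every loop has a compile-time bound. A minimal sketch follows, with assumed sizes and hypothetical names (`pool_alloc`, `pool_free`). It is not interrupt-safe as written, so it assumes all callers run in the same context.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     8u    /* assumed capacity, fixed at compile time */
#define POOL_BLOCK_SIZE 32u   /* assumed block size in bytes */

/* Storage is reserved statically, so the heap is never touched. */
static _Alignas(max_align_t) uint8_t pool_storage[POOL_BLOCKS][POOL_BLOCK_SIZE];
static uint8_t pool_in_use[POOL_BLOCKS];

/* Returns a free block or NULL. The loop runs at most POOL_BLOCKS times,
 * so its worst-case iteration count is known at compile time. */
void *pool_alloc(void) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = 1u;
            return pool_storage[i];
        }
    }
    return NULL;   /* exhausted: the caller must handle this explicitly */
}

/* Returns a block to the pool; pointers that are not pool blocks are ignored. */
void pool_free(void *block) {
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (block == (void *)pool_storage[i]) {
            pool_in_use[i] = 0u;
            return;
        }
    }
}
```

Exhaustion becomes an explicit, testable return value instead of a fragmentation problem that only appears after weeks in the field.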
How It Works: Execution Models in Practice
Let's look at the three dominant patterns. Real systems often mix them.
1. Bare-metal superloop + interrupts
```c
int main(void) {
    hw_init();
    while (1) {
        if (uart_rx_ready()) process_uart();
        if (adc_done()) process_adc();
        if (can_tx_pending()) send_can();
        // maybe sleep here if idle
    }
}
```
When code runs:
- ISRs: immediately on hardware event (highest priority)
- Main loop functions: only when the loop reaches them
Pros: Simple, zero RTOS overhead, fully deterministic if you bound loop time.
Cons: No preemption. A long process_adc() blocks UART handling. Hard to scale.
Fix: Keep loop functions short. Defer heavy work to a background task triggered by a flag set in the ISR.
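The flag-set-in-the-ISR pattern can be sketched with an atomic event bitmask, so the main loop never loses a flag that arrives while it is reading the others. This assumes C11 atomics are usable on the target; on cores without atomic read-modify-write instructions, a short interrupt-disabled critical section is the usual substitute. All names here are hypothetical, and the counters stand in for the real processing functions.

```c
#include <stdatomic.h>

#define EVT_UART_RX  (1u << 0)
#define EVT_ADC_DONE (1u << 1)

static atomic_uint pending_events;    /* set by ISRs, drained by the main loop */
unsigned uart_handled, adc_handled;   /* stand-ins for the real processing work */

/* ISRs only record that something happened. */
void uart_rx_isr(void)  { atomic_fetch_or(&pending_events, EVT_UART_RX); }
void adc_done_isr(void) { atomic_fetch_or(&pending_events, EVT_ADC_DONE); }

/* One pass of the superloop: take all pending flags at once, then dispatch.
 * Returns the flags it handled so the caller can decide whether to sleep. */
unsigned superloop_step(void) {
    unsigned events = atomic_exchange(&pending_events, 0u);
    if (events & EVT_UART_RX)  { uart_handled++; }
    if (events & EVT_ADC_DONE) { adc_handled++; }
    return events;
}
```

Flags coalesce: two UART interrupts before the next loop pass still produce one `EVT_UART_RX`, so the data itself should travel through a queue or ring buffer rather than through the flag.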
2. RTOS tasks + interrupt signaling
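```c
#include "FreeRTOS.h"
#include "queue.h"

/* uart_queue, UART, and parse_protocol() are defined elsewhere. */

void uart_isr(void) {
    BaseType_t woken = pdFALSE;
    uint8_t byte = UART->DR;                      /* read the received byte */
    xQueueSendFromISR(uart_queue, &byte, &woken); /* non-blocking handoff */
    portYIELD_FROM_ISR(woken);                    /* run the task now if it was woken */
}

void uart_task(void *arg) {
    uint8_t byte;
    for (;;) {                                    /* a FreeRTOS task must never return */
        if (xQueueReceive(uart_queue, &byte, portMAX_DELAY) == pdTRUE) {
            parse_protocol(byte);
        }
    }
}
```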
3. Event‑driven, cooperative multitasking
Some modern frameworks (Zephyr, RIOT, Mbed OS) expose a *cooperative* scheduler where tasks voluntarily yield. They bundle the super‑loop pattern with lightweight “tasks” that can block on queues or timers without preemption.
```c
static void sensor_task(void *arg) {
    while (1) {
        read_temperature();
        vTaskDelay(pdMS_TO_TICKS(100)); // yield to others
    }
}
```
The advantage is that the code remains linear and readable, yet you still get the safety guarantees of bounded blocking. The cost is that a misbehaving task (forgetting to yield) can hang the whole system.
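Stripped to its core, a cooperative scheduler is a table of step functions called in turn. The sketch below is a generic run-to-completion loop, not the scheduler of Zephyr, RIOT, or Mbed OS; the task names and counters are hypothetical placeholders for real work.

```c
#include <stddef.h>

typedef void (*task_fn)(void);

unsigned sensor_runs, display_runs;   /* placeholders for real task state */

/* Each task does one bounded step and returns; returning is the yield. */
static void sensor_step(void)  { sensor_runs++; }
static void display_step(void) { display_runs++; }

static const task_fn task_table[] = { sensor_step, display_step };
#define TASK_COUNT (sizeof task_table / sizeof task_table[0])

/* Run every task once, in table order. There is no preemption, so a task
 * that never returns stalls every task after it. */
void scheduler_run_once(void) {
    for (size_t i = 0; i < TASK_COUNT; i++) {
        task_table[i]();
    }
}
```

The hang risk is visible in the loop itself: the worst-case time for one pass is the sum of every task's worst-case step, so each step needs its own bound.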
A Concrete Example: BLE Heart‑Rate Monitor
| Layer | What Happens | Timing Impact |
|---|---|---|
| Low level | MCU wakes from EM3, clocks the BLE radio, receives a packet | 3 ms packet reception |
| ISR | BLE_RX_ISR pushes packet into a circular buffer | < 200 µs, no blocking |
| Task | parse_hrm() iterates over the buffer, updates a shared struct | 1 ms, bounded |
| Application | display_task() reads the struct, updates OLED | 5 ms, runs every 200 ms |
If the ISR were to call printf(), it would block for several milliseconds, pushing the BLE radio into a hard‑to‑recover state and causing missed packets. By isolating the ISR to a single atomic action and letting the task do the heavy lifting, the system meets its 3 ms packet‑to‑display deadline while running for 12 months on a 2 Ah battery.
Measuring What Matters
- Worst‑case interrupt latency – use a high‑speed logic analyzer on the IRQ line and a timestamp counter.
- Task criticality – annotate functions with __attribute__((optimize("O0"))) and run a static analyzer to prove no dynamic allocation.
- Power profile – record the MCU current draw at 1 kHz while cycling through all states. Verify that the average matches the theoretical value from your sleep‑state budget.
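For the latency measurement, a logic analyzer gives the ground truth, but a running worst-case figure can also be kept in firmware from a free-running cycle counter (on Cortex-M3 and later, the DWT cycle counter is one such source). A minimal sketch, assuming a 32-bit counter that wraps around; `latency_record` and `latency_worst` are hypothetical names.

```c
#include <stdint.h>

static uint32_t worst_latency_cycles;

/* Record one sample: the counter value latched when the IRQ was raised and
 * the value read on ISR entry. Unsigned subtraction stays correct across a
 * single counter wraparound. */
void latency_record(uint32_t irq_cycles, uint32_t isr_entry_cycles) {
    uint32_t elapsed = isr_entry_cycles - irq_cycles;
    if (elapsed > worst_latency_cycles) {
        worst_latency_cycles = elapsed;
    }
}

/* Worst latency seen so far, in counter cycles. */
uint32_t latency_worst(void) {
    return worst_latency_cycles;
}
```

A measured maximum is only a lower bound on the true worst case, which is why the safety standards mentioned earlier ask for analysis in addition to measurement.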
Take‑Home Checklist for Deterministic Firmware
| ✅ | Item |
|---|---|
| 1 | All ISR code is ≤ 200 µs and free of blocking calls. |
| 2 | Tasks run in bounded time; loops have compile‑time limits. |
| 3 | Memory is allocated once at start‑up; no heap after. |
| 4 | Interrupt priority levels are set so that critical events pre‑empt lower‑priority work. |
| 5 | Power‑state transitions are deterministic; you know exactly how long the CPU stays in EM2/EM3. |
| 6 | Static analysis (e.g., MISRA‑C, Coverity) flags any potential race or overflow. |
| 7 | Runtime checks (watchdog, stack‑overflow guard) are exercised in every path. |
Conclusion
Determinism is not a luxury; it is the backbone of any safety‑critical or battery‑constrained embedded system. The *when* of your code execution (whether it's triggered by a hardware interrupt, a scheduled RTOS task, or a cooperative loop) directly dictates latency, power, and reliability. By constraining that timing, you can prove that your firmware will always finish its work in time, that it will never enter an unsafe state, and that the battery will last as advertised.
Remember: “Fast” is only meaningful if you know when fast happens. Treat the timing of every function as a contract you must honor, and the rest of your system will follow.