Federated Learning for TinyML Devices
Over the past few years I have focused on privacy-preserving, energy-efficient AI for resource-constrained IoT devices—publishing papers, building hardware demos, and open-sourcing code.
Below you’ll find the highlights, starting with a deep dive into our MDPI Electronics article.
Papers:
Electronics 11(4): 573, 2022 [Open Access]
ACM SenSys ’21 Poster [DOI]
UPC, Full Thesis [Open Access]
Motivation
Typical TinyML workflows train models offline on powerful GPUs, then deploy quantized versions for inference only. We asked: what if training itself could run on-device—collaboratively and privately—via Federated Learning (FL)?
Prototype Hardware

Contributions
- First fully on-device FL demo on 64-kB microcontrollers—no simulated nodes.
- Design-space exploration of FL round frequency vs. bandwidth, memory, and convergence.
- Open, reproducible dataset & code for keyword-spotting on Arduino hardware.
Experimental Highlights
Parameter | Value(s) tested | Insight |
---|---|---|
FL round interval | 5–50 epochs | More frequent rounds reduce loss sooner but increase UART traffic and wall time |
Hidden-layer size | 16–64 nodes | 25–32 nodes gave the best trade-off between RAM (<64 kB) and accuracy |
Learning rate / momentum | 0.4–0.8 / 0.5 | LR 0.6 with momentum 0.5 converged fastest for keyword spotting (KWS) |
Even with model footprints of at most 65 kB, the global model reached 92% accuracy on user-recorded keywords after just 10 FL rounds, which suggests that practical on-device training is within reach for low-power IoT deployments.
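For readers new to FL, the server-side work in one round can be sketched as follows. This is a minimal illustration that assumes plain FedAvg (a sample-weighted average of flattened weight vectors) and a textbook SGD-with-momentum update on each client. The function names `fedavg` and `sgd_momentum_step` are hypothetical and are not taken from the paper or the repo; only the default learning rate (0.6) and momentum (0.5) come from the table above.

```python
from typing import List, Sequence, Tuple


def fedavg(client_weights: Sequence[Sequence[float]],
           client_samples: Sequence[int]) -> List[float]:
    """Sample-weighted average of flattened client weight vectors."""
    if not client_weights or len(client_weights) != len(client_samples):
        raise ValueError("need one sample count per client")
    total = sum(client_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send vectors of equal length")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_samples)) / total
        for i in range(dim)
    ]


def sgd_momentum_step(weights: Sequence[float],
                      grads: Sequence[float],
                      velocity: Sequence[float],
                      lr: float = 0.6,
                      momentum: float = 0.5) -> Tuple[List[float], List[float]]:
    """One local SGD-with-momentum update; returns (new_weights, new_velocity)."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Two clients; the second holds three times as many samples as the first.
print(fedavg([[0.0, 2.0], [4.0, 6.0]], [1, 3]))  # [3.0, 5.0]
```

In this sketch a shorter FL round interval simply means `fedavg` is called after fewer local epochs, which is why more frequent rounds trade extra serial traffic for earlier agreement between devices.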
Lightweight FL Framework for Heterogeneous IoT (SenSys 2021)
Venue: ACM SenSys ’21 Poster Session
We proposed a compression-aware aggregation scheme that cut uplink bytes by a factor of 3 while matching FedAvg accuracy, paving the way for FL across mixed 802.15.4 / Wi-Fi sensor networks.
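To give a feel for how update compression reduces uplink traffic, here is a generic sketch. It is not the scheme from the poster: it assumes a simple symmetric 8-bit quantization of a weight delta, packed as one float32 scale followed by one signed byte per value. The names `quantize_update` and `dequantize_update` are hypothetical.

```python
import struct
from typing import List, Sequence


def quantize_update(delta: Sequence[float]) -> bytes:
    """Pack a weight delta as a float32 scale plus one int8 per value."""
    max_abs = max((abs(x) for x in delta), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero delta.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in delta]
    return struct.pack(f"<f{len(q)}b", scale, *q)


def dequantize_update(payload: bytes) -> List[float]:
    """Recover an approximate delta on the server side."""
    n = len(payload) - 4  # the first 4 bytes hold the float32 scale
    scale, *q = struct.unpack(f"<f{n}b", payload)
    return [scale * v for v in q]


# 1000 values cost 4000 bytes as raw float32 but 1004 bytes here.
print(len(quantize_update([0.0] * 1000)))  # 1004
```

Each recovered value differs from the original by at most about half a quantization step (`scale / 2`). The roughly fourfold saving of this toy encoding is a property of the sketch alone and should not be read as the factor reported for the poster's scheme.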
TinyML-FederatedLearning GitHub Repo
Code: https://github.com/marcmonfort/TinyML-FederatedLearning
The repo contains:
- Arduino/Nano client firmware (TensorFlow Lite Micro + serial FL layer)
- Python-based FL server & monitoring scripts
- Step-by-step tutorial to reproduce the Electronics paper results on off-the-shelf boards
Closing Thoughts
Together, these projects show that even 64 kB MCUs can learn in the field while keeping raw data local. Next steps include:
- Evaluating non-IID data splits in larger fleets
- Integrating LoRaWAN transport for truly untethered FL
- Exploring on-device differential privacy to harden the pipeline
Looking forward to building on this foundation and collaborating with the community!
