Marthe Lübbers is secretly a geek.

And you - as the intellectually curious person that you are - should read a few of her blog posts.

The most consequential quiet revolution is privacy-preserving technology

How federated learning is making the tradeoff between privacy and progress obsolete.

The distinguishing factor behind the biggest AI breakthroughs so far has not been the models themselves, but the amount of data they were trained on. A well-known example is ImageNet, the huge dataset of categorised images whose annual recognition challenge drove dramatic improvements in the accuracy, and consequent usefulness, of image-recognition models. The same was true for large language models: OpenAI’s ChatGPT was trained on hundreds of billions of words, with an estimated 60–80% of the data coming from the Common Crawl dataset. Until now, the most disruptive models have been the ones with access to the most data. What do we do with this insight? We should identify the applications of AI that are most beneficial to society and curate datasets in those areas.


Mass unemployment, information control, and potential new bioweapons aside (for now), utilising AI for scientific breakthroughs will yield immense benefits and advance society at an unprecedented rate. Reducing CO₂ emissions, or detecting and treating diseases earlier, stands to eliminate an enormous amount of suffering in this world. Deep-learning algorithms can already detect lung or breast cancer better than radiologists, and years before regular screening would pick up cancerous cells. New drugs developed with AI to treat rare diseases are reaching late-stage clinical trials. Multimodal models for medical imaging write reports on chest x-rays and detect tumours with 15% more accuracy than previous models. However, one constraint is significantly reducing the rate of large-scale medical breakthroughs: data privacy.


Medical data is protected by laws like HIPAA (US), GDPR (Europe), and similar frameworks globally, for good reason. Researchers can’t just scrape patient records the way you might scrape the web. Access requires consent, institutional review, and often years of bureaucracy. Data privacy laws are non-negotiable in safeguarding individual rights, yet as a side effect they hinder life-saving medical research. Is it possible to scale medical breakthroughs through AI without breaching data-protection laws? I stumbled upon the answer in November 2025, and it left me in awe.


As I was researching privacy-preserving architecture for my side-project FlowBud, I read about federated learning and discussed it with my former professor who pointed me towards the non-profit organisation OpenMined. Federated learning is a machine learning approach where models are trained across multiple devices or servers holding local data, so the data stays decentralised while only model updates are shared and aggregated. OpenMined built the technical infrastructure to apply federated learning in real-world, privacy-sensitive domains such as healthcare, finance, and mobile applications. Here’s how:
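The core mechanic is easy to see in miniature. The toy sketch below is my own illustration in plain Python, not OpenMined code: it fits a one-parameter linear model across three simulated "hospitals", where each hospital trains on its private samples and shares only the updated parameter, and the server averages those updates into a global model.

```python
# Toy federated averaging: each participant trains on its own local data
# and shares only its model parameter, never the raw records.
# Model: simple 1-D linear regression y = w * x, fit by gradient descent.

def local_train(w, data, lr=0.01, steps=100):
    """Train locally; the raw (x, y) pairs never leave their owner."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # only the updated parameter is shared

def federated_round(w_global, datasets):
    """Aggregate the local updates into a new global model (simple mean)."""
    updates = [local_train(w_global, d) for d in datasets]
    return sum(updates) / len(updates)

# Three "hospitals", each holding private samples of the same true line y = 3x
hospital_data = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(0.5, 1.5), (5.0, 15.0)],
]

w = 0.0
for _ in range(10):
    w = federated_round(w, hospital_data)
print(round(w, 2))  # converges toward 3.0
```

Real systems add secure aggregation and differential privacy on top, because even weight updates can leak information, but the division of labour is the same: training happens where the data lives.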

The Foundation: PySyft
PySyft is the core library that decouples private data from model training. It combines Federated Learning, Differential Privacy, and Encrypted Computation techniques such as Multi-Party Computation and Homomorphic Encryption, and works within mainstream frameworks like PyTorch and TensorFlow. PySyft lets you write software that computes over data you do not own, on machines you do not control. The sensitive data never moves.
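The phrase "computes over data you do not own" can be made concrete with a tiny sketch. This is not PySyft's actual API, just a minimal stand-in for the pattern it generalises: a data owner keeps records private and only releases the results of computations it has explicitly approved.

```python
# Illustrative sketch (not PySyft's real API) of the owner-controlled
# computation pattern: raw records never leave the owner; only the
# results of pre-approved computations do.

class DataOwner:
    def __init__(self, records):
        self._records = records   # private: never returned directly
        self._approved = {}       # name -> computation the owner allows

    def approve(self, name, fn):
        self._approved[name] = fn

    def run(self, name):
        if name not in self._approved:
            raise PermissionError(f"computation {name!r} was not approved")
        return self._approved[name](self._records)  # only the result leaves

owner = DataOwner([72, 68, 90, 81])  # e.g. private lab values (invented)
owner.approve("mean", lambda rs: sum(rs) / len(rs))

result = owner.run("mean")
print(result)  # 77.75
```

In PySyft the approval step is a real governance workflow (requests, reviews, audit logs) rather than a dictionary lookup, but the contract is the same: the researcher gets answers, never the data.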

SyftBox — The Protocol Layer
SyftBox is an open-source protocol that lets developers and organisations build, deploy, and federate privacy-preserving computations across a network, running them on distributed datasets without ever centralising the data. It turns isolated local folders (called “datasites”) into nodes in a secure, distributed file and compute network. Data stays local but can still participate in global-scale AI. Governance is programmable and enforced by protocol, not trust.

Syft Hub
Syft Hub is essentially the discovery and marketplace layer sitting on top of SyftBox. Users can download the SyftBox client, connect to the decentralized network, explore it to find datasets and privacy-preserving APIs offered by various data owners, and ask a data owner to run any API on their data. Once approved, they can perform computations and extract insights without compromising privacy.

syft-flwr — Federated Learning at Scale
syft-flwr is the integration between SyftBox and Flower (flwr), one of the most popular federated learning frameworks. It allows multiple organizations — say, ten hospitals across five countries — to jointly train a single AI model without any of them ever sharing their raw patient data. Each hospital trains locally, shares only model weight updates (not data), and those updates are aggregated into a global model. This is the architecture that makes cross-border medical AI feasible under GDPR and other data-protection laws.
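The aggregation step at the heart of this setup is, in its simplest form, the FedAvg rule: a sample-size-weighted average of each participant's weights. Below is a stdlib sketch of that rule alone (Flower's real API is not shown, and the hospitals and numbers are invented):

```python
# FedAvg aggregation in its simplest form: weight each hospital's locally
# trained parameters by its number of training samples, then average.

def fedavg(updates):
    """updates: list of (num_samples, weight_vector) pairs, one per hospital."""
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    # Global weight j = sum_i(n_i * w_i[j]) / sum_i(n_i)
    return [sum(n * w[j] for n, w in updates) / total for j in range(dim)]

# Three hospitals holding different amounts of private data (invented numbers)
updates = [
    (100, [0.9, 1.1]),   # hospital A trained on 100 patients
    (300, [1.0, 1.0]),   # hospital B trained on 300 patients
    (600, [1.1, 0.9]),   # hospital C trained on 600 patients
]
global_weights = fedavg(updates)
print(global_weights)  # larger hospitals pull the average harder
```

Weighting by sample count matters in the cross-hospital setting because datasets are rarely the same size; an unweighted mean would let a 100-patient site count as much as a 600-patient one.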

Attribution-Based Control — The Big Picture
Syft introduces Attribution-Based Control as a new paradigm: data owners decide which AI predictions are informed by their data, while preserving attribution and the value of their contribution. AI developers can access high-quality, licensed content legally and transparently. End users can choose which trusted sources power their answers and understand where the information came from. Every interaction is logged and attributed, and data owners earn royalties automatically. When you control your data instead of surrendering it to AI intermediaries through scraping or copying, you decide how it is used and know every time it is used. This addresses AI’s core architectural flaw: the lack of traceable data usage, which currently precludes attribution and measurable compensation.

The story of AI’s greatest breakthroughs has always been a story about data at scale. But applied to its most important frontier, human health, we slam into a wall: the data that could save millions of lives is locked behind consent frameworks, institutional review boards, and privacy laws that exist for very good reason. Federated learning overcomes what felt like an unsolvable contradiction: that you must either protect people’s privacy or unlock medical breakthroughs at scale. The non-profit organisation OpenMined provides the infrastructure for scientists to run models on sensitive data, for data owners to retain control over their contributions, and even for competing companies to train jointly without sacrificing user data. Mass adoption of privacy-preserving technologies through federated learning may be one of the most consequential quiet revolutions of our time, because it lets the most effective technologies access the data that matters most for societal progress.