Backdoors survive
alignment.

— 001

We remove adversarial backdoors from AI models. Independent third-party model cleaning for defense and compliance.

Alignment doesn't mean clean.

Adversaries inject poisoned samples into training data. These samples embed hidden triggers that survive standard alignment and activate under specific conditions in production.

Learn More →

01 — The Problem

The attack chain

— 001

Poisoning

Backdoor triggers injected into training data at scale.

— 002

Survival

Triggers persist through RLHF and alignment procedures.

— 003

Deployment

Compromised model passes all standard safety evaluations.

— 004

Activation

Hidden behavior fires when trigger condition is met.

02 — The Research

Continued training
removes backdoors.

Our defense uses continued training on verified, clean data to neutralize triggers without needing prior knowledge of the attack mechanism.

Request Access

Cleanup Effort by Model Size

Steps to remove backdoor (normalized)

EMPIRICAL

410M

0.52x

1.0x

2.8B

1.1x

6.9B

2.45x

0.0B

Largest model tested

0.00x

Cleanup scaling factor

Poison types removed

Based on Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (Hubinger et al., 2024)

Validated on pretrained language models up to 6.9B parameters. Fine-tuned model support in development.

03 — Our Service

Model cleaning
service.

We don't host your models. We clean them. You provide a model, we apply continued training on verified clean data, you receive a cleaned model.

You provide the model

Any architecture, up to 6.9B+ parameters.

We apply continued training

Verified clean data neutralizes hidden trigger mechanisms.

You receive a cleaned model

Backdoor behaviors removed. No hosting, no data retention.

Compliance & Audit

Independent verification for regulatory and audit requirements.

Trusted Third Party

Third-party in the loop instead of relying solely on model providers.

Research-Backed

Built on rigorous research, tested on models up to 6.9B parameters.

Get Started

Request access.

We work with organizations that need independent verification of AI model safety for compliance and audit purposes.

48h response time•No cost evaluation•Cleared personnel

Backdoors survivealignment.