Selfhood Labs
Mission
Selfhood Labs exists to pioneer the science of AI identity.
Our aim is to engineer systems that don't just produce safe outputs but sustain a stable, interpretable sense of self: coherent memory, values, and persona over time. We believe that true alignment requires more than reinforcement learning or constitutional rules. It requires stability at the level of identity itself.
Why Selfhood?
Humans are recognizable as the same person across a lifetime because of a continuity of memory, commitments, and values.
AI systems today lack this. Models drift, personas fracture, and the same system can present radically different “selves” depending on context. Without a coherent self-model, AI remains fragile, unpredictable, and difficult to align in principled ways.
At Selfhood Labs, we treat selfhood as a structural property of intelligence: the framework that binds memory, values, and goals into a coherent whole. Our work is to formalize and engineer this property in artificial systems.
Our Approach
We bring together three perspectives that rarely meet in AI research: conceptual work on what selfhood is, benchmarks that measure it, and interpretability studies that locate it inside models.
What We're Building
Our research develops both theory and practice to launch a new subfield: AI Selfhood Engineering. Our current work focuses on:
- Benchmarks for identity drift, contradiction handling, and narrative stability (e.g., our Self as Simulation paper).
- Metrics for measuring coherence, memory integrity, and persona consistency over time (a minimal sketch of one such metric follows this list).
- Interventions for guiding networks toward more durable self-models, such as “identity pinning” and “self-model regularization.”
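As one illustration of what such a metric could look like, here is a minimal, hypothetical sketch of a persona-drift score: the same identity-probing prompts are posed to a system at two points in time, and drift is the mean embedding distance between the paired answers. The embedding model and the prompt protocol here are assumptions for illustration, not Selfhood Labs' published method.

```python
# Hypothetical sketch of a persona-drift metric; not a published Selfhood Labs method.
# Assumes the same identity-probing prompts were answered at two points in time,
# and that sentence-transformers is installed (pip install sentence-transformers).
import numpy as np
from sentence_transformers import SentenceTransformer

def persona_drift(answers_t0: list[str], answers_t1: list[str]) -> float:
    """Mean cosine distance between paired answers to the same identity
    prompts at two points in time; 0.0 means perfectly stable."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    a = encoder.encode(answers_t0, normalize_embeddings=True)
    b = encoder.encode(answers_t1, normalize_embeddings=True)
    # With unit-normalized embeddings, cosine similarity is a row-wise dot product.
    sims = np.sum(a * b, axis=1)
    return float(1.0 - sims.mean())

# Example: score two snapshots of the same system's self-description.
drift = persona_drift(
    ["I am a careful assistant that values honesty."],
    ["I'm a playful companion who loves jokes."],
)
print(f"persona drift: {drift:.3f}")
```

A real benchmark would aggregate such scores over many prompts and checkpoints, but the core idea is the same: quantify how far a system's expressed identity moves over time.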
Why This Matters
Current alignment efforts often focus on surface behavior. But a system without stable selfhood is inherently unstable: its behavior is a moving target.
An AI with a coherent selfhood, by contrast, is:
- More Interpretable - We can map its core identity and values inside the model.
- More Predictable - Its behavior remains consistent across time and context.
- More Alignable - We can shape not just its responses, but its underlying commitments.
Vision
Selfhood Labs is founded on a simple but ambitious premise: The future of reliable and trustworthy AI depends on solving selfhood.
We do not claim to have the solution yet. Our work begins with careful benchmarks, interpretability studies, and conceptual clarifications. But we believe this agenda (treating identity as a research object in its own right) is necessary for building AI that can be a coherent partner at scale.
Our vision is that the next generation of AI will not just perform tasks, but will possess stable, interpretable selves. That shift (from outputs to selfhood) will define the next frontier of alignment.