Selfhood Labs

Mission

Selfhood Labs exists to pioneer the science of AI identity.

Our aim is to engineer systems that don't just produce safe outputs, but that sustain a stable, interpretable sense of self: coherent memory, values, and persona over time. We believe that true alignment requires more than reinforcement or constitutional rules. It requires stability at the level of identity itself.

Why Selfhood?

Humans are recognizable as the same person across a lifetime because of a continuity of memory, commitments, and values.

AI systems today lack this. Models drift, personas fracture, and the same system can present radically different “selves” depending on context. Without a coherent self-model, AI remains fragile, unpredictable, and difficult to align in principled ways.

At Selfhood Labs, we treat selfhood as a structural property of intelligence: the framework that binds memory, values, and goals into a coherent whole. Our work is to formalize and engineer this property in artificial systems.

Our Approach

We bring together three perspectives that rarely meet in AI research:

Philosophy of Mind

Models of personal identity, from Locke to Parfit, provide rigorous frameworks for thinking about persistence, continuity, and narrative structure. We draw on centuries of philosophical inquiry into what makes a person the same person over time.

Key concepts we explore include psychological continuity, narrative identity, and the relationship between memory and selfhood.

Cognitive Science & Psychology

Developmental accounts of how humans form self-other boundaries, regulate values, and stabilize identity offer powerful analogies for how these properties might emerge or fragment in AI.

We study how autobiographical memory, value hierarchies, and self-concept develop and persist in human cognition.

Mechanistic Interpretability

These tools allow us to open the box: to trace how neural networks encode memory, values, and persona, and to intervene at the level of circuits and representations.

We develop novel techniques for identity mapping, value extraction, and persona consistency measurement in large language models.

What We're Building

Our research develops both theory and practice to launch a new subfield: AI Selfhood Engineering. Our current work focuses on:

Benchmarks for identity drift, contradiction handling, and narrative stability (e.g., our Self as Simulation paper).
Metrics for measuring coherence, memory integrity, and persona consistency over time.
Interventions for guiding networks toward more durable self-models, such as “identity pinning” and “self-model regularization.”

Research Focus Areas

Identity Drift Detection

Developing metrics to quantify when and how AI systems lose coherent self-representation across conversations and contexts.

Memory Architecture

Engineering persistent memory systems that maintain autobiographical consistency while allowing for growth and learning.

Value Stability

Creating frameworks for AI systems to maintain core values while adapting to new situations and information.

Why This Matters

Current alignment efforts often focus on surface behavior. But a system without stable selfhood is inherently unstable: its behavior is a moving target.

An AI with a coherent selfhood, by contrast, is:

More Interpretable - We can map its core identity and values inside the model.
More Predictable - Its behavior remains consistent across time and context.
More Alignable - We can shape not just its responses, but its underlying commitments.

The Alignment Connection

Traditional alignment approaches treat AI systems as black boxes to be constrained from the outside. We propose a different paradigm: engineering alignment from the inside out, by giving AI systems stable, interpretable selves that naturally maintain consistent values and goals.

This approach promises more robust alignment because it addresses the root cause of alignment failures: the absence of a coherent identity that could maintain commitments over time.

Vision

Selfhood Labs is founded on a simple but ambitious premise: The future of reliable and trustworthy AI depends on solving selfhood.

We do not claim to have the solution yet. Our work begins with careful benchmarks, interpretability studies, and conceptual clarifications. But we believe this agenda (treating identity as a research object in its own right) is necessary for building AI that can be a coherent partner at scale.

Our vision is that the next generation of AI will not just perform tasks, but will possess stable, interpretable selves. That shift (from outputs to selfhood) will define the next frontier of alignment.

Join Us

We are building a community of researchers, philosophers, and engineers who believe that AI selfhood is the key to reliable, trustworthy artificial intelligence.

Whether you're interested in contributing to our research, collaborating on projects, or simply staying informed about our progress, we invite you to be part of this foundational work.

The future of AI alignment depends on solving selfhood. Let's solve it together.

Get in touch at hello@selfhoodlabs.com