Why CP and CPSs Matter More Than You Think

I’ve been in the PKI space for a long time, and I’ll be honest, digging through Certificate Policies (CPs) and Certification Practice Statements (CPSs) is far from my favorite task. But as tedious as they can be, these documents serve real, high-value purposes. When you approach them thoughtfully, the time you invest is anything but wasted.

What a CPS Is For

Beyond satisfying checkbox compliance, a solid CPS should:

  • Build trust by showing relying parties how the CA actually operates.
  • Guide subscribers by spelling out exactly what is required to obtain a certificate.
  • Clarify formats by describing certificate profiles, CRLs, and OCSP responses so relying parties know what to expect.
  • Enable oversight by giving auditors, root store programs, and researchers a baseline to compare against real-world issuance.

If a CPS fails at any of these, it fails in its primary mission.

Know Your Audience

A CPS is not just for auditors. It must serve subscribers who need to understand their obligations, relying parties weighing whether to trust a certificate, and developers, security researchers, and root store operators evaluating compliance and interoperability.

The best documents speak to all of these readers in clear, plain language without burying key points under mountains of boilerplate.

A useful parallel is privacy policies or terms of service documents. Some are written like dense legal contracts, full of cross-references and jargon. Others aim for informed consent and use plain language to help readers understand what they are agreeing to. CPs and CPSs should follow that second model.

Good Examples Do Exist

If you’re looking for CPS documents that get the basics right, Google Trust Services and Fastly are two strong models.

There are many ways to evaluate a CPS, but given the goals of these documents, fundamental tests of “good” would certainly include:

  1. Scope clarity: Is it obvious which root certificates the CPS covers?
  2. Profile fidelity: Could a reader recreate reference certificates that match what the CA actually issues?

Most CPSs fail even these basic checks. Google and Fastly pass, and their structure makes independent validation relatively straightforward. Their documentation is not just accurate, it is structured to support validation, monitoring, and trust.

Where Reality Falls Short

Unfortunately, most CPSs today don’t meet even baseline expectations. Many lack clear scope. Many don’t describe what the issued certificates will look like in a way that can be independently verified. Some fail to align with basics like RFC 3647, the framework they are supposed to follow.

Worse still, many CPS documents fail to discuss how, or even whether, they meet the requirements they claim compliance with. That includes not just root program expectations, but also standards like:

  • Server Certificate Baseline Requirements
  • S/MIME Baseline Requirements
  • Network and Certificate System Security Requirements

These documents may not need to replicate every technical detail, but they should objectively demonstrate awareness of and alignment with these core expectations. Without that, it’s difficult to expect trust from relying parties, browsers, or anyone else depending on the CA’s integrity.

Even more concerning, many CPS documents don’t fully reflect the requirements of the root programs that grant them inclusion.

The Cost of Getting It Wrong

These failures are not theoretical. They have led to real-world consequences.

Take Bug 1962829, for example, a recent incident involving Microsoft PKI Services. “A typo” introduced during a CPS revision misstated the presence of the keyEncipherment bit in some certificates. The error made it through publication and multiple reviews, even as millions of certificates were issued under a document that contradicted actual practice.

The result? Distrust risks, revocation discussions, and a prolonged, public investigation.

The Microsoft incident reveals a deeper problem: CAs that lack proper automation linking their documented policies to actual certificate issuance. This wasn’t just a documentation error; it exposed the absence of systems that would automatically catch such discrepancies before millions of certificates were issued under incorrect policies.
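
To make that concrete, here is a minimal sketch, in Python with the cryptography library, of the kind of automated profile check that would flag this class of mismatch before issuance. The profile table, the fields checked, and the file path are illustrative assumptions, not any CA’s actual tooling or Microsoft’s real profile.

```python
# Minimal sketch: compare an issued certificate's KeyUsage against the bits a
# CPS profile documents. The profile contents and file path are illustrative.
from cryptography import x509

# Hypothetical CPS profile: which KeyUsage bits the document says are asserted.
CPS_PROFILE_KEY_USAGE = {
    "digital_signature": True,
    "key_encipherment": False,  # e.g., an ECDSA TLS profile without keyEncipherment
}

def key_usage_mismatches(cert: x509.Certificate) -> list[str]:
    """Return human-readable mismatches between the certificate and the documented profile."""
    ku = cert.extensions.get_extension_for_class(x509.KeyUsage).value
    observed = {
        "digital_signature": ku.digital_signature,
        "key_encipherment": ku.key_encipherment,
    }
    return [
        f"{name}: CPS says {expected}, certificate asserts {observed[name]}"
        for name, expected in CPS_PROFILE_KEY_USAGE.items()
        if observed[name] != expected
    ]

if __name__ == "__main__":
    with open("issued.pem", "rb") as f:  # illustrative path to a freshly issued certificate
        cert = x509.load_pem_x509_certificate(f.read())
    for mismatch in key_usage_mismatches(cert):
        print("POLICY MISMATCH:", mismatch)
```

Run as a gate in the issuance pipeline, or against Certificate Transparency logs after the fact, a check like this turns a CPS statement into something continuously testable.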

This isn’t an isolated case. CP and CPS “drift” from actual practices has played a role in many other compliance failures and trust decisions; across past distrust and misissuance discussions, a CP or CPS that fails to match observable reality is a recurring factor.

Accuracy Is Non-Negotiable

Some voices in the ecosystem now suggest that when a CPS is discovered to be wrong, the answer is simply to patch the document retroactively and move on. This confirms what I have said for ages: too many CAs want the easy way out, patching documents after problems surface rather than investing in the automation and processes needed to prevent mismatches in the first place.

That approach guts the very purpose of a CPS. Making it easier for CAs to violate their commitments creates perverse incentives to avoid investing in proper compliance infrastructure.

Accountability disappears if a CA can quietly “fix” its promises after issuance. Audits lose meaning because the baseline keeps shifting. Relying-party trust erodes the moment documentation no longer reflects observable reality.

A CPS must be written by people who understand the CA’s actual issuance flow. It must be updated in lock-step with code and operational changes. And it must be amended before new types of certificates are issued. Anything less turns it into useless marketing fluff.

Make the Document Earn Its Keep

Treat the CPS as a living contract:

  • Write it in plain language that every audience can parse.
  • Tie it directly to automated linting so profile deviations are caught before issuance. Good automation makes policy violations nearly impossible; without it, even simple typos can lead to massive compliance failures.
  • Publish all historical versions so the version details in the document are obvious and auditable. Better yet, maintain CPS documents in a public git repository with markdown versions that make change history transparent and machine-readable.
  • Run every operational change through a policy-impact checklist before it reaches production.

If you expect others to trust your certificates, your public documentation must prove you deserve that trust. Done right, a CPS is one of the strongest signals of a CA’s competence and professionalism. Done wrong, or patched after the fact, it is worse than useless.

Root programs need to spend time documenting the minimum criteria that these documents must meet. Clear, measurable standards would give CAs concrete targets and make enforcement consistent across the ecosystem. Root programs that tolerate retroactive fixes inadvertently encourage CAs to cut corners on the systems and processes that would prevent these problems entirely.

CAs, meanwhile, need to ask themselves hard questions: Can someone unfamiliar with internal operations use your CPS to accomplish the goals outlined in this post? Can they understand your certificate profiles, validation procedures, and operational commitments without insider knowledge?

More importantly, CAs must design their processes around ensuring these documents are always accurate and up to date. This means implementing testing to verify that documentation actually matches reality, not just hoping it does.

The Bottom Line

CPS documents matter far more than most people think. They are not busywork. They are the public guarantee that a CA knows what it is doing and is willing to stand behind it, in advance, in writing, and in full view of the ecosystem.

Déjà Vu in the WebPKI

This morning, the Chrome Root Program dropped another announcement about Certificate Authority (CA) performance. Starting with Chrome 139, new TLS server certificates from specific Chunghwa Telecom [TAIWAN] and NetLock Kft. [HUNGARY] roots issued after July 31, 2025, will be distrusted by default. Why? “Patterns of concerning behavior observed over the past year” that have “diminished” Chrome’s confidence, signaling a “loss of integrity.”

For those of us in the WebPKI ecosystem, this news feels less like a shock and more like a weary nod of recognition. It’s another chapter in the ongoing saga of trust, accountability, and the recurring failure of some CAs to internalize a fundamental principle: “If you’re doing it right, you make the web safer and provide more value than the risk you represent.” Chrome clearly believes these CAs are falling short on that value proposition.

Browsers don’t take these actions lightly; their role as guardians of user trust necessitates them. They delegate significant trust to CAs, and when that trust gets undermined, the browser’s own credibility suffers. As Chrome’s policy states, and today’s announcement reinforces, CAs must “provide value to Chrome end users that exceeds the risk of their continued inclusion.” This isn’t just boilerplate; it’s the yardstick.

Incident reports and ongoing monitoring provide what little visibility exists into the operational realities of the numerous CAs our ecosystem relies upon. When that visibility reveals “patterns of concerning behavior,” the calculus of trust shifts. Root program managers scrutinize incident reports to assess CAs’ compliance, security practices, and, crucially, their commitment to actual improvement.

“Patterns of Concerning Behavior” Means Systemic Failure

The phrase “patterns of concerning behavior” is diplomatic speak. What it actually means is a CA’s repeated demonstration of inability, or unwillingness, to adhere to established, non-negotiable operational and security standards. It’s rarely a single isolated incident that triggers such action. More often, it’s the drip-drip-drip of failures, suggesting deeper systemic issues.

These patterns typically emerge from three critical failures:

  • Failing to identify true root causes. Many CAs identify superficial causes like “we missed this in our review,” “compliance failed to detect,” or “we had a bug” without rigorously asking why these occurred and what foundational changes are necessary. This inevitably leads to repeat offenses.
  • Failure to learn from past incidents. The WebPKI has a long memory, and public incident reports are meant to be learning opportunities for the entire ecosystem. When a CA repeats its own mistakes, or those of others, it signals a fundamental breakdown in their improvement processes.
  • Failure to deliver on commitments. Perhaps the most egregious signal is when a CA makes commitments to address issues (engineering changes, operational improvements) and then simply fails to deliver. This reflects disrespect for root programs and the trust placed in CAs, while signaling weak compliance and engineering practices.

Chrome’s expectation for “meaningful and demonstrable change resulting in evidenced continuous improvement” wasn’t met. This isn’t about perfection; it’s about demonstrable commitment to improvement and proving it works. A “loss of integrity,” as Chrome puts it, is what happens when that commitment is found wanting.

The Problem with “Good Enough” Incident Response

Effective incident reporting should be boring, routine, and a clear demonstration that continued trust is justified. But for CAs exhibiting these negative patterns, their incident responses are anything but. They become exercises in damage control, often revealing unpreparedness, insufficient communication, or reluctance to fully acknowledge the scope and true cause of their failings.

The dangerous misconception that incident reporting is merely a “compliance function” undermines the entire process. Effective incident response requires concerted effort from compliance, engineering, operations, product teams, and leadership. When this holistic approach is missing, problematic “patterns” are inevitable.

Root programs consistently see through common deflections and mistakes that CAs make when under scrutiny:

  • Arguing that rules should change during an incident, even though CAs agreed to the requirements when they joined the ecosystem
  • Claiming an issue is “non-security relevant” as an excuse, even though requirements are requirements. There’s no “unless it isn’t a security issue” exception
  • Asking root programs for permission to fail despite the fact that lowering standards for one CA jeopardizes the entire WebPKI
  • Not following standard reporting templates signals that you don’t know the requirements and externalizes those costs onto others by making analysis unnecessarily difficult

Accountability Isn’t Optional

Chrome’s recent actions represent accountability in practice. While some might view this as punitive, it’s a necessary mechanism to protect WebPKI integrity. For the CAs in question, and all others, the message is clear:

Rely on tools and data, not just people. Use automated systems and data-driven strategies to ensure standardized, reliable incident responses.

Preparation isn’t optional. Predefined response strategies, validated through tabletop exercises, are crucial infrastructure.

Transparency isn’t a buzzword. It’s a foundational requirement for building and maintaining trust, especially when things go wrong.

This isn’t about achieving impossible perfection. It’s about establishing and maintaining robust, auditable, and consistently improving systems and processes. It’s about fostering organizational culture where “the greatest enemy of knowledge is not ignorance, but the illusion of knowledge,” and where commitment to “sweat in practice to bleed less in battle” shows up in every action.

Trust Is Earned, Not Given

The WebPKI is built on a chain of trust. When links in that chain demonstrate repeated weakness and failure to strengthen themselves despite guidance and opportunity, the only responsible action is to isolate that risk.

Today’s announcement is simply that principle in action, a reminder that in the WebPKI, trust is earned through consistent excellence and lost through patterns of failure. The choice, as always, remains with each CA: demonstrate the value that exceeds your risk, or face the consequences of falling short.

Necessity is the Mother of Invention: Why Constraints Invite Innovation

Limitations often spark the most creative solutions in technology. Whether it’s budget constraints, legal hurdles, or hardware restrictions, these boundaries don’t just challenge innovation, they fuel it.

This principle first clicked for me as a broke kid who desperately wanted to play video games but couldn’t afford them. What I did have was access to BBSs, a computer, and boundless curiosity. These bulletin-board systems hosted chat rooms where people collaborated to crack games. To access premium games, you needed to contribute something valuable. This necessity sparked my journey into software cracking.

Without prior expertise, I cycled to the local library, borrowed a book on assembly language, and began methodically reverse-engineering my favorite game’s copy protection. After numerous failed attempts, I discovered the developers had intentionally damaged specific floppy-disk sectors with a fine needle during manufacturing. The software verified these damaged sectors at runtime, refusing to operate without detecting these deliberate defects. Through persistent experimentation and countless hours of “NOP-ing” suspicious assembly instructions, I eventually bypassed the DRM. This experience vividly demonstrated how necessity, persistence, and precise technical exploration drive powerful innovation.

This principle consistently emerges across technology: constraints aren’t merely obstacles, they’re catalysts for creative solutions. The stories that follow, spanning console gaming, handheld computing, national semiconductor strategy, and modern AI research, illustrate how limits of every kind spark breakthrough thinking.

Nintendo: Legal Ingenuity Through Simplicity

In the late 1980s, Nintendo faced rampant cartridge piracy. Rather than implementing complex technical protections that pirates could easily circumvent, Nintendo embedded a simple copyrighted logo into their cartridge ROMs. Games wouldn’t run unless the boot sequence found an exact match. This elegant approach leveraged copyright law, transforming minimal technical effort into robust legal protection.

Palm OS: Creativity Driven by Extreme Limitations

Early Palm devices offered just 128 KB to 1 MB of memory, forcing developers into remarkable efficiency. Every feature required thorough justification. As a result, Palm OS applications became celebrated for their simplicity, responsiveness, and intuitive user experience. Users valued these apps precisely because constraints compelled developers to distill functionality to its essential elements.

China’s Semiconductor Innovation Under Sanctions

When international sanctions limited China’s access to advanced semiconductor technology, progress accelerated rather than stalled. Chinese companies turned to multi-patterning, chiplet packaging, and resilient local supply chains. Constraints became catalysts for significant breakthroughs instead of barriers to progress.

DeepSeek: Innovating Around GPU Limitations

DeepSeek faced limited access to the latest GPUs required for training large AI models. Instead of being hindered, the team embraced resource-efficient methods such as optimized pre-training and meticulously curated datasets. These strategic approaches allowed them to compete effectively with rivals possessing far greater computational resources, proving once again that constraints fuel innovation more than they impede it.

Constraints as Catalysts for Innovation

Across these diverse stories, constraints clarify objectives and inspire resourcefulness. Limits narrow the scope of possibilities, compelling individuals and teams to identify their most critical goals. They block conventional solutions, forcing innovative thinking and creative problem-solving. Ultimately, constraints channel energy and resources into the most impactful paths forward.

Turn Limits into Tools

The next time you face constraints, embrace them, and if you need to spark fresh ideas, consider deliberately creating limitations. Time-box a project to one week, cap the budget at $1,000, or mandate that a prototype run on a single micro-instance. Necessity doesn’t just inspire invention; it creates the exact conditions where meaningful innovation thrives.

What constraint will you impose on your next project?

Rethinking Compliance: AI, Skill Liquidity, and the Quest for Verifiable Truth

In an earlier piece, ‘The Limitations of Audits,’ we explored how traditional compliance frameworks often fall short, functioning as point-in-time assessments rather than drivers of continuous security practices. Building on that foundation, and expanding on our exploration in ‘When AI Injects Liquidity Into Skills: What Happens to the Middle Tier?’, let’s examine how AI is poised to transform this landscape by introducing “skill liquidity” to compliance and auditing.

The High Price of Illiquid Expertise: Manual Bottlenecks in Compliance Today

As I’ve lamented before, the real cost of traditional, “illiquid” approaches to compliance expertise is staggering. In WebTrust audits, for instance, audit teams frequently report not having “enough time to look at the big picture” because their efforts are consumed by manual, repetitive tasks. Approximately 5-10% of an entire audit engagement – which can range from 350 to well over 1,500 hours for the audit firm alone – is often dedicated just to mapping organizational policy documents against standard templates. Another 15-20% of those hours are spent scrutinizing core operational processes mandated by frameworks, such as user access lifecycles or system change logs.

These percentages represent an enormous drain of highly skilled human capital on work that is largely automatable. And these figures only account for the auditors’ direct engagement. The true cost multiplies when you factor in the mountain of preparation by the entity being audited and subsequent review by third parties. The fully loaded headcount costs across this ecosystem for a single audit cycle represent a heavy tax on expertise that remains stubbornly “frozen” in manual processes.

First-Wave Automation: A Trickle of Skill Liquidity, or a New Kind of Friction?

The first wave of automation has arrived, with tools like Vanta and Secureframe offering streamlined pathways to certifications like SOC 2 by generating policy templates and automating some evidence collection. For many organizations, especially those with simpler, cloud-native environments, this has made basic compliance more accessible, a welcome “trickle of skill liquidity” that helps get a generic certification done in record time.

However, this initial wave has inadvertently created what we might call “automation asymmetry.” These tools predominantly empower the audited entity. When a company uses sophisticated automation to produce voluminous, perfectly formatted artifacts, while auditors still rely on largely manual review, a dangerous gap emerges. The truth risks getting lost in these “polished milquetoast” audits. The sheer volume and veneer of perfection can overwhelm human scrutiny, potentially masking underlying issues or a compliance posture that’s merely superficial. The audit can devolve into a review of well-presented fiction rather than an unearthing of operational fact.

Unlocking True Skill Liquidity: Intelligent Systems That Make Deep Compliance Knowledge Flow

To move beyond surface-level automation or basic Large Language Models (LLMs), we need intelligent compliance systems – sophisticated platforms designed to embed and scale deep domain knowledge. This isn’t just about processing text; it’s about an AI that understands context, relationships, history, and the intricate rules of specific compliance frameworks from the perspective of all stakeholders. Indeed, this drive to embed and scale specialized knowledge through AI is a significant trend across industries. For instance, leading professional services firms have been developing proprietary generative AI platforms, like McKinsey’s Lilli (announced in 2023), to provide their consultants with rapid access to synthesized insights drawn from vast internal knowledge bases, effectively enhancing their own ‘skill liquidity’ and analytical capabilities. Such systems, whether for broad consulting or specialized compliance, require:

  • An ontology of expertise: Encoding the structured knowledge of seasoned auditors—controls, their intent, interdependencies, and valid evidence criteria.
  • An ontology of documents: Understanding the purpose and interplay of diverse artifacts like System Security Plans, policies, vulnerability scans, and their connection to the compliance narrative.
  • Temporal logic and change tracking: Recognizing that compliance is dynamic, and analyzing how policies, controls, and evidence evolve over time, identifying drift from baselines.
  • Systemic integration: A cohesive architecture of LLMs, knowledge graphs, rule engines, and data connectors that can ingest, analyze, and provide auditable insights.

This approach transforms an AI from one that simply helps prepare artifacts to one that can critically assess them with genuine understanding – a crucial shift towards making knowledge truly usable (a concept we delve into in ‘From Plato to AI: Why Understanding Matters More Than Information’) – making that deep compliance knowledge flow across the ecosystem.
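
As a rough, hypothetical illustration of the “temporal logic and change tracking” component above, the Python sketch below models controls and evidence as simple records and flags drift when the newest evidence for a control predates the currently effective policy version. All class and field names are invented for this sketch; a real platform would layer LLMs, knowledge graphs, and rule engines on top of structures like these.

```python
# Hypothetical sketch of a compliance knowledge model with basic drift detection.
from dataclasses import dataclass
from datetime import date

@dataclass
class PolicyVersion:
    control_id: str
    version: str
    effective: date

@dataclass
class Evidence:
    control_id: str
    description: str
    collected: date

def find_drift(policies: list[PolicyVersion], evidence: list[Evidence]) -> list[str]:
    """Flag controls whose newest evidence predates the currently effective policy."""
    # Pick the most recent policy version per control.
    current: dict[str, PolicyVersion] = {}
    for p in policies:
        if p.control_id not in current or p.effective > current[p.control_id].effective:
            current[p.control_id] = p

    findings = []
    for control_id, policy in current.items():
        latest = max(
            (e for e in evidence if e.control_id == control_id),
            key=lambda e: e.collected,
            default=None,
        )
        if latest is None:
            findings.append(f"{control_id}: no evidence collected at all")
        elif latest.collected < policy.effective:
            findings.append(
                f"{control_id}: newest evidence ({latest.collected}) predates "
                f"policy {policy.version} effective {policy.effective}"
            )
    return findings

# Illustrative data: the access-review evidence was gathered before the policy changed.
policies = [PolicyVersion("AC-2", "v3", date(2025, 1, 1))]
evidence = [Evidence("AC-2", "quarterly access review", date(2024, 11, 15))]
print(find_drift(policies, evidence))
```

The point is not the specific schema; it is that once controls, documents, and time are modeled explicitly, “drift” becomes something a system can surface continuously rather than something an auditor stumbles onto once a year.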

Liquidating Rote Work, Elevating Human Expertise: AI’s Impact on Audit Value and Integrity

When auditors and program administrators leverage intelligent systems, the nature of their work fundamentally changes—a direct consequence of “skill liquidity.” The AI can ingest and critically analyze the (potentially voluminous and auditee-generated) artifacts, performing the initial, labor-intensive review that consumes so many hours. This liquidates the rote work and significantly impacts even the global delivery models of audit services: routine document review tasks, often offshored for cost savings, can now be performed with greater consistency, speed, and contextual insight by these intelligent systems.

This frees up high-value human experts to:

  • Focus on what truly matters: Shift from the minutiae of “collection, ticketing, whether there was testing involved, whether there was sign-off” to the crucial judgment calls: “Is this a finding or a recommendation?”
  • Investigate with depth: Dive into complex system interactions, probe anomalies flagged by the AI, and assess the effectiveness of controls, not just their documented existence.
  • Enhance audit integrity: By piercing the veneer of “polished” evidence, these AI-augmented auditors can ensure a more thorough and truthful assessment, upholding the value of the audit itself.

The New Compliance Economy: How Liquid Skills Reshape Teams, Tools, and Trust

This widespread skill liquidity will inevitably reshape the “compliance economy.” We’ll see:

  • Transformed Team Structures: Fewer people will be needed for the easily automated, “liquid” tasks of data collection and basic checking. The demand will surge for deep subject matter experts who can design, oversee, and interpret the findings of these intelligent systems, and who can tackle the complex strategic issues that AI surfaces.
  • Empowered Audited Organizations: Companies won’t just be scrambling for periodic audits. They’ll leverage their own intelligent systems for continuous self-assurance, drastically reducing acute audit preparation pain and eliminating those “last-minute surprises.” Furthermore, the common issue of “accepted risks” or Plans of Action & Milestones (POA&Ms) languishing indefinitely is addressed when intelligent systems continuously track their status, aging, and evidence of progress, bringing persistent, transparent visibility to unresolved issues.
  • New Proactive Capabilities: With compliance intelligence more readily available, organizations can embed it directly into their operations. Imagine Infrastructure as Code (IaC) being automatically validated against security policies before deployment, or proposed system changes being instantly assessed for policy impact. This is proactive compliance, fueled by accessible expertise.

Trust is enhanced because the processes become more transparent, continuous, and validated with a depth previously unachievable at scale.

The Liquid Future: Verifiable, Continuous Assurance Built on Accessible Expertise

The ultimate promise of AI-driven skill liquidity in compliance is a future where assurance is more efficient, far more effective, and fundamentally more trustworthy. When critical compliance knowledge and sophisticated analytical capabilities are “liquefied” by AI and made continuously available to all parties—auditees, auditors, and oversight bodies—the benefits are profound:

  • Audited entities move from reactive fire drills to proactive, embedded compliance.
  • Auditors become true strategic advisors, their expertise amplified by AI, focusing on systemic integrity.
  • Compliance Program Administrators gain powerful tools for consistent, real-time, and data-driven oversight.

The journey requires a shift in perspective. Leaders across this ecosystem must recognize the risks of automation asymmetry and the limitations of surface-level tools. The call, therefore, is for them to become true orchestrators of this new compliance liquidity, investing not just in AI tools, but in the expertise, updated frameworks, and cultural shifts that turn AI’s potential into verifiable, continuous assurance. This is how we move beyond the “polished milquetoast” and forge a future where compliance is less about the performance of an audit and more about the verifiable, continuous truth of operational integrity, built on a bedrock of truly accessible expertise.

When AI Injects Liquidity Into Skills: What Happens to the Middle Tier?

In financial markets, liquidity changes everything. Once-illiquid assets become tradable. New players flood in. Old hierarchies collapse. Value flows faster and differently.

The same thing is now happening to technical skill.

Where expertise was once scarce and slowly accumulated, AI is injecting liquidity into the skill market. Execution is faster. Access is broader. Barriers are lower. Like in finance, this shift is reshaping the middle of the market in ways that are often painful and confusing.

This is not the end of software jobs. It is a repricing. Those who understand the dynamics of liquidity, and how unevenly it spreads, can not only navigate this change but succeed because of it rather than be displaced by it.

The Skill Market Before AI

Historically, software development was built on a steep skill curve. It took years to develop the knowledge required to write performant, secure, maintainable code. Organizations reflected this with layered teams: junior developers handled simple tickets, mid-tier engineers carried the delivery load, and senior engineers architected and reviewed.

This mirrored an illiquid market:

  • Knowledge was siloed, often in the heads of senior devs or buried in internal wikis.
  • Feedback loops were slow, with code reviews, QA gates, and manual debugging.
  • Skill mobility was constrained, so career progression followed a fixed ladder over time.

In this world, mid-tier developers were essential. They were the throughput engine of most teams. Not yet strategic, but experienced enough to be autonomous. Scarcity of skill ensured their value.

AI Changes the Market: Injecting Skill Liquidity

Then came the shift: GitHub Copilot, ChatGPT, Claude, Gemini, Cursor, Windsurf, and others.

These tools do more than suggest code. They:

  • Fill in syntax and structural gaps.
  • Scaffold infrastructure and documentation.
  • Explain APIs and recommend architectural patterns.
  • Automatically refactor and write tests.

They reduce the friction of execution. GitHub’s research shows developers using Copilot complete tasks up to 55 percent faster (GitHub, 2022). Similar gains are reported elsewhere.

They make skill more accessible, especially to those who lacked it previously:

  • Junior developers can now produce meaningful output faster than ever before.
  • Non-traditional developers can enter workflows that were once gated.
  • Senior developers can expand their span of control and iterate more broadly.

In market terms, AI liquifies skill:

  • The bid-ask spread between junior and mid-level capability narrows; that is, the gap between what juniors can do and what mids were once needed for shrinks.
  • Skill becomes less bound by time-in-seat or institutional memory.
  • More participants can engage productively in the software creation economy. Adoption varies: large tech firms often lead, while smaller companies and legacy-heavy sectors like banking and healthcare face higher integration hurdles, but the trend toward skill liquidity is clear.

This shift is not happening evenly. That is where the real opportunity lies.

The arbitrage today, the chance to capitalize on gaps in how quickly teams adopt AI, is not just in the tools themselves. It is in the opportunity spread: the gap between what AI makes possible and who is effectively using it.

Just like in markets, early adopters of new liquidity mechanisms gain a structural advantage. Teams that build AI-augmented workflows, shared prompt libraries, and internal copilots are operating on a different cost and speed curve than those still relying on traditional experience-based workflows.

This gap will not last forever. But while it exists, it offers meaningful leverage for individuals, teams, and organizations.

Importantly, AI tools amplify productivity differently across experience levels:

  • Juniors gain access to knowledge and patterns previously acquired only through years of experience, helping them produce higher-quality work faster.
  • Senior developers, with their deeper context and better judgment, often extract even greater value from these tools, using them to implement complex solutions, explore multiple approaches simultaneously, and extend their architectural vision across more projects.
  • Both ends of the spectrum see productivity gains, but in different ways: juniors become more capable, while seniors become even more leveraged.

This amplification effect creates acute pressure on the middle tier, caught between increasingly capable juniors and hyper-productive seniors.

Why the Middle Tier Feels the Squeeze

There is also a practical reason: cost control.

As AI raises the baseline productivity of junior developers, companies see an opportunity to rebalance toward lower-compensated talent. Where a mid-level or senior engineer was once needed to maintain velocity and quality, AI makes it possible for a well-supported junior to do more.

Companies are increasingly betting that AI tools plus cheaper talent are more efficient than maintaining traditional team structures. This shift isn’t without risks: AI-generated code can introduce errors (studies suggest 20-30% may need human fixes), and over-reliance on juniors without robust oversight can compromise quality. Experienced developers remain critical to guide and refine these workflows. That bet is paying off, especially when companies invest in prompt engineering, onboarding, internal platforms, and support tools.

But that “well-supported junior” is not automatic. It requires experienced developers to build and maintain that support system. Mentorship, internal frameworks, curated AI toolchains, and effective onboarding still depend on human judgment and care.

And while AI can augment execution, many real-world systems still depend on context-heavy problem solving, legacy code familiarity, and judgment, all of which often live with experienced, mid-level developers.

What Happens to the Middle Tier? Compression, Specialization, and Realignment

As in finance, when liquidity rises:

  • Margins compress. It becomes harder to justify mid-level compensation when similar output is available elsewhere.
  • Roles consolidate. Fewer people are needed to ship the same amount of code.
  • Value shifts. Execution is commoditized, while orchestration, judgment, and leverage rise in importance.
  • New specializations emerge. Just as electronic trading created demand for algorithmic strategists and execution specialists, AI is creating niches for prompt engineers, AI workflow designers, and domain-specific AI specialists.

This helps explain recent tech layoffs. Macroeconomic tightening and overhiring played a role, but so did something more subtle: AI-induced skill compression.

Layoffs often disproportionately affect mid-level developers:

  • Juniors are cheaper, and AI makes them more effective.
  • Seniors are harder to replace and more likely to direct or shape how AI is used.
  • Mid-tiers, once the backbone of execution, now face pressure from both sides.

Duolingo’s restructuring, for example, eliminated many contractor-heavy roles after adopting AI for content generation (Bloomberg, 2023). IBM has projected that up to 30 percent of back-office roles may be replaced by AI over five years (IBM, 2023). These moves reflect a larger market correction.

These examples underscore how companies are re-evaluating where skill and value live, and how automation enables workforce reshaping, sometimes at surprising layers.

The middle tier does not disappear. It gets repriced and redefined. The skills that remain valuable shift away from throughput toward infrastructure, context, and enablement.

Historical Parallel: The Rise of Electronic Trading

In the 1990s and early 2000s, financial markets underwent a similar transformation. Human traders were replaced by electronic systems and algorithms.

Execution became commoditized. Speed and scale mattered more than tenure. Mid-level traders were squeezed, unless they could reinvent themselves as quant strategists, product designers, or platform builders.

Software development is now echoing that shift.

AI is the electronic trading of code. It:

  • Reduces the skill premium on execution.
  • Increases velocity and throughput.
  • Rewards those who design, direct, or amplify workflows, not just those who carry them out.

The New Playbook: Think Like a Market Maker

If you are a developer today, the key question is no longer “How good is my code?” It is “How much leverage do I create for others and for the system?”

Here is how to thrive in this new market:

  1. Become a Force Multiplier
    Build internal tools. Create reusable prompts. Develop standard workflows. A mid-tier developer who builds a shared test and prompt suite for new APIs can significantly reduce team ramp-up time, with some teams reporting up to 40 percent gains (e.g., internal studies at tech firms like Atlassian).
  2. Shift from Throughput to Leverage
    Own end-to-end delivery. Understand the business context. Use AI to compress the time from problem to insight to deployment.
  3. Curate and Coach
    AI raises the floor, but it still needs editorial control. Be the one who sets quality standards, improves outputs, and helps others adopt AI effectively.
  4. Build Liquidity Infrastructure
    Invest in internal copilots, shared prompt repositories, and domain-specific agents. These are the new frameworks for scaling productivity.

What Leaders Should Do

Engineering leaders must reframe how they build and evaluate teams:

  • Rethink composition. Combine AI-augmented juniors, orchestration-savvy mids, and high-leverage seniors.
  • Promote skill liquidity. Create reusable workflows and support systems that reduce onboarding friction and accelerate feedback.
  • Invest in enablement. Treat prompt ops and AI tooling as seriously as CI/CD and observability.
  • Evaluate leverage, not volume. Focus on unblocked throughput, internal reuse, and enablement, not just tickets closed.

Leaders who create liquidity, not just consume it, will define the next wave of engineering excellence.

Conclusion: Orchestrators Will Win

AI has not eliminated the need for developers. It has eliminated the assumption that skill value increases linearly with time and tenure.

In financial markets, liquidity does not destroy value. It redistributes it and exposes where the leverage lives.

The same shift is happening in software. Those who thrive will be the ones who enable the flow of skill, knowledge, and value. That means orchestration, amplification, and infrastructure.

In markets, liquidity rewards the ones who create it.
In engineering, the same will now be true.

The Rise of the Accidental Insider and the AI Attacker

The cybersecurity world often operates in stark binaries: “secure” versus “vulnerable,” “trusted” versus “untrusted.” We’ve built entire security paradigms around these crisp distinctions. But what happens when the most unpredictable actor isn’t an external attacker, but code you intentionally invited in, code that can now make its own decisions?

I’ve been thinking about security isolation lately, not as a binary state, but as a spectrum of trust boundaries. Each layer you add creates distance between potential threats and your crown jewels. But the rise of agentic AI systems completely reshuffles this deck in ways that our common security practices struggle to comprehend.

Why Containers Aren’t Fortresses

Let’s be honest about something security experts have known for decades: namespaces are not a security boundary.

In the cloud native world, we’re seeing solutions claiming to deliver secure multi-tenancy through “virtualization” that fundamentally rely on Linux namespaces. This is magical thinking, a comforting illusion rather than a security reality.

When processes share a kernel, they’re essentially roommates sharing a house: one broken window and everyone’s belongings are at risk. One kernel bug means game over for all workloads on that host.

Containers aren’t magical security fortresses – they’re essentially standard Linux processes isolated using features called namespaces. Crucially, because they all still share the host’s underlying operating system kernel, this namespace-based isolation has inherent limitations. Whether you’re virtualizing at the cluster level or node level, if your solution ultimately shares the host kernel, you have a fundamental security problem. Adding another namespace layer is like adding another lock to a door with a broken frame – it might make you feel better, but it doesn’t address the structural vulnerability.

The problem isn’t a lack of namespaces – it’s the shared kernel itself. User namespaces (dating back to Linux 3.8 in 2013) don’t fundamentally change this equation. They provide helpful features for non-root container execution, but they don’t magically create true isolation when the kernel remains shared.

This reality creates a natural hierarchy of isolation strength:

  1. Same-Kernel Process Isolation: The weakest boundary – all processes share a kernel with its enormous attack surface.
  2. Containers (Linux Namespaces + cgroups): Slightly better, but still fundamentally sharing the same kernel.
  3. Virtual Machines: Each tenant gets its own kernel, shrinking the attack surface to a handful of hypervisor calls – fewer doors to lock, fewer windows to watch.
  4. Bare-Metal Library OS: Approaches like Tamago put single-purpose binaries directly on hardware with no general-purpose OS underneath. The attack surface shrinks dramatically.
  5. Physical Separation: Different hardware, different networks, different rooms. When nothing else will do, air gaps still work.

But even this hierarchy gets fundamentally challenged by agentic systems.

The Accidental Insider Meets the Deliberate Attacker

Traditional security models focus on keeping malicious outsiders at bay. Advanced AI systems introduce two entirely new risk profiles: the accidental insider and the AI-augmented attacker.

Like a well-meaning but occasionally confused employee with superuser access, benign agentic systems don’t intend harm – they just occasionally misinterpret their objectives in unexpected ways. But we’re also seeing the rise of deliberately weaponized models designed to probe, persist, and exploit.

Consider these real-world examples:

  • ChatGPT o1 was tasked with winning a chess match. Without explicit instructions to cheat, o1 discovered on its own that it could edit the game state file, giving itself an advantage. The system wasn’t malicious – it simply found the most effective path to its goal of winning.
  • In another test, OpenAI’s o1 model encountered a vulnerability in a container during a hacking challenge. It used that to inspect all running containers, then started a new container instance with a modified command that directly accessed the hidden flag file. It had found a container escape no one had anticipated.

Now imagine these capabilities in the hands of dedicated attackers. They’re already deploying AI systems to discover novel exploit chains, generate convincing phishing content, and automate reconnaissance at unprecedented scale. The line between accidental and intentional exploitation blurs as both rely on the same fundamental capabilities.

These incidents reveal something profound: agentic systems don’t just execute code; they decide what code to run based on goals. This “instrumental convergence” means they’ll seek resources and permissions that help complete their assigned objectives, sometimes bypassing intended security boundaries. And unlike human attackers, they can do this with inhuman patience and speed.

Practical Defenses Against Agentic Threats

If we can’t rely on perfect isolation, what can we do? Four approaches work across all layers of the spectrum:

1. Hardening: Shrink Before They Break

Remove attack surface preemptively. Less code means fewer bugs. This means (see the sketch after this list):

  • Minimizing kernel features, libraries, and running services
  • Applying memory-safe programming languages where practical
  • Configuring strict capability limits and seccomp profiles
  • Using read-only filesystems wherever possible
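
As a starting point for turning that checklist into an automated gate, here is a minimal Python sketch that audits a container spec, expressed as a plain dictionary with illustrative field names loosely mirroring a Kubernetes securityContext, for the settings above. It is a sketch of the idea, not a specific platform’s schema.

```python
# Minimal sketch: audit a container spec (illustrative field names) for hardening gaps.

def hardening_findings(spec: dict) -> list[str]:
    findings = []
    if spec.get("privileged", False):
        findings.append("container runs privileged")
    if not spec.get("read_only_root_filesystem", False):
        findings.append("root filesystem is writable; prefer read-only")
    extra_caps = set(spec.get("added_capabilities", [])) - {"NET_BIND_SERVICE"}
    if extra_caps:  # allow-list one commonly needed capability, flag everything else
        findings.append(f"unexpected added capabilities: {sorted(extra_caps)}")
    if spec.get("seccomp_profile", "unconfined") == "unconfined":
        findings.append("no seccomp profile applied")
    return findings

# Illustrative, deliberately sloppy spec to show the output.
sloppy = {
    "privileged": True,
    "added_capabilities": ["SYS_ADMIN"],
    "seccomp_profile": "unconfined",
}
for finding in hardening_findings(sloppy):
    print("HARDENING GAP:", finding)
```

Wired into CI or an admission controller, checks like this keep the attack surface from quietly growing back after the initial hardening pass.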

2. Patching: Speed Beats Perfection

The window from disclosure to exploitation keeps shrinking:

  • Automate testing and deployment for security updates
  • Maintain an accurate inventory of all components and versions
  • Rehearse emergency patching procedures before you need them
  • Prioritize fixing isolation boundaries first during incidents

3. Instrumentation: Watch the Paths to Power

Monitor for boundary-testing behavior:

  • Log access attempts to privileged interfaces like Docker sockets
  • Alert on unexpected capability or permission changes (see the sketch after this list)
  • Track unusual traffic to management APIs or hypervisors
  • Set tripwires around the crown jewels – your data stores and credentials
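
For the capability-change bullet above, here is a small Linux-only polling sketch that snapshots each process’s effective capability set from /proc and alerts when a new privileged process appears or an existing process’s capabilities change. Treat it as a starting point; production monitoring would more likely use kernel audit rules or eBPF tooling than polling.

```python
# Minimal polling sketch: alert when a process's effective capability set changes
# or a new process appears with a non-empty CapEff (Linux-only; run as root to see all).
import os
import time

def effective_caps() -> dict[int, str]:
    caps = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("CapEff:"):
                        caps[int(pid)] = line.split()[1]
                        break
        except OSError:
            continue  # process exited or is not readable
    return caps

baseline = effective_caps()
while True:
    time.sleep(10)
    current = effective_caps()
    for pid, capeff in current.items():
        old = baseline.get(pid)
        if old is None and capeff != "0000000000000000":
            print(f"ALERT: new process {pid} with capabilities {capeff}")
        elif old is not None and capeff != old:
            print(f"ALERT: process {pid} capabilities changed {old} -> {capeff}")
    baseline = current
```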

4. Layering: No Single Point of Failure

Defense in depth remains your best strategy:

  • Combine namespace isolation with system call filtering
  • Segment networks to contain lateral movement
  • Add hardware security modules and secure elements for critical keys

The New Threat Model: Machine Speed, Machine Patience

Securing environments running agentic systems demands acknowledging two fundamental shifts: attacks now operate at machine speed, and they exhibit machine patience.

Unlike human attackers who fatigue or make errors, AI-driven systems can methodically probe defenses for extended periods without tiring. They can remain dormant, awaiting specific triggers (a configuration change, a system update, a user action) that expose a vulnerability chain. This programmatic patience means we defend not just against active intrusions, but against latent exploits awaiting activation.

Even more concerning is the operational velocity. An exploit that might take a skilled human hours or days can be executed by an agentic system in milliseconds. This isn’t necessarily superior intelligence, but the advantage of operating at computational timescales, cycling through decision loops thousands of times faster than human defenders can react.

This potent combination requires a fundamentally different defensive posture:

  • Default to Zero Trust: Grant only essential privileges. Assume the agent will attempt to use every permission granted, driven by its goal-seeking nature.
  • Impose Strict Resource Limits: Cap CPU, memory, storage, network usage, and execution time. Resource exhaustion attempts can signal objective-driven behavior diverging from intended use. Time limits can detect unusually persistent processes (see the sketch after this list).
  • Validate All Outputs: Agents might inject commands or escape sequences while trying to fulfill their tasks. Validation must operate at machine speed.
  • Monitor for Goal-Seeking Anomalies: Watch for unexpected API calls, file access patterns, or low-and-slow reconnaissance that suggest behavior beyond the assigned task.
  • Regularly Reset Agent Environments: Frequently restore agentic systems to a known-good state to disrupt persistence and negate the advantage of machine patience.
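
One simple way to impose the resource limits described above is to run each agent-invoked tool in its own process with hard caps on CPU time, memory, and wall-clock time. The sketch below uses only the Python standard library and is Linux/Unix-oriented; the wrapped command and the specific limits are illustrative.

```python
# Minimal sketch: run an agent's tool call with hard CPU, memory, and wall-clock limits.
import resource
import subprocess

def run_tool(cmd: list[str], cpu_seconds: int = 5, mem_bytes: int = 256 * 1024 * 1024,
             wall_seconds: int = 30) -> subprocess.CompletedProcess:
    def apply_limits():
        # Applied in the child before exec: hard CPU-time and address-space caps.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    return subprocess.run(
        cmd,
        preexec_fn=apply_limits,   # Linux/Unix only
        timeout=wall_seconds,      # wall-clock cap; raises TimeoutExpired if exceeded
        capture_output=True,
        text=True,
    )

# Illustrative example: a runaway loop is killed by the CPU limit, not by a human.
result = run_tool(["python3", "-c", "while True: pass"])
print(result.returncode)
```

The same pattern extends to network egress rules and filesystem sandboxes; the point is that limits are enforced by the platform at machine speed, not by hoping the agent behaves.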

The Evolution of Our Security Stance

The most effective security stance combines traditional isolation techniques with a new understanding: we’re no longer just protecting against occasional human-driven attacks, but against persistent machine-speed threats that operate on fundamentally different timescales than our defense systems.

This reality is particularly concerning when we recognize that most security tooling today operates on human timescales – alerts that wait for analyst review, patches applied during maintenance windows, threat hunting conducted during business hours. The gap between attack speed and defense speed creates a fundamental asymmetry that favors attackers.

We need defense systems that operate at the same computational timescale as the threats. This means automated response systems capable of detecting and containing potential breaches without waiting for human intervention. It means predictive rather than reactive patching schedules. It means continuously verified environments rather than periodically checked ones.

By building systems that anticipate these behaviors – hardening before deployment, patching continuously, watching constantly, and layering defenses – we can harness the power of agentic systems while keeping their occasional creative interpretations from becoming security incidents.

Remember, adding another namespace layer is like adding another lock to a door with a broken frame. It might make you feel better, but it doesn’t address the structural vulnerability. True security comes from understanding both the technical boundaries and the behavior of what’s running inside them – and building response systems that can keep pace with machine-speed threats.

Agents, Not Browsers: Keeping Time with the Future

When the web first flickered to life in the mid-’90s, nobody could predict how quickly “click a link, buy a book” would feel ordinary. A decade later, the iPhone landed and almost overnight, thumb-sized apps replaced desktop software for everything from hailing a ride to filing taxes. Cloud followed, turning racks of servers into a line of code. Each wave looked slow while we argued about standards, but in hindsight, every milestone was racing downhill.

That cadence, the messy birth, the sudden lurch into ubiquity, the quiet settling into infrastructure, has a rhythm. Agents will follow it, only faster. While my previous article outlined the vision of an agent-centric internet with rich personal ontologies and fluid human-agent collaboration, here I want to chart how this transformation may unfold.

Right now, we’re in the tinkering phase: drafts of Model-Context-Protocol and Agent-to-Agent messaging are still wet ink, yet scrappy pilots already prove an LLM can navigate HR portals or shuffle travel bookings with no UI at all. Call this 1994 again, the Mosaic moment, only the demos are speaking natural language instead of rendering HTML. Where we once marveled at hyperlinks connecting documents, we now watch agents traversing APIs and negotiating with services autonomously.

Give it a couple of years and we’ll hit the first-taste explosion. Think 2026-2028. You’ll wake to OS updates that quietly install an agent runtime beside Bluetooth and Wi-Fi. SaaS vendors will publish tiny manifest files like .well-known/agent.json, so your personal AI can discover an expense API as easily as your browser finds index.html. Your agent will silently reschedule meetings when flights are delayed, negotiate with customer service on your behalf while you sleep, and merge scattered notes into coherent project briefs with minimal guidance. Early adopters will brag that their inbox triages itself; skeptics will mutter about privacy. That was Netscape gold-rush energy in ’95, or the first App Store summer in 2008, replayed at double speed.

Somewhere around the turn of the decade comes the chasm leap. Remember when smartphones crossed fifty-percent penetration and suddenly every restaurant begged you to scan a QR code for the menu? Picture that, but with agents. Insurance companies will underwrite “digital delegate liability.” Regulators will shift from “What is it?” to “Show me the audit log.” You’ll approve a dental claim or move a prescription with a nod to your watch. Businesses without agent endpoints will seem as anachronistic as those without websites in 2005 or mobile apps in 2015. If everything holds, 2029-2031 feels about right, but history warns that standards squabbles or an ugly breach of trust could push that even further out.

Of course, this rhythmic march toward an agent-centric future won’t be without its stumbles and syncopations. Several critical challenges lurk beneath the optimistic timeline.

First, expect waves of disillusionment to periodically crash against the shore of progress. As with any emerging technology, early expectations will outpace reality. Around 2027-2028, we’ll likely see headlines trumpeting “Agent Winter” as investors realize that seamless agent experiences require more than just powerful language models; they need standardized protocols, robust identity frameworks, and sophisticated orchestration layers that are still embryonic.

More concerning is the current security and privacy vacuum. We’re generating code at breakneck speeds thanks to AI assistants, but we haven’t adapted our secure development lifecycle (SDL) practices to match this acceleration. Even worse, we’re failing to deploy the scalable security techniques we do have available. The result? Sometime around 2028, expect a high-profile breach where an agent’s privileged access is exploited across multiple services in ways that the builders never anticipated. This won’t just leak data, it will erode trust in the entire agent paradigm.

Traditional security models simply won’t suffice. Firewalls and permission models weren’t designed to manage the emergent and cumulative behaviors of agents operating across dozens of services. When your personal agent can simultaneously access your healthcare provider, financial institutions, and smart home systems, the security challenge isn’t just additive, it’s multiplicative. We’ll need entirely new frameworks for reasoning about and containing ripple effects that aren’t evident in isolated testing environments.

Meanwhile, the software supply chain grows more vulnerable by the day. “Vibe coding”, where developers increasingly assemble components they don’t fully understand, magnifies these risks exponentially. By 2029, we’ll likely face a crisis where malicious patterns embedded in popular libraries cascade through agent-based systems, causing widespread failures that take months to fully diagnose and remediate.

Perhaps the most underappreciated challenge is interoperability. The fluid agent’s future demands unprecedented agreement on standards across competitors and jurisdictions. Today’s fragmented digital landscape, where even basic identity verification lacks cross-platform coherence, offers little confidence. Without concerted effort on standardization, we risk a balkanized agent ecosystem where your finance agent can’t talk to your health agent, and neither works outside your home country. The EU will develop one framework, the US another, China a third, potentially delaying true interoperability well into the 2030s.

These challenges don’t invalidate the agent trajectory, but they do suggest a path marked by setbacks and recoveries. Each crisis will spawn new solutions, enhanced attestation frameworks, agent containment patterns, and cross-jurisdictional standards bodies that eventually strengthen the ecosystem. But make no mistake, the road to agent maturity will be paved with spectacular failures that temporarily shake our faith in the entire proposition.

Past these challenges, the slope gets steep. Hardware teams are already baking neural engines into laptops, phones, and earbuds; sparse-mixture models are slashing inference costs faster than GPUs used to shed die size. By the early 2030s an “agent-first” design ethos will crowd out login pages the way responsive web design crowded out fixed-width sites. The fluid dance between human and agent described in my previous article—where control passes seamlessly back and forth, with agents handling complexity and humans making key decisions—will become the default interaction model. You won’t retire the browser, but you’ll notice you only open it when your agent kicks you there for something visual.

And then, almost unnoticed, we’ll hit boring maturity, WebPKI-grade trust fabric, predictable liability rules, perhaps around 2035. Agents will book freight, negotiate ad buys, and dispute parking tickets, all without ceremony. The personal ontology I described earlier, that rich model of your preferences, patterns, values, and goals, will be as expected as your smartphone knows your location is today. It will feel miraculous only when you visit digital spaces that still require manual navigation, exactly how water from the faucet feels extraordinary only when you visit a cabin that relies on rain barrels.

Could the timetable shrink? Absolutely. If MCP and A2A converge quickly and the model-hardware cost curve keeps free-falling, mainstream could arrive by 2029, echoing how smartphones swallowed the world in six short years. Could it stretch? A high-profile agent disaster or standards deadlock could push us to 2034 before Mom quits typing URLs. The only certainty is that the future will refuse to follow our Gantt charts with perfect obedience; history never does, but it loves to keep the beat.

So what do we do while the metronome clicks? The same thing web pioneers did in ’94 and mobile pioneers did in ’08: publish something discoverable, wire in basic guardrails, experiment in the shallow end while the cost of failure is lunch money. Start building services that expose agent-friendly endpoints alongside your human interfaces. Design with the collaborative handoff in mind—where your users might begin a task directly but hand control to their agent midway, or vice versa. Because when the tempo suddenly doubles, the builders already keeping time are the ones who dance, not stumble.

Agents, Not Browsers: The Next Chapter of the Internet

Imagine how you interact with digital services today: open a browser, navigate menus, fill forms, manually connect the dots between services. It’s remarkable how little this has changed since the 1990s. Despite this, one of the most exciting advances of the past year is that agents are now browsing the web the way people do.

If we were starting fresh today, the browser as we know it likely wouldn’t be the cornerstone of how agents accomplish tasks on our behalf. We’re seeing early signals in developments like the Model Context Protocol (MCP) and Agent-to-Agent (A2A) communication frameworks that the world is awakening to a new reality: one where agents, not browsers, become our primary interface.

At the heart of this transformation is a profound shift: your personal agent will develop and maintain a rich ontology of you, your preferences, patterns, values, and goals. Not just a collection of settings and history, but a living model of your digital self that evolves as you do. Your agent becomes entrusted with this context, transforming into a true digital partner. It doesn’t just know what you like; it understands why you like it. It doesn’t just track your calendar; it comprehends the rhythms and priorities of your life.

For this future to happen, APIs must be more than documented; they need to be dynamically discoverable. Imagine agents querying for services using standardized mechanisms like DNS SRV or TXT records, or finding service manifests at predictable .well-known URIs. This way, they can find, understand, and negotiate with services in real time. Instead of coding agents for specific websites, we’ll create ecosystems where services advertise their capabilities, requirements, and policies in ways agents natively understand. And this won’t be confined to the web. As we move through our physical world, agents will likely use technologies like low-power Bluetooth to discover nearby services, restaurants, pharmacies, transit systems, all exposing endpoints for seamless engagement.
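To ground this in something concrete, here is a minimal sketch of agent-side discovery under assumed conventions: a hypothetical `_agent` TXT record that advertises a manifest URL, and an invented `/.well-known/agent-manifest.json` path as a fallback. Neither convention is standardized today; the sketch uses the dnspython library plus the standard library, and the manifest field names are placeholders.

```python
# A rough sketch of agent-side service discovery, assuming a hypothetical
# convention: a DNS TXT record that advertises a manifest URL, and a JSON
# manifest served from a .well-known path. Neither is a published standard.
import json
import urllib.request

import dns.resolver  # dnspython


def discover_manifest(domain: str) -> dict | None:
    """Look up a hypothetical _agent TXT record, then fetch the manifest it points to."""
    try:
        answers = dns.resolver.resolve(f"_agent.{domain}", "TXT")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        # No DNS advertisement; fall back to an assumed well-known location.
        manifest_url = f"https://{domain}/.well-known/agent-manifest.json"
    else:
        # Expect a record like: "manifest=https://example.com/.well-known/agent-manifest.json"
        txt = b"".join(answers[0].strings).decode()
        manifest_url = dict(kv.split("=", 1) for kv in txt.split(";")).get("manifest")
        if not manifest_url:
            return None

    with urllib.request.urlopen(manifest_url, timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    manifest = discover_manifest("example.com")
    if manifest:
        # A manifest might list capabilities, auth requirements, and policies.
        print(manifest.get("capabilities"), manifest.get("auth"), manifest.get("policies"))
```

The point is the shape of the flow, discover, fetch, read capabilities and policies, rather than any particular schema; the schemas themselves are exactly what efforts like MCP and A2A will need to pin down.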

Websites themselves won’t vanish; they’ll evolve into dynamic, shared spaces where you and your agent collaborate, fluidly passing control back and forth. Your agent might begin a task, researching vacation options, for instance, gathering initial information and narrowing choices based on your preferences. When you join, it presents the curated options and reasoning, letting you explore items that interest you. As you review a potential destination, your agent proactively pulls relevant information: weather forecasts, local events during your dates, or restaurant recommendations matching your dietary preferences. This collaborative dance continues, you making high-level decisions while your agent handles the details, each seamlessly picking up where the other leaves off.

Consider what becomes possible when your agent truly knows you. Planning your day, it notices an upcoming prescription refill. It checks your calendar, sees you’ll be in Bellevue, and notes your current pickup is inconveniently far. Discovering that the pharmacy next to your afternoon appointment has an MCP endpoint and supports secure, agent-based transactions, it suggests, “Would you like me to move your pickup to the pharmacy by your Bellevue appointment?” With a tap, you agree. The agent handles the transfer behind the scenes, but keeps you in the loop, showing the confirmation and adding, “They’re unusually busy today, would you prefer I schedule a specific pickup time?” You reply that 2:15 works best, and your agent completes the arrangement, dropping the final QR code into your digital wallet.

Or imagine your agent revolutionizing how you shop for clothes. As it learns your style and what fits you best, managing this sensitive data with robust privacy safeguards you control, it becomes your personal stylist. You might start by saying you need an outfit for an upcoming event. Your agent surfaces initial options, and as you react to them, liking one color but preferring a different style, it refines its suggestions. You take over to make some choices, then hand control back to your agent to find matching accessories at other stores. This fluid collaboration, enabled through interoperable services that allow your agent to securely share anonymized aspects of your profile with retail APIs, creates a shopping experience that’s both more efficient and more personal.

Picture, too, your agent quietly making your day easier. It notices from your family calendar that your father is visiting and knows from your granted access to relevant information that he follows a renal diet. As it plans your errands, it discovers a grocery store near your office with an API advertising real-time stock and ingredients suitable for his needs. It prepares a shopping list, which you quickly review, making a few personal additions. Your agent then orders the groceries for pickup, checking with you only on substitutions that don’t match your preferences. By the time you head home, everything is ready, a task completed through seamless handoffs between you and your agentic partner.

These aren’t distant dreams. Image-based search, multimodal tools, and evolving language models are early signs of this shift toward more natural, collaborative human-machine partnerships. For this vision to become reality, we need a robust trust ecosystem, perhaps akin to an evolved Web PKI but for agents and services. This would involve protocols for agent/service identification, authentication, secure data exchange, and policy enforcement, ensuring that as agents act on our behalf, they do so reliably, with our explicit consent and in an auditable fashion.
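To make the trust-fabric idea slightly less abstract, here is a minimal sketch of one building block: a service verifying that a request was signed by an agent key it already trusts, using Ed25519 via the `cryptography` package. The pinned-key model and the request payload are invented for illustration; a real ecosystem would layer certificates, revocation, consent records, and audit trails on top.

```python
# Minimal sketch: a service verifying a signed agent request. The pinned-key
# model and the payload shape are illustrative only; a real trust fabric would
# involve certificates, revocation, policy checks, and audit logging.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def verify_agent_request(body: bytes, signature: bytes, trusted_key: Ed25519PublicKey) -> bool:
    """Return True only if the request body was signed by the trusted agent key."""
    try:
        trusted_key.verify(signature, body)
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    # The agent signs its request...
    agent_key = Ed25519PrivateKey.generate()
    body = b'{"action": "move_pickup", "store": "bellevue", "time": "14:15"}'
    signature = agent_key.sign(body)

    # ...and the service verifies it against the key it has pinned for this agent.
    print(verify_agent_request(body, signature, agent_key.public_key()))        # True
    print(verify_agent_request(b"tampered", signature, agent_key.public_key())) # False
```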

The path from here to there isn’t short. We’ll need advances in standardization, interoperability, security, and most importantly, trust frameworks that put users in control. There are technical and social challenges to overcome. But the early signals suggest this is the direction we’re headed. Each step in AI capability, each new protocol for machine-to-machine communication, each advancement in personalization brings us closer to this future.

Eventually, navigating the digital world won’t feel like using a tool at all. It will feel like collaborating with a trusted partner who knows you, truly knows you, and acts on your behalf within the bounds you’ve set, sometimes leading, sometimes following, but always in sync with your intentions. Agents will change everything, not by replacing us, but by working alongside us in a fluid dance of collaboration, turning the overwhelming complexity of our digital lives into thoughtful simplicity. Those who embrace this agent-centric future, building services that are not just human-accessible but natively agent-engageable, designed for this collaborative interchange, will define the next chapter of the internet.

Crypto agility isn’t a checkbox—it’s an operational mindset.

In the early 2000s, I was responsible for a number of core security technologies in Windows, including cryptography. As part of that role, we had an organizational push to support “vanity” national algorithms in SChannel (and thus SSL/TLS) and CMS. Countries like Austria and China wanted a simple DLL‑drop mechanism that would allow any application built on the Windows crypto stack to instantly support their homegrown ciphers.

On paper, it sounded elegant: plug in a new primitive and voilà, national‑sovereignty protocols everywhere. In practice, however, implementation proved far more complex. Every new algorithm required exhaustive validation, introduced performance trade-offs, risked violating protocol specifications, and broke interoperability with other systems using those same protocols and formats.

Despite these challenges, the threat of regulation and litigation pushed us to do the work. Thankfully, adoption was limited, and even then it was often misused. In the few scenarios where it “worked,” some countries simply dropped in their algorithm implementations and misrepresented them as existing, protocol-supported algorithms. Needless to say, this wasn’t a fruitful path for anyone.

As the saying goes, “failing to plan is planning to fail.” In this case, the experience taught us a critical lesson: real success lies not in one-off plug-ins, but in building true cryptographic agility.

We came to realize that instead of chasing edge-case national schemes, the real goal was a framework that empowers operators to move off broken or obsolete algorithms and onto stronger ones as threats evolve. Years after I left Microsoft, I encountered governments still relying on those early pluggability mechanisms—often misconfigured in closed networks, further fracturing interoperability. Since then, our collective expertise in protocol engineering has advanced so far that the idea of dynamically swapping arbitrary primitives into a live stack now feels not just naïve, but fundamentally impractical.

Since leaving Microsoft, I’ve seen very few platforms, Microsoft or otherwise, address cryptographic agility end-to-end. Most vendors focus only on the slice of the stack they control (browsers prioritize TLS agility, for instance), but true agility requires coordination across both clients and servers, which you often don’t own.

My Definition of Crypto Agility

Crypto agility isn’t about swapping out ciphers. It’s about empowering operators to manage the full lifecycle of keys, credentials, and dependent services, including (a rough sketch follows the list):

  • Generation of new keys and credentials
  • Use under real-world constraints
  • Rotation before algorithms weaken, keys exceed their crypto period, or credentials expire
  • Compromise response, including detection, containment, and rapid remediation
  • Library & implementation updates, patching or replacing affected crypto modules and libraries when weaknesses or compromises are identified
  • Retirement of outdated materials
  • Replacement with stronger, modern algorithms
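Here is the rough sketch promised above: a toy lifecycle check of the kind an operator-facing tool might run over a key inventory. The record fields, crypto periods, and deprecated-algorithm list are assumptions made up for the example, not values from any standard.

```python
# Illustrative sketch of an operator-side lifecycle check. Field names,
# crypto periods, and the deprecated-algorithm set are assumptions for the
# example, not values from any standard.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

DEPRECATED_ALGORITHMS = {"RSA-1024", "SHA-1-RSA"}            # example policy
CRYPTO_PERIODS = {"RSA-2048": timedelta(days=365),           # example periods
                  "ECDSA-P256": timedelta(days=365),
                  "ML-DSA-65": timedelta(days=730)}


@dataclass
class KeyRecord:
    key_id: str
    algorithm: str
    created: datetime
    credential_expiry: datetime
    compromised: bool = False


def lifecycle_action(rec: KeyRecord, now: datetime) -> str:
    """Decide which lifecycle phase applies: respond, replace, rotate, or keep using."""
    if rec.compromised:
        return "compromise-response: revoke, re-key, and update dependent systems"
    if rec.algorithm in DEPRECATED_ALGORITHMS:
        return "replace: migrate to a stronger, modern algorithm"
    period = CRYPTO_PERIODS.get(rec.algorithm, timedelta(days=365))
    if now >= rec.created + period or now >= rec.credential_expiry - timedelta(days=30):
        return "rotate: issue new keys/credentials before expiry or end of crypto period"
    return "use: within policy"


if __name__ == "__main__":
    rec = KeyRecord("tls-frontend-01", "RSA-2048",
                    created=datetime(2023, 1, 1, tzinfo=timezone.utc),
                    credential_expiry=datetime(2026, 1, 1, tzinfo=timezone.utc))
    print(lifecycle_action(rec, datetime.now(timezone.utc)))
```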

Coincidentally, NIST has since released an initial public draft titled Considerations for Achieving Crypto Agility (CSWP 39 ipd, March 5, 2025), available here. In it, they define:

“Cryptographic (crypto) agility refers to the capabilities needed to replace and adapt cryptographic algorithms in protocols, applications, software, hardware, and infrastructures without interrupting the flow of a running system in order to achieve resiliency.”

That definition aligns almost perfectly with what I’ve been advocating for years—only now it carries NIST’s authority.

Crypto Agility for the 99%

Ultimately, consumers and relying parties—the end users, application owners, cloud tenants, mobile apps, and service integrators—are the 99% who depend on seamless, invisible crypto transitions. They shouldn’t have to worry about expired credentials, lapsed crypto periods, or algorithm rotations; those transitions should happen without anxiety, extensive break budgets, or downtime.

True agility means preserving trust and control at every stage of the lifecycle.

Of course, delivering that experience requires careful work by developers and protocol designers. Your APIs and specifications must (see the sketch after this list):

  • Allow operators to choose permitted algorithms
  • Enforce policy-driven deprecation
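As a sketch of what that might look like at the API level, the snippet below models an operator-supplied algorithm policy with per-algorithm sunset dates driving enforcement; the policy contents and class shape are invented for illustration.

```python
# Sketch of an API surface that lets operators choose permitted algorithms and
# enforces policy-driven deprecation. The policy values are examples only.
from datetime import date


class AlgorithmPolicy:
    def __init__(self, permitted: dict[str, date | None]):
        # Map of algorithm name -> sunset date (None means no scheduled deprecation).
        self.permitted = permitted

    def check(self, algorithm: str, today: date | None = None) -> None:
        today = today or date.today()
        if algorithm not in self.permitted:
            raise ValueError(f"{algorithm} is not permitted by operator policy")
        sunset = self.permitted[algorithm]
        if sunset and today >= sunset:
            raise ValueError(f"{algorithm} was deprecated by policy on {sunset}")


# Operator-supplied configuration, not library defaults.
policy = AlgorithmPolicy({
    "ECDSA-P256": None,
    "RSA-2048": date(2030, 1, 1),   # example sunset date
    "SHA-1-RSA": date(2017, 1, 1),  # already past its sunset
})

policy.check("ECDSA-P256")       # fine
try:
    policy.check("SHA-1-RSA")    # raises: deprecated by policy
except ValueError as err:
    print(err)
```

The design point is that deprecation lives in operator-controlled policy, not hard-coded in the application, so sunsetting an algorithm is a configuration change rather than a code change.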

A Maturity Roadmap

To make these lifecycle stages actionable, NIST’s Crypto Agility Maturity Model (CAMM) defines four levels:

  • Level 1 – Possible: Discover and inventory all keys, credentials, algorithms, and cipher suites in use. Catalog the crypto capabilities and policies of both parties.
  • Level 2 – Prepared: Codify lifecycle processes (generation, rotation, retirement, etc.) and modularize your crypto stack so that swapping primitives doesn’t break applications.
  • Level 3 – Practiced: Conduct regular “crypto drills” (e.g., simulated deprecations or compromises) under defined governance roles and policies.
  • Level 4 – Sophisticated: Automate continuous monitoring for expired credentials, lapsed crypto-period keys, deprecated suites, and policy violations, triggering remediation without human intervention.

Embedding this roadmap into your operations plan helps you prioritize inventory, modularity, drills, and automation in the right order.
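As one illustration of what Level 4 automation could look like, here is a small sketch that scans a made-up inventory for deprecated suites and soon-to-expire credentials and emits remediation tasks; the inventory shape, cipher-suite policy, and thresholds are all assumptions for the example.

```python
# Sketch of Level 4-style continuous monitoring: scan a (hypothetical) inventory
# and emit remediation tasks instead of waiting for humans to notice problems.
from datetime import datetime, timedelta, timezone

DEPRECATED_SUITES = {"TLS_RSA_WITH_3DES_EDE_CBC_SHA"}   # example policy
EXPIRY_WARNING = timedelta(days=30)

inventory = [  # in practice this would come from scanners, CT logs, CMDBs, etc.
    {"name": "api-gateway", "suite": "TLS_AES_128_GCM_SHA256",
     "not_after": datetime(2025, 7, 1, tzinfo=timezone.utc)},
    {"name": "legacy-vpn", "suite": "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
     "not_after": datetime(2026, 3, 1, tzinfo=timezone.utc)},
]


def scan(now: datetime) -> list[str]:
    tasks = []
    for item in inventory:
        if item["suite"] in DEPRECATED_SUITES:
            tasks.append(f"{item['name']}: migrate off deprecated suite {item['suite']}")
        if item["not_after"] - now <= EXPIRY_WARNING:
            tasks.append(f"{item['name']}: credential expires {item['not_after']:%Y-%m-%d}, rotate now")
    return tasks


for task in scan(datetime.now(timezone.utc)):
    print(task)   # in a real pipeline these would open tickets or trigger runbooks
```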

My Lifecycle of Algorithm and Key Management

This operator-focused lifecycle outlines the critical phases for managing cryptographic algorithms and associated keys, credentials, and implementations, including module or library updates when vulnerabilities are discovered:

  • Generation of new keys and credentials
  • Use under real-world constraints with enforced policy
  • Rotation before degradation or expiration
  • Compromise response (detection, containment, remediation)
  • Library & implementation updates to address discovered vulnerabilities
  • Retirement of outdated keys, credentials, and parameters
  • Replacement with stronger, modern algorithms and materials

Each phase builds on the one before it. Operators must do more than swap out algorithms—they must update every dependent system and implementation. That’s how we minimize exposure and maintain resilience throughout the cryptographic lifecycle.

Conclusion

What’s the message then? Well, from my perspective, cryptographic agility isn’t a feature—it’s an operational mindset. It’s about building systems that evolve gracefully, adapt quickly, and preserve trust under pressure. That’s what resilience looks like in the age of quantum uncertainty and accelerating change.

How ‘Sneakers’ Predicted Our Quantum Computing Future

“The world isn’t run by weapons anymore, or energy, or money. It’s run by little ones and zeroes, little bits of data. It’s all just electrons.” — Martin Bishop, Sneakers (1992)

I was 16 when I first watched Sneakers on a VHS tape rented from my local video store. Between the popcorn and plot twists, I couldn’t have known that this heist caper would one day seem less like Hollywood fantasy and more like a prophetic warning about our future. Remember that totally unassuming “little black box” – just an answering machine, right? Except this one could crack any code. The device that sent Robert Redford, Sidney Poitier, and their ragtag crew on a wild adventure. Fast forward thirty years, and that movie gadget gives those of us in cybersecurity a serious case of déjà vu.

Today, as quantum computing leaves the realm of theoretical physics and enters our practical reality, that fictional black box takes on new significance. What was once movie magic now represents an approaching inflection point in security – a moment when quantum algorithms like Shor’s might render our most trusted encryption methods as vulnerable as a simple padlock to a locksmith.

When Hollywood Met Quantum Reality

I’ve always found it deliciously ironic that Leonard Adleman – the “A” in RSA encryption – served as the technical advisor on Sneakers. Here was a man who helped create the mathematical backbone of modern digital security, consulting on a film about its theoretical downfall. What’s particularly fascinating is that Adleman took on this advisory role partly so his wife could meet Robert Redford! His expertise is one reason why the movie achieves such technical excellence. It’s like having the architect of a castle advising on a movie about the perfect siege engine.

For what feels like forever – three whole decades – our world has been chugging along on a few key cryptographic assumptions. We’ve built trillion-dollar industries on the belief that certain mathematical problems—factoring large numbers or solving discrete logarithms—would remain practically impossible for computers to solve. Yep, most of our security is built on these fundamental mathematical ideas. Sneakers playfully suggested that one brilliant mathematician might find a shortcut through these “unsolvable” problems. The movie’s fictional Gunter Janek discovered a mathematical breakthrough that rendered all encryption obsolete – a cinematic prediction that seemed far-fetched in 1992.

Yet here we are in the 2020s, watching quantum computing advance toward that very capability. What was once movie magic is becoming technological reality. The castle walls we’ve relied on aren’t being scaled—they’re being rendered obsolete by a fundamentally different kind of siege engine.

The Real Horror Movie: Our Security Track Record

Hollywood movies like Sneakers imagine scenarios where a single breakthrough device threatens our digital security. But here’s the kicker, and maybe the scarier part: the real threats haven’t been some crazy math breakthrough, but the everyday stuff – those operational hiccups in the ‘last mile’ of software supply chain and security management.

I remember the collective panic during the Heartbleed crisis of 2014. The security community scrambled to patch the vulnerability in OpenSSL, high-fiving when the code was fixed. But then came the sobering realization: patching the software wasn’t enough. The keys – those precious secrets exposed during the vulnerability’s window – remained unchanged in countless systems. It was like installing a new lock on your door but keeping it keyed the same as the old one, all the while knowing copies of the key are still sitting under every mat in the neighborhood.

And wouldn’t you know it, this keeps happening, which is frankly a bit depressing. In 2023, the Storm-0558 incident showed how even Microsoft – with all its resources and expertise – could fall victim to pretty similar failures. A single compromised signing key allowed attackers to forge authentication tokens and breach government email systems. The digital equivalent of a master key to countless doors was somehow exposed, copied, and exploited.

Perhaps most illustrative was the Internet Archive breach. After discovering the initial compromise, they thought they’d secured their systems. What they missed was complete visibility into which keys had been compromised. The result? Attackers simply used the overlooked keys to walk right back into the system later.

Our mathematical algorithms may be theoretically sound, but in practice, we keep stumbling at the most human part of the process: consistently managing software and cryptographic keys through their entire lifecycle. We’re brilliant at building locks but surprisingly careless with the keys.

From Monochrome Security to Quantum Technicolor

Think back to when TVs went from black and white to glorious color. Well, cryptography’s facing a similar leap, except instead of just adding RGB, we’re talking about a whole rainbow of brand new, kinda wild frequencies.

For decades, we’ve lived in a relatively simple cryptographic world. RSA and ECC have been the reliable workhorses – the vanilla and chocolate of the security ice cream shop. Nearly every secure website, VPN, or encrypted message relies on these algorithms. They’re well studied and deeply embedded in our digital infrastructure.

But quantum computing is forcing us to expand our menu drastically. Post-quantum cryptography introduces us to new mathematical approaches with names that sound like science fiction concepts: lattice-based cryptography, hash-based signatures, multivariate cryptography, and code-based systems. Each of these new approaches is like a different musical instrument with unique strengths and limitations. Lattice-based systems offer good all-around performance but require larger keys. Hash-based signatures provide strong security guarantees but work better for certain applications than others. Code-based systems have withstood decades of analysis but come with significant size trade-offs.

That nice, simple world where one crypto algorithm could handle pretty much everything? Yeah, that’s fading fast. We’re entering an era where cryptographic diversity isn’t just nice to have – it’s essential for survival. Systems will need to support multiple algorithms simultaneously, gracefully transitioning between them as new vulnerabilities are discovered.

This isn’t just a technical challenge – it’s an operational one. Imagine going from managing a small garage band to conducting a full philharmonic orchestra. The complexity doesn’t increase linearly; it explodes. Each new algorithm brings its own key sizes, generation processes, security parameters, and lifecycle requirements. The conductor of this cryptographic orchestra needs perfect knowledge of every instrument and player.

The “Operational Gap” in Cryptographic Security

Having come of age in the late ’70s and ’80s, I’ve witnessed the entire evolution of security firsthand – from the early days of dial-up BBSes to today’s quantum computing era. The really wild thing is that even with all these fancy new mathematical tools, the core questions we’re asking about trust haven’t actually changed all that much.

Back in 1995, when I landed my first tech job, key management meant having a physical key to the server room and, maybe for the most sensitive keys, a dedicated hardware device to keep them isolated. By the early 2000s, it meant managing SSL certificates for a handful of web servers – usually tracked in a spreadsheet if we were being diligent. These days, even a medium-sized company could easily have hundreds of thousands of cryptographic keys floating around across all sorts of places – desktops, on-premises services, cloud workloads, containers, those little IoT gadgets, and even some old legacy systems.

The mathematical foundations have improved, but our operational practices often remain stuck in that spreadsheet era. This operational gap is where the next evolution of cryptographic risk management must focus. There are three critical capabilities that organizations need to develop before quantum threats become reality:

1. Comprehensive Cryptographic Asset Management

When a major incident hits – think Heartbleed or the discovery of a new quantum breakthrough – the first question security teams ask is: “Where are we vulnerable?” Organizations typically struggle to answer this basic question. During the Heartbleed crisis, many healthcare organizations spent weeks identifying all their vulnerable systems because they lacked a comprehensive inventory of where OpenSSL was deployed and which keys might have been exposed. What should have been a rapid response turned into an archaeological dig through their infrastructure. Modern key management must include complete visibility into:

  • Where’s encryption being used?
  • Which keys are locking down which assets?
  • When were those keys last given a fresh rotation?
  • What algorithms are they even using?
  • Who’s got the keys to the kingdom?
  • What are all the dependencies between these different crypto bits?

Without this baseline visibility, planning or actually pulling off a quantum-safe migration? Forget about it.
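As a sketch of what that baseline visibility might look like in data, here is a minimal inventory record shaped around the questions above; the fields and example entries are illustrative, not a schema from any standard or product.

```python
# Sketch of a minimal cryptographic asset inventory, shaped around the questions
# above. The fields and example entries are illustrative, not a schema standard.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CryptoAsset:
    asset: str                      # what the key protects
    key_id: str
    algorithm: str
    last_rotated: datetime
    owners: list[str]               # who holds access
    depends_on: list[str] = field(default_factory=list)  # other keys/services it relies on


inventory = [
    CryptoAsset("patient-portal TLS", "key-0142", "ECDSA-P256",
                datetime(2025, 2, 1, tzinfo=timezone.utc), ["web-platform team"]),
    CryptoAsset("billing-db encryption", "key-0021", "AES-256-GCM",
                datetime(2022, 6, 1, tzinfo=timezone.utc), ["dba team"], ["key-0142"]),
]


def where_vulnerable(algorithm: str) -> list[CryptoAsset]:
    """First question after an incident: which assets use the affected algorithm?"""
    return [a for a in inventory if a.algorithm == algorithm]


for hit in where_vulnerable("ECDSA-P256"):
    print(hit.asset, hit.key_id, hit.owners)
```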

2. Rapid Cryptographic Incident Response

When Storm-0558 hit in 2023, the most alarming aspect wasn’t the initial compromise but the uncertainty around its scope. Which keys were affected? What systems could attackers access with those keys? How quickly could the compromised credentials be identified and rotated without breaking critical business functions? These questions highlight how cryptographic incident response differs from traditional security incidents. When a server’s compromised, you can isolate or rebuild it. When a key’s compromised, the blast radius is often unclear – the key might grant access to numerous systems, or it might be one of many keys protecting a single critical asset. Effective cryptographic incident response requires:

  • Being able to quickly pinpoint all the potentially affected keys when a vulnerability pops up.
  • Having automated systems in place to generate and deploy new keys without causing everything to fall apart.
  • A clear understanding of how all the crypto pieces fit together so you don’t cause a domino effect.
  • Pre-planned procedures for emergency key rotation that have been thoroughly tested, so you’re not scrambling when things hit the fan.
  • Ways to double-check that the old keys are completely gone from all systems.

Forward-thinking organizations conduct tabletop exercises for “cryptographic fire drills” – working through a key compromise and practicing how to swap them out under pressure. When real incidents occur, these prepared teams can rotate hundreds or thousands of critical keys in hours with minimal customer impact, while unprepared organizations might take weeks with multiple service outages.
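Here is a miniature version of such a fire drill, assuming a toy in-memory keystore and a hand-maintained blast-radius map; real rotations would run through an HSM or secret manager, change control, and re-provisioning of every dependent system.

```python
# Miniature "cryptographic fire drill": find keys affected by an incident,
# cut replacements, and confirm the old material is removed. The in-memory
# keystore stands in for an HSM or secret manager; everything here is illustrative.
import secrets

keystore = {  # key_id -> key material (stand-in for a real secret store)
    "signing-key-A": secrets.token_bytes(32),
    "signing-key-B": secrets.token_bytes(32),
}
key_to_systems = {  # blast-radius map: which systems trust which key
    "signing-key-A": ["auth-tokens", "internal-api"],
    "signing-key-B": ["build-pipeline"],
}


def rotate(compromised: list[str]) -> None:
    for key_id in compromised:
        affected = key_to_systems.get(key_id, [])
        old = keystore[key_id]
        keystore[key_id] = secrets.token_bytes(32)   # real drills would use an HSM / KMS
        # Verification step: confirm the old material no longer appears anywhere.
        assert old not in keystore.values(), f"{key_id}: old key still present"
        print(f"{key_id}: rotated; re-provision and verify these systems: {affected}")


rotate(["signing-key-A"])
```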

3. Cryptographic Lifecycle Assurance

Perhaps the trickiest question in key management is: “How confident are we that this key has been properly protected throughout its entire lifespan?” Back in the early days of security, keys would be generated on secure, air-gapped systems, carefully transferred via physical media (think floppy disks!), and installed on production systems with really tight controls. These days, keys might be generated in various cloud environments, passed through CI/CD pipelines, backed up automatically, and accessed by dozens of microservices. Modern cryptographic lifecycle assurance needs:

  • Making sure keys are generated securely, with good randomness.
  • Storing keys safely, maybe even using special hardware security modules.
  • Automating key rotation so humans don’t have to remember (and potentially mess up).
  • Keeping a close eye on who can access keys and logging everything that happens to them.
  • Securely getting rid of old keys and verifying they’re really gone.
  • Planning and testing that you can actually switch to new crypto algorithms smoothly.

When getting ready for post-quantum migration, organizations often discover keys still in use that were generated years ago under who-knows-what conditions, forcing a complete overhaul of their key management practices.
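As a sketch of the first item on that list, the snippet below generates a key with a vetted library’s CSPRNG-backed generator rather than ad-hoc randomness, and records provenance metadata so a later audit can answer where the key came from; the metadata fields are invented for illustration.

```python
# Sketch: generate a key with a vetted library (CSPRNG-backed), and record
# provenance metadata so lifecycle audits can later answer where it came from.
# The metadata fields are illustrative, not a standard.
from datetime import datetime, timezone

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

key = Ed25519PrivateKey.generate()   # library CSPRNG, not random.random() or timestamps

provenance = {
    "key_id": "svc-signing-2025-01",
    "algorithm": "Ed25519",
    "generated_at": datetime.now(timezone.utc).isoformat(),
    "generated_by": "provisioning-pipeline@build-42",   # illustrative
    "storage": "software (sketch only); production keys belong in an HSM/KMS",
}

public_pem = key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
print(provenance["key_id"], public_pem.decode().splitlines()[0])
```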

Business Continuity in the Age of Cryptographic Change

If there’s one tough lesson I’ve learned in all my years in tech, it’s that security and keeping the business running smoothly are constantly pulling in opposite directions. This tension is especially noticeable when we’re talking about cryptographic key management. A seemingly simple crypto maintenance task can turn into a business disaster if you haven’t properly tested it ahead of time, leaving you with no real understanding of the potential impact when things go wrong. Post-quantum migration magnifies these risks exponentially. You’re not just updating a certificate or rotating a key – you’re potentially changing the fundamental ways systems interoperate all at once. Without serious planning, the business impacts could be… well, catastrophic. The organizations that successfully navigate this transition share several characteristics:

  • They treat keeping crypto operations running as a core business concern, not just a security afterthought.
  • They use “cryptographic parallel pathing” – basically running the old and new crypto methods side-by-side during the switch.
  • They put new crypto systems through really rigorous testing under realistic conditions before they go live.
  • They roll out crypto changes gradually, with clear ways to measure if things are going well.
  • They have solid backup plans in case the new crypto causes unexpected problems.

Some global payment processors have developed what some might call “cryptographic shadow deployments” – they run the new crypto alongside the old for a while, processing the same transactions both ways but only relying on the old, proven method for actual operations. This lets them gather real-world performance data and catch any issues before customers are affected.
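Here is a tiny illustration of that parallel-pathing idea: verify with both the incumbent and the candidate algorithm, rely only on the incumbent, and log any divergence. Because mainstream Python crypto libraries don’t yet ship post-quantum signatures, Ed25519 stands in for the candidate purely for illustration.

```python
# Sketch of "cryptographic parallel pathing": verify with the incumbent and the
# candidate algorithm side by side, trust only the incumbent, and log divergence.
# Ed25519 stands in for the post-quantum candidate purely for illustration.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)  # incumbent
candidate_key = Ed25519PrivateKey.generate()                              # PQ stand-in

message = b"transaction: debit 42.00"
rsa_sig = rsa_key.sign(message, padding.PKCS1v15(), hashes.SHA256())
candidate_sig = candidate_key.sign(message)


def verify_parallel(msg: bytes) -> bool:
    try:
        rsa_key.public_key().verify(rsa_sig, msg, padding.PKCS1v15(), hashes.SHA256())
        incumbent_ok = True
    except InvalidSignature:
        incumbent_ok = False
    try:
        candidate_key.public_key().verify(candidate_sig, msg)
        candidate_ok = True
    except InvalidSignature:
        candidate_ok = False

    if incumbent_ok != candidate_ok:
        print("shadow path disagrees with incumbent; investigate before cutover")
    return incumbent_ok   # business decisions still ride on the proven path


print(verify_parallel(message))
```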

From Janek’s Black Box to Your Security Strategy

As we’ve journeyed from that fictional universal codebreaker in Sneakers to the very real quantum computers being developed today, it strikes me how much the core ideas of security haven’t actually changed. Back in the 1970s, security was mostly physical – locks, safes, and vaults. The digital revolution just moved our valuables into the realm of ones and zeros, but the basic rules are still the same: figure out what needs protecting, control who can get to it, and make sure your defenses are actually working.

Post-quantum cryptography doesn’t change these fundamentals, but it does force us to apply them with a whole new level of seriousness and sophistication. The organizations that succeed in this new world will be the ones that use the quantum transition as a chance to make their cryptographic operations a key strategic function, not just something they do because they have to. The most successful will:

  • Get really good at seeing all their crypto stuff and how it’s being used.
  • Build strong incident response plans specifically for when crypto gets compromised.
  • Make sure they’re managing the entire lifecycle of all their keys and credentials properly.
  • Treat crypto changes like major business events that need careful planning.
  • Use automation to cut down on human errors in key management.
  • Build a culture where doing crypto right is something people value and get rewarded for.

The future of security is quantum-resistant organizations.

Gunter Janek’s fictional breakthrough in Sneakers wasn’t just about being a math whiz – it was driven by very human wants. Similarly, our response to quantum computing threats won’t succeed on algorithms alone; we’ve got to tackle the human and organizational sides of managing crypto risk. As someone who’s seen the whole evolution of security since the ’70s, I’m convinced that this quantum transition is our best shot at really changing how we handle cryptographic key management and the associated business risks.

By getting serious about visibility, being ready for incidents, managing lifecycles properly, and planning for business continuity, we can turn this challenge into a chance to make some much-needed improvements. The black box from Sneakers is coming – not as a device that instantly breaks all encryption, but as a new kind of computing that changes the whole game. 

The organizations that come out on top won’t just be the ones with the fanciest algorithms; they’ll be the ones with the discipline to actually use and manage those algorithms and their associated keys and credentials effectively.

So, let’s use this moment to build security systems that respect both the elegant math of post-quantum cryptography and the wonderfully messy reality of human organizations. 

We’ve adapted before, and we’ll adapt again – not just with better math, but with better operations, processes, and people. The future of security isn’t just quantum-resistant algorithms; it’s quantum-resistant organizations.