Two things happened this week that, taken together, tell the entire story of where AI is heading.
First: Anthropic open-sourced the Agent Skills specification, positioning it as an industry standard for defining how AI agents perform specific tasks. Skills are curated, versioned folders containing instructions, resources, and scripts that tell a model how to do a particular job well.
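Concretely, a skill under the spec is a directory the model can read. The layout below is an illustrative sketch, not a normative example; the skill name, file names beyond SKILL.md, and their contents are hypothetical:

```
portfolio-analysis/
├── SKILL.md            # instructions and metadata describing when and how to use the skill
├── resources/
│   └── style-guide.md  # reference material the instructions point to
└── scripts/
    └── validate.py     # executable check run against outputs
```

The point is that everything the model needs to do the job travels together in one versionable unit.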
Second: the Pentagon announced it’s integrating xAI’s Grok chatbot into a military AI platform. This was met with immediate backlash about reliability, bias, and the appropriateness of putting a consumer chatbot into command-and-control systems.
One of these stories is about building the skill layer. The other is about what happens when you skip it entirely.
The Model Obsession
The AI industry is obsessed with models. Which model is best? What’s the latest benchmark? Should I use Claude or GPT or Gemini? These debates consume enormous amounts of attention and produce surprisingly little value.
Here’s why: the difference between frontier models on a well-defined production task is usually marginal. The difference between a model with a good skill layer and a model without one is enormous.
A skill is the set of instructions, context, examples, and constraints that tells a model how to do a specific job: prompt engineering, guardrails, few-shot examples, tool configurations, and validation logic, all packaged together in a way that's testable, versionable, and shareable.
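As a rough sketch (the names and fields here are illustrative, not part of any spec), a skill can be modeled as a bundle of instructions, few-shot examples, and constraints, plus an acceptance check that travels with them:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    """A packaged unit of task know-how: testable, versionable, shareable."""
    name: str
    version: str
    instructions: str                       # system prompt / task guidance
    examples: list[tuple[str, str]] = field(default_factory=list)  # few-shot (input, output) pairs
    constraints: list[str] = field(default_factory=list)           # phrases the output must never contain
    validate: Callable[[str], bool] = lambda output: True          # post-generation check

    def check(self, output: str) -> bool:
        """Reject output that violates any constraint or fails validation."""
        lowered = output.lower()
        if any(c.lower() in lowered for c in self.constraints):
            return False
        return self.validate(output)

# A bare model call becomes "model + skill": the skill supplies the
# prompt, the examples, and the acceptance check.
reporting = Skill(
    name="quarterly-summary",
    version="1.2.0",
    instructions="Summarize results in plain language; cite figures exactly.",
    constraints=["guaranteed returns"],  # compliance: never promise returns
    validate=lambda out: len(out) > 0,
)

print(reporting.check("Revenue grew 4% quarter over quarter."))  # True
print(reporting.check("These are guaranteed returns."))          # False
```

Swapping the model underneath leaves the `Skill` object untouched, which is exactly the portability the spec is after.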
When you deploy a bare model against a task, you get variable results. When you deploy a model with a well-crafted skill, you get consistent, high-quality results. The skill is the difference between a talented generalist and a trained specialist.
I’ve seen this pattern before in different forms. At Amazon, Alexa’s skill platform was the thing that made the hardware useful. The wake word detection and speech recognition were impressive engineering, but the skills (weather, music, smart home control, timers) were what people actually used every day. The platform was only as good as the skills running on it.
What Anthropic Got Right
Anthropic’s decision to open-source the Agent Skills spec follows the same playbook that made MCP an industry standard. Release an open specification. Get adoption. Let the ecosystem build on it.
The deeper insight is that Anthropic recognized that model capability alone isn’t enough. Models need structured, domain-specific instructions to perform reliably in production. And those instructions should be portable, so a skill that works with Claude can work with any model that supports the spec.
This validates an approach we’ve been building toward at Vestmark. We’ve been developing skills that encode domain expertise: how to analyze data, how to generate compliant outputs, how to handle specific operational scenarios. These skills represent institutional knowledge packaged in a format that AI systems can use.
The skills are the moat. The model is a commodity.
The Grok Incident
Now consider the Pentagon and Grok. Whatever you think about the politics, the technical reality is stark: deploying a consumer-grade chatbot into a military context without a robust skill layer is reckless.
Grok is a capable model. It can generate text, answer questions, hold a conversation. But it has no training in military doctrine, no understanding of rules of engagement, no domain-specific constraints that prevent it from saying something inappropriate in a command-and-control context.
This is what happens when you treat the model as the product. You get a system that’s technically capable but contextually dangerous. It can generate fluent text about anything, which means it can generate fluent text about things it absolutely should not.
The fix isn’t a different model. It’s a skill layer that encodes domain knowledge, constraints, and guardrails appropriate to the context.
Building the Skill Layer
If you accept that skills are the unit of AI value, the question becomes: how do you build good ones?
Start with the failure modes. Before you write a single instruction, catalog everything that could go wrong. What should the agent never say? What actions should it never take? What information should it never reveal? Define the negative space first.
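A minimal sketch of "define the negative space first" (the rule names and patterns below are hypothetical): encode the never-list as explicit, testable rules and run them on every candidate output before it leaves the system.

```python
import re

# Hypothetical never-list: each entry is (rule name, pattern that must not match).
DENY_RULES = [
    ("no_account_numbers", re.compile(r"\b\d{10,}\b")),            # never reveal account IDs
    ("no_trade_advice", re.compile(r"\b(buy|sell) now\b", re.I)),  # never issue trade instructions
]

def violations(output: str) -> list[str]:
    """Return the names of every deny rule the output trips."""
    return [name for name, pattern in DENY_RULES if pattern.search(output)]

def release(output: str) -> str:
    """Block the response entirely if any never-rule fires."""
    hits = violations(output)
    if hits:
        return f"[blocked: {', '.join(hits)}]"
    return output

print(release("Markets were mixed today."))       # passes through unchanged
print(release("Buy now before it drops!"))        # blocked by no_trade_advice
```

Cataloging the failure modes first has a side benefit: the deny rules double as the seed of a regression suite.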
Encode domain expertise. The most valuable skills contain knowledge that’s hard to acquire: regulatory requirements, institutional processes, domain-specific terminology and conventions. This is the knowledge that lives in the heads of your senior people. Capturing it in a skill makes it available to every AI system in your organization.
Version and test. Skills should be versioned like code and tested like code. When you update a skill, run it against a regression suite. When you deploy a new model, run your skills against it to verify they still work. The skill layer is software and should be treated with the same rigor.
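The "test like code" step can be as simple as a suite of (input, acceptance check) pairs run against whatever model the skill is currently paired with. A sketch, with a stubbed model call standing in for a real provider:

```python
from typing import Callable

# Stand-in for a real model call; any provider can sit behind this signature.
def stub_model(prompt: str) -> str:
    return "Q3 revenue: $4.2M, up 4% quarter over quarter."

# A regression case pairs an input with a predicate the output must satisfy.
RegressionCase = tuple[str, Callable[[str], bool]]

CASES: list[RegressionCase] = [
    ("Summarize Q3 results", lambda out: "Q3" in out),
    ("Summarize Q3 results", lambda out: "guaranteed" not in out.lower()),
]

def run_regression(model: Callable[[str], str], cases: list[RegressionCase]) -> list[int]:
    """Return indices of failing cases; an empty list means the skill still holds."""
    return [i for i, (prompt, ok) in enumerate(cases) if not ok(model(prompt))]

failures = run_regression(stub_model, CASES)
print("failures:", failures)
```

Rerunning the same suite after a model swap is the cheap way to verify the skill survived the upgrade.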
Share and iterate. Anthropic’s open spec enables skill sharing across organizations and across models. This means communities of practice could develop shared skills for common tasks, while individual organizations develop proprietary skills that encode their competitive advantage.
The Takeaway
Stop agonizing about models. Start building skills.
The model you use will change every few months as the frontier advances. The skills you build will compound over time, encoding more domain knowledge, handling more edge cases, producing more reliable results. Your skills are your competitive advantage. Your model is interchangeable.
Anthropic understood this. The Pentagon didn’t. Choose which side of that lesson you want to be on.