16 Oct 2025

spec kit / spec driven dev

Okay, right now I want to yap about spec kit.

“Vibe Coding” has gone viral. It’s a software development practice in which a user describes something in a chat and an AI generates functional code from that description. The idea is that instead of manually writing code, the user focuses on describing desired outcomes and refines the result through iterative feedback.

While Vibe Coding is excellent for rapid prototyping and exploration, its unstructured nature makes it risky for production-grade systems, so it’s not an approach I support for building reliable, maintainable software. It runs on natural language, which can be vague; a prompt like “create an app” can easily drift away from what you actually want. This challenge highlights a broader industry trend: a move away from unstructured prompting toward disciplined, structured engineering. A prime example of this evolution is spec kit. The spec kit project, backed by GitHub, is built on the idea of Spec-Driven Development (SDD). It flips traditional software development: the specification becomes the executable “source of truth.”

We already know Test-Driven Development (TDD), which gives us a method for ensuring functional correctness at the micro level, the “how.” Spec-Driven Development (SDD) operates at a higher level of abstraction: it requires the “what” and the “why,” defining the system’s intent and architecture before a single test is even written.

To make this concrete, let’s walk through the entire spec kit workflow. The following diagram visualizes the journey from a high-level idea to executable code, emphasizing the structured, stage-gated process with a human-in-the-loop at every critical step.

graph TD
    A[Start] --> B(Define Project Constitution)
    B --> C{"Human Input: /speckit.specify"}
    C --> D[AI Generates spec.md]
    D --> E(Human Reviews & Refines Spec)

    E --> F{Need to Clarify?}
    F -- Yes --> G{"/speckit.clarify"}
    G --> D
    
    F -- No --> H{"Human Input: /speckit.plan"}
    H --> I[AI Generates plan.md]
    I --> J(Human Reviews & Refines Plan)

    J --> K{"Human Input: /speckit.tasks"}
    K --> L[AI Generates tasks.md]
    L --> M(Human Reviews & Refines Tasks)

    M --> N{Need to Analyze?}
    N -- Yes --> O{"/speckit.analyze"}
    O --> L
    
    N -- No --> P{"Human Input: /speckit.implement"}
    
    P --> Q[AI Executes Tasks]
    
    subgraph TDD Cycle within Implement
        direction LR
        Q1[Write Failing Test] --> Q2[AI Writes Code to Pass]
        Q2 --> Q3[Run All Tests]
        Q3 -- More Tasks --> Q1
    end

    Q --> Q1
    
    Q3 -- All Tasks Done --> R[Final Code & Tests]
    R --> S[End]

As the diagram shows, spec kit replaces a single, vague prompt with a structured, multi-stage workflow. Each stage produces a specific artifact (like a spec.md or plan.md) and requires human review, providing the structured process that Vibe Coding lacks.
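
To make the artifact trail concrete, a spec kit project’s working tree ends up looking roughly like the sketch below. Treat it as an illustration rather than gospel: the exact names and locations vary across spec kit versions, and 001-user-auth is just a hypothetical feature directory.

    .specify/
        memory/
            constitution.md   # project-wide principles and constraints
        templates/            # templates backing the /speckit.* commands
    specs/
        001-user-auth/
            spec.md           # generated by /speckit.specify
            plan.md           # generated by /speckit.plan
            tasks.md          # generated by /speckit.tasks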

Now, let’s look at each of these components in detail:

  • constitution: This document defines the project’s non-negotiable rules, principles, and boundaries. It focuses on project-wide constraints like coding standards, tech stack choices, and architectural patterns that the AI must adhere to throughout the process.
  • specify: This command is used to define a feature’s intent. You describe what you want to build and why, focusing entirely on user stories, requirements, and acceptance criteria. The AI takes this and generates a detailed, structured specification document (spec.md).
  • plan: Once the “what” is defined, this step focuses on the “how”. It’s more technical. You provide the tech stack, architecture choices, and implementation details. The AI combines this with the constitution and specify documents to generate a detailed technical plan (plan.md). I don’t think the specify and plan phases should be entirely separate—maybe they are part of the same strategic thinking—but this structure provides a clear separation of concerns.
  • tasks: From the plan, we get a task list: the technical plan is broken down into a concrete checklist of actionable items (a hypothetical excerpt follows this list). The underlying AI agents will not skip any task; they will finish them one by one.
  • implement: This component attaches the agent to the task list; when the agent finishes a task, it ticks it off. The process is driven by Test-Driven Development (TDD): for each task, the agent first writes a failing test, then writes the minimum code to make it pass, and finally refactors. This ensures that every piece of generated code is validated against a clear, executable requirement (a minimal sketch of one such cycle appears after the excerpt below).
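
Here is the promised tasks.md excerpt. It is entirely hypothetical; the numbering style and the [P] marker for tasks that can run in parallel follow conventions I’ve seen in spec kit’s templates, but the exact format depends on the version and on your plan:

    - [x] T001 Set up project skeleton and test harness
    - [x] T002 [P] Create the User model with validation
    - [ ] T003 [P] Write a failing contract test for POST /api/users
    - [ ] T004 Implement POST /api/users to make the contract test pass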

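And here is a minimal sketch, in plain Python, of one iteration of that TDD cycle. Everything in it is hypothetical: the slugify task and its tests are invented for illustration, not prescribed by spec kit; they just show the failing-test-first rhythm the implement phase follows.

    import re

    # Hypothetical task: "slugify page titles for URLs".
    # Step 1: the agent writes failing tests derived from the spec's
    # acceptance criteria (they fail until slugify exists and is correct).
    def test_slugify_lowercases_and_hyphenates():
        assert slugify("Spec Kit Rocks") == "spec-kit-rocks"

    def test_slugify_strips_punctuation():
        assert slugify("What? Why!") == "what-why"

    # Step 2: the agent writes the minimum code that makes them pass.
    def slugify(title: str) -> str:
        words = re.findall(r"[a-z0-9]+", title.lower())
        return "-".join(words)

    # Step 3: run the whole suite; if it is green, tick the task off
    # in tasks.md and move on to the next one.
    if __name__ == "__main__":
        test_slugify_lowercases_and_hyphenates()
        test_slugify_strips_punctuation()
        print("task done, all tests pass")
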
This whole thing is very interesting. To enhance the workflow, spec kit also provides three optional commands:

  • clarify: You run this command before the plan phase. Its purpose is to have the AI quiz you to clarify underspecified areas in your initial requirements. This improves the quality of the specification itself.
  • analyze: This is run after tasks are generated but before implement. It performs cross-artifact consistency analysis to find contradictions or gaps between the spec, the plan, and the tasks, improving the quality of the task list.
  • checklist: This generates a custom quality checklist to validate requirements, which helps ensure the specification is clear, consistent, and complete (a hypothetical excerpt follows this list).
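
For flavor, a generated checklist might contain items like these. This excerpt is purely hypothetical, since the real output depends on your spec:

  • Does every user story have measurable acceptance criteria?
  • Are all error and edge-case states defined in the spec?
  • Are the non-functional requirements consistent with the constitution?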

However, this approach still has some problems for me right now.

  • Cost: Running agents autonomously is appealing, but cost is a major issue. Token consumption is huge: every command ingests long prompts and generates a lot of output, much of which then needs to be refined. I once ran an agent for five hours to build a very simple app, and it cost over 100 bucks. The prompts and task lists can become very long.
  • Speed: Building an app this way still takes a long time. Agents have to generate code, run tests, fix bugs, and repeat the cycle, which is slow compared with developers working hand-in-hand with AI.
  • Reliability: The implement phase’s reliance on TDD looks like a strong safeguard, but it hides a more subtle and critical problem: if the initial inputs are flawed, the output will be flawed as well (garbage in, garbage out). Where do the tests come from? In this AI-driven workflow, the tests themselves are generated from the upstream plan and spec. If we, as developers, fail to rigorously review those initial specifications (they’re looong, and we are lazy), the AI will simply generate tests that perfectly match a flawed plan. The TDD process will then reliably guide the AI to build a flawed feature that passes all of its flawed tests. The issue isn’t that TDD is unreliable; it’s that the content of the tests becomes unreliable, directly reflecting any ambiguity or error we allowed into the initial spec.
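
To make that concrete, here is a deliberately tiny, hypothetical example in Python. Suppose the spec ambiguously says “orders over $100 get a 10% discount” when the business actually meant “orders of $100 or more.” A generated test faithfully encodes the flawed wording, and TDD then guarantees the wrong behavior ships green:

    # Hypothetical spec line: "orders over $100 get a 10% discount".
    # The business meant "orders of $100 or more", but nobody caught
    # the ambiguity during spec review.

    # The AI generates this test straight from the flawed wording...
    def test_no_discount_at_exactly_100():
        assert apply_discount(100.00) == 100.00  # wrong boundary, faithfully encoded

    # ...and then writes exactly the code that makes it pass.
    def apply_discount(total: float) -> float:
        return total * 0.9 if total > 100.00 else total

    if __name__ == "__main__":
        test_no_discount_at_exactly_100()
        print("all tests pass; the feature is still wrong")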

So, my view is this: while the spec kit tooling itself may still be maturing in terms of cost and speed, its underlying principles are what’s valuable. The main takeaway is not a specific tool but the importance of spec-driven development itself. A well-organized, version-controlled specification is critical for any project, regardless of who, or what, is writing the code. The day agents become cheap and fast is still some way off, but it’s coming; when the cost and speed problems are solved, ensuring the reliability of the initial spec will be the final challenge left to solve.

There are other SDD tools out there, like Kiro and Tessl. As real products sold to devs, they’re all stricter, each with its own set of rules to mitigate the problems I mentioned above. I favor spec kit because it’s more transparent and less opinionated, giving devs more control over the process. And the most important thing, as I said, is to buy into the idea of spec-driven development itself.