## The idea, up front
I like Kiro's spec-driven model: three files (requirements.md, design.md, tasks.md), an agent that walks them, work that decomposes into checkbox tasks. But I don't want the spec living in the repo. Every spec-driven tool I've tried — Kiro, GitHub Spec Kit, cc-sdd — assumes the spec is a tracked artifact in git, and that's been my hangup. Requirements belong in a ticket tracker where product people already work and where edits have an audit trail.
So the experiment is: Jira is the source of truth. Claude Code reads a local mirror generated on demand. The only thing checked into the repo is one pointer file (CLAUDE.md) telling future sessions where to look.
That's it. The rest of this post is the details — directory layout, sync mechanism, a Story-sizing lesson, what I've actually built. I'm still experimenting; some of it might be wrong.
## Why this and not just specs-in-repo
I tried specs-in-repo first, the obvious way. It worked for a few days, then:
- Editing a requirement during a planning chat became a git commit. The next day I couldn't tell which commit was "we changed scope" vs "we fixed a typo in the spec."
- A pull request with one acceptance criterion change had two unrelated edits in it — prose and code — and reviewing it required two different mental modes.
- The PM I work with opened the spec once. The actual product conversation went back to Jira. The repo's copy drifted from the canonical one within a week.
The issue isn't Kiro's model. It's where the spec lives. The repo is the wrong place because git history is the wrong audit trail for product decisions, and engineers shouldn't be reviewing prose during code review.
I want Kiro's discipline, Claude's interface, Jira's audit trail. That's the glue I'm building.
## How it fits together
Three actors, one rule: spec lives in Jira, agent reads a local mirror, repo carries only code plus the pointer.
### Jira side
An Epic = one spec = one feature. Its description carries the overview, baseline constraints, scope, out-of-scope list, and a Spec status field (draft → approved → shipped).
Child Stories under the Epic are the requirements. Each Story's description has three sections:
```markdown
## User story
As a <persona>, I want <capability>, so that <goal>.

## Acceptance criteria (EARS)
- The system SHALL ...
- WHEN <event>, THEN the system SHALL ...
- IF <bad condition>, THEN the system SHALL ...

## Verification
- <concrete shell command or test invocation>
```
EARS (Easy Approach to Requirements Syntax) keeps the criteria specific without going full formal methods. Subtasks under each Story are the discrete acceptance tests — one Subtask = one row in the future tasks.md checklist.
### Local mirror
When I want to work on a spec, the workflow pulls it down to ~/.claude-specs/<repo>/<EPIC-KEY>/:
```
~/.claude-specs/specsync/KAN-4/
├── requirements.md   # generated from Stories; READ-ONLY
├── tasks.md          # generated from Subtasks; READ-ONLY
├── design.md         # I (Claude) write this; the only writable artifact
├── .meta.json        # sync baseline + drift detection
└── .gitignore-hint   # doc explaining this dir is outside the repo
```
This directory is outside any git repo. The repo's own .gitignore also blocks requirements.md, tasks.md, design.md, and .claude-specs/ at the repo root as a second layer. Two defenses, because the whole point is that this content never accidentally lands in a commit.
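A sketch of those repo-root entries (anchored with a leading slash so only root-level files are blocked; the exact patterns are my choice, not a fixed format):

```gitignore
# Second line of defense: spec artifacts must never land in a commit.
# (The primary defense is that the mirror lives outside the repo entirely.)
/requirements.md
/tasks.md
/design.md
/.claude-specs/
```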
requirements.md is generated from Stories, EARS criteria preserved verbatim. If you want to change a requirement, you edit the Story in Jira and re-pull. tasks.md mirrors Subtask state (Done in Jira → [x] locally). design.md is the exception — Claude writes it, including a ## Invariants section for property-based tests.
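As a sketch of the Subtask-to-checkbox mapping, assuming ticket dicts that mirror the fields .meta.json tracks (the field names are illustrative, not the Rovo MCP's exact schema):

```python
# Render Jira Subtasks into a tasks.md checklist: one Subtask = one row,
# Done in Jira = checked locally. Field names are assumptions.

def render_tasks(subtasks: list[dict]) -> str:
    lines = ["# Tasks (generated from Jira -- READ-ONLY)", ""]
    for t in subtasks:
        box = "x" if t["status"] == "Done" else " "
        lines.append(f"- [{box}] {t['key']}: {t['summary']}")
    return "\n".join(lines) + "\n"

print(render_tasks([
    {"key": "KAN-8", "status": "Done", "summary": "GET /hello returns 200"},
    {"key": "KAN-9", "status": "To Do", "summary": "Missing name greets World"},
]))
```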
### Repo side
```
specsync/
├── CLAUDE.md    # the only spec-aware file. Points at KAN-4.
├── .mcp.json    # registers Atlassian Rovo MCP. No secrets in the file.
├── .gitignore   # blocks spec filenames defensively
└── src/, build.gradle, ...
```
CLAUDE.md is the bridge for cold sessions. Someone opens this repo in Claude Code six months from now, reads CLAUDE.md, and learns: requirements live in Jira Epic KAN-4, run /spec-pull KAN-4 to materialize them, verify with ./gradlew test and curl http://localhost:8080/hello. That's the entire onboarding.
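A minimal CLAUDE.md in that spirit might read (the wording is illustrative; the facts are the ones listed above):

```markdown
# Project spec pointer

Requirements live in Jira, not in this repo.

- Spec: Jira Epic KAN-4 (source of truth; do not copy spec files into the repo)
- Materialize locally: /spec-pull KAN-4
- Verify: ./gradlew test, then curl http://localhost:8080/hello
```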
## The skill
The glue is a Claude Code skill at ~/.claude/skills/spec-kit-jira/. It exposes:
| Command | What it does |
|---|---|
| `/spec-pull <EPIC-KEY>` | Materialize the local mirror from Jira. Warn if local design.md is newer than the last pull. |
| `/spec-design` | Author or update design.md, including invariants. |
| `/spec-impl` | Read invariants, generate property-based tests, run RED → GREEN per Subtask. |
| `/spec-push` | Post design.md to the Epic as a versioned attachment. |
| `/spec-status` | Show local vs Jira diff. |
I've hand-built the bootstrap so far. The skill itself is next.
## Sync mechanism
.meta.json records the Jira `updated` timestamp of every ticket at the moment of pull:

```json
{
  "epic": "KAN-4",
  "last_pull_at": "2026-05-16T23:05:00+02:00",
  "epic_updated": "2026-05-16T23:02:30.334+0200",
  "children": {
    "KAN-5": { "updated": "...", "status": "To Do", "summary": "Repo baseline + bootJar" },
    "KAN-6": { "updated": "...", "status": "To Do", "summary": "CLAUDE.md bootstrap" },
    "KAN-7": { "updated": "...", "status": "To Do", "summary": "/hello endpoint" }
  }
}
```
/spec-pull re-fetches via JQL, diffs each ticket's current updated against the recorded baseline, re-renders what changed, and warns about drift. If design.md was edited after the last pull, the command prompts before overwriting.
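The drift check is a straight dictionary diff. A sketch, assuming the .meta.json shape above and a fresh `key → updated` map from the JQL fetch (function name and return shape are mine):

```python
# Compare each ticket's current `updated` timestamp against the baseline
# recorded in .meta.json at the last pull, and report what drifted.

def detect_drift(meta: dict, fetched: dict[str, str]) -> dict[str, str]:
    """Return {ticket key: reason} for every ticket that changed since the pull."""
    drift = {}
    baseline = meta["children"]
    for key, updated in fetched.items():
        if key not in baseline:
            drift[key] = "new ticket since last pull"
        elif updated != baseline[key]["updated"]:
            drift[key] = "edited in Jira since last pull"
    for key in baseline.keys() - fetched.keys():
        drift[key] = "removed from the Epic"
    return drift

meta = {"children": {
    "KAN-5": {"updated": "2026-05-16T23:02:00+0200"},
    "KAN-6": {"updated": "2026-05-16T23:02:10+0200"},
}}
fetched = {
    "KAN-5": "2026-05-16T23:02:00+0200",  # unchanged
    "KAN-6": "2026-05-17T09:15:00+0200",  # edited after the pull
    "KAN-7": "2026-05-17T09:20:00+0200",  # newly created
}
print(detect_drift(meta, fetched))
```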
The reverse path is deliberately tiny. Only design.md goes back to Jira, only via /spec-push, only as a new attachment versioned by timestamp. Old versions stay in attachment history — that's the version log. One writable artifact, list-append channel. The hard cases in sync systems come from "both sides edited the same thing"; I avoid them by making sure only one side ever edits each thing.
## Property-based tests from invariants
The design.md template requires:
## Invariants
- INV-1: For any input list `xs`, `sort(xs)` is a permutation of `xs`.
- INV-2: `decode(encode(x)) == x` for all `x: Payload`.
- INV-3: Account balance never goes negative after any sequence of valid transactions.
`/spec-impl` reads these and emits property tests in the project's framework, detected from the project's manifest file (`pyproject.toml` → Hypothesis, `package.json` → fast-check, `Cargo.toml` → proptest, `go.mod` → gopter).
EARS criteria live in requirements.md because they're prose and need to be human-readable. Invariants live in design.md because they need to be parseable. Splitting them means I get both: human-reviewable acceptance criteria and a clean parse target for test generation. I haven't run this loop end-to-end yet, so take it as designed, not proven.
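To make the "clean parse target" claim concrete, here is one way the skill could pull invariants out of design.md. The regexes and section handling are my assumptions, not the skill's actual parser:

```python
# Extract "- INV-n: ..." bullets from the "## Invariants" section of design.md.
import re

def parse_invariants(design_md: str) -> dict[str, str]:
    """Return {invariant id: statement} from the ## Invariants section."""
    parts = re.split(r"^## Invariants\s*$", design_md, flags=re.M)
    if len(parts) < 2:
        return {}
    # Keep only the text up to the next "## " heading.
    body = re.split(r"^## ", parts[1], flags=re.M)[0]
    return dict(re.findall(r"^- (INV-\d+): (.+)$", body, flags=re.M))

doc = """## Invariants
- INV-1: For any input list `xs`, `sort(xs)` is a permutation of `xs`.
- INV-2: `decode(encode(x)) == x` for all `x: Payload`.

## Next section
- not an invariant
"""
print(parse_invariants(doc))
```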
## What I've actually built
To prove the pattern wasn't fantasy I picked a deliberately small first spec: a Hello World REST API in Spring Boot 4 on Java 25 with Gradle (Groovy DSL) and a version catalog.
- Created the Jira Epic (`KAN-4`) with full overview and baseline constraints.
- Created Stories with `## User story`, `## Acceptance criteria (EARS)`, and `## Verification` sections.
- Realised the Stories were over-decomposed, restructured (more on that below).
- Bootstrapped `~/.claude-specs/specsync/KAN-4/` by hand, including `.meta.json` with every ticket's `updated` timestamp.
- Initialized the local git repo; hit a Bitbucket workspace-deletion snag that took an hour to diagnose. (Lesson: when something says "user limit exceeded" but you have one user, it's billing state, not user count.)
### How Rovo did the bootstrap
Worth calling out: I didn't draw the Epic on paper and then transcribe it into Jira. Claude created every ticket directly, by calling the official Atlassian Rovo MCP server. Rovo is Atlassian's GA Model Context Protocol bridge to Jira and Confluence; it exposes tools like `createJiraIssue`, `editJiraIssue`, `searchJiraIssuesUsingJql`, `getTransitionsForJiraIssue`, `addCommentToJiraIssue`. With those, Claude could:
- list accessible Atlassian sites and pick the right `cloudId`,
- find the Jira project (`KAN`) and discover its issue types,
- create the Epic, then the six Stories (with the EARS-formatted descriptions) as separate API calls,
- fetch all of them back with one JQL query (`parent = KAN-4 ORDER BY key ASC`) to capture each ticket's `updated` timestamp for `.meta.json`,
- edit descriptions and titles when the restructure happened.
Auth is an API token in an environment variable; Rovo handles the rest. No app passwords, no scraping, no UI automation.
One trap worth noting for anyone replicating this: on team-managed (next-gen) Jira projects, Stories link to their Epic via the `parent` field, not the legacy "Epic Link" custom field. My first `createJiraIssue` call failed with `Field 'customfield_10014' cannot be set`; switching to `{"parent": {"key": "KAN-4"}}` worked. The MCP surfaces the error verbatim from Jira, so it was easy to diagnose — but it's the kind of detail that would have taken half an hour to figure out from docs alone.
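For reference, a trimmed version of the create payload shape that worked for me on a team-managed project (summary text and field selection illustrative; everything except the parent link is cut away):

```json
{
  "fields": {
    "project": { "key": "KAN" },
    "issuetype": { "name": "Story" },
    "summary": "/hello endpoint",
    "parent": { "key": "KAN-4" }
  }
}
```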
This is part of why the experiment feels viable: Rovo means the Jira side of the workflow is real today, with no glue I had to write. The skill I haven't built yet is mostly orchestration around Rovo calls, not new transport.
Not built yet: the skill itself, the slash commands, framework detection for /spec-impl, the design.md attachment push, a pre-commit hook that catches spec content sneaking into the repo. I'm deliberately hand-driving every step the skill will eventually automate — I want to know the right shape before I encode it.
## What surprised me
### Story sizing is the first thing you get wrong
My first pass had separate Stories for "greet anonymous caller," "greet named caller," "reject invalid name." Three Stories. Looked tidy. Looked wrong the moment I imagined the pull request — no one would split that into three PRs. It's one controller.
The rule I now use: Epic = a feature. Story = one pull request. Subtask = one acceptance test inside that PR. Stories map to the unit of code review, not the unit of behavior. Once I made that swap, the /hello endpoint became one Story with six Subtasks. One PR. Six checkboxes.
If you take one thing from this post: when drafting Stories, ask "is this one PR or several?" If one, it's one Story.
### From a process angle, the AI is invisible
Every artifact lives somewhere a human reviewer would expect — Epic descriptions, Story acceptance criteria, Subtask checklists, attachment history. Nothing screams "a machine wrote this." A new teammate who never uses Claude Code can pick up the Epic and ship the next requirement without learning the agent integration exists. That feels like the right test for any AI-augmented workflow: if you stripped out the AI, would the artifact still make sense to a human?
## Open ends
Being honest about what I haven't worked out:
- Subtask granularity. Six Subtasks per Story means a lot of Jira churn during bootstrap. The skill will mitigate that, but I'm not sure Subtask-per-test is always right; sometimes a checklist inside one task might be better.
- Offline Jira. `/spec-pull` is the only refresh path. If Atlassian is down I can keep coding but can't reconcile. Fine for a single dev; less fine for a team.
- Multiple specs in flight. Paths don't collide because they include the Epic key, but switching contexts mid-session needs an `/spec-active KAN-N` command I haven't designed.
- EARS for non-engineers. Writing acceptance criteria in WHEN/THEN/IF form is a skill. Most PMs default to "as a user I want X" and stop. Teachable, but real.
- The PBT loop is unproven. I believe `## Invariants` → property tests works. I haven't done it end-to-end. Will know within a week or two.
## Why share now
The shape feels right and I'd rather get pushback before I keep building. It should generalize past Claude — anything that can read markdown and call an MCP server should fit, which keeps the agent portable. If you see a place the design quietly assumes Claude specifically, tell me.
Still experimenting. Will write again when the skill is real and I've shipped a real feature through it.