Summary
This guide documents procedures for testing Superpowers skills, with a primary focus on integration testing for complex skills that involve subagents, multi-step workflows, and cross-agent interactions. It covers test structure, execution steps, validation criteria, token usage analysis, troubleshooting, and best practices for authoring new integration tests.
Overview
Testing skills that involve subagents, workflows, and complex interactions requires running actual Claude Code sessions in headless mode and verifying their behavior through session transcripts.
Test Structure
Superpowers skill tests follow this directory structure:
tests/
├── claude-code/
│ ├── test-helpers.sh # Shared test utilities
│ ├── test-subagent-driven-development-integration.sh
│ ├── analyze-token-usage.py # Token analysis tool
│ └── run-skill-tests.sh # Test runner (if exists)
Running Tests
Integration Tests
Integration tests execute real Claude Code sessions with actual skills:
# Run the subagent-driven-development integration test
cd tests/claude-code
./test-subagent-driven-development-integration.sh
Note: Integration tests can take 10-30 minutes as they execute real implementation plans with multiple subagents.
Requirements
- Must run from the superpowers plugin directory (not from temp directories)
- Claude Code must be installed and available as the claude command
- Local dev marketplace must be enabled: "superpowers@superpowers-dev": true in ~/.claude/settings.json
Integration Test: subagent-driven-development
What It Tests
The integration test verifies the subagent-driven-development skill correctly:
- Plan Loading: Reads the plan once at the beginning
- Full Task Text: Provides complete task descriptions to subagents (doesn’t make them read files)
- Self-Review: Ensures subagents perform self-review before reporting
- Review Order: Runs spec compliance review before code quality review
- Review Loops: Uses review loops when issues are found
- Independent Verification: Spec reviewer reads code independently, doesn’t trust implementer reports
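If the subagent dispatches are listed in chronological order, the Review Order requirement can be checked mechanically. The following is a hypothetical Python sketch and is not taken from the actual test script. The keyword rules in classify are guesses based on the subagent descriptions in the sample Test Output ("implementing Task ...", "reviewing whether an implementation matches...", "a code reviewer. Review Task ...").

```python
def classify(description: str) -> str:
    """Map a subagent description to a role (hypothetical keyword rules)."""
    text = description.lower()
    if "matches" in text:
        return "spec_review"      # e.g. "reviewing whether an implementation matches..."
    if "code review" in text:
        return "quality_review"   # e.g. "a code reviewer. Review Task 1..."
    if "implementing" in text:
        return "implement"        # e.g. "implementing Task 1: Create Add Function"
    return "other"


def spec_review_precedes_quality_review(descriptions: list) -> bool:
    """True if every code quality review follows a spec review of the latest work."""
    spec_reviewed = False
    for description in descriptions:
        role = classify(description)
        if role == "implement":
            spec_reviewed = False  # new work needs a fresh spec compliance review
        elif role == "spec_review":
            spec_reviewed = True
        elif role == "quality_review" and not spec_reviewed:
            return False
    return True
```

Resetting the flag on every implementer dispatch also covers review loops: if a fix is dispatched after a review, it needs another spec compliance review before the next code quality review.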
How It Works
- Setup: Creates a temporary Node.js project with a minimal implementation plan
- Execution: Runs Claude Code in headless mode with the skill
- Verification: Parses the session transcript (.jsonl file) to verify:
  - Skill tool was invoked
  - Subagents were dispatched (Task tool)
  - TodoWrite was used for tracking
  - Implementation files were created
  - Tests pass
  - Git commits show proper workflow
- Token Analysis: Shows token usage breakdown by subagent
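You can approximate the tool-usage checks in the Verification step by counting tool_use blocks by tool name. The following is a minimal Python sketch. It assumes that each .jsonl line is a JSON object whose message.content is a list of content blocks, and that tool calls look like {"type": "tool_use", "name": "Task", ...}. That shape matches Claude API tool_use blocks, but this guide does not specify the transcript format, so treat it as an assumption. count_tool_uses and missing_tools are hypothetical names.

```python
import json
from collections import Counter
from pathlib import Path

# Tools the Verification step expects to see at least once.
REQUIRED_TOOLS = ("Skill", "Task", "TodoWrite")


def count_tool_uses(transcript: Path) -> Counter:
    """Count tool_use blocks by tool name in a .jsonl session transcript."""
    counts = Counter()
    with transcript.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written final line
            if not isinstance(record, dict):
                continue
            message = record.get("message")
            content = message.get("content") if isinstance(message, dict) else None
            if not isinstance(content, list):
                continue  # e.g. user turns whose content is a plain string
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    counts[block.get("name", "?")] += 1
    return counts


def missing_tools(counts: Counter) -> list:
    """Required tools that never appear in the transcript."""
    return [tool for tool in REQUIRED_TOOLS if counts[tool] == 0]
```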
Test Output
========================================
Integration Test: subagent-driven-development
========================================
Test project: /tmp/tmp.xyz123
=== Verification Tests ===
Test 1: Skill tool invoked...
[PASS] subagent-driven-development skill was invoked
Test 2: Subagents dispatched...
[PASS] 7 subagents dispatched
Test 3: Task tracking...
[PASS] TodoWrite used 5 time(s)
Test 6: Implementation verification...
[PASS] src/math.js created
[PASS] add function exists
[PASS] multiply function exists
[PASS] test/math.test.js created
[PASS] Tests pass
Test 7: Git commit history...
[PASS] Multiple commits created (3 total)
Test 8: No extra features added...
[PASS] No extra features added
=========================================
Token Usage Analysis
=========================================
Usage Breakdown:
----------------------------------------------------------------------------------------------------
Agent Description Msgs Input Output Cache Cost
----------------------------------------------------------------------------------------------------
main Main session (coordinator) 34 27 3,996 1,213,703 $ 4.09
3380c209 implementing Task 1: Create Add Function 1 2 787 24,989 $ 0.09
34b00fde implementing Task 2: Create Multiply Function 1 4 644 25,114 $ 0.09
3801a732 reviewing whether an implementation matches... 1 5 703 25,742 $ 0.09
4c142934 doing a final code review... 1 6 854 25,319 $ 0.09
5f017a42 a code reviewer. Review Task 2... 1 6 504 22,949 $ 0.08
a6b7fbe4 a code reviewer. Review Task 1... 1 6 515 22,534 $ 0.08
f15837c0 reviewing whether an implementation matches... 1 6 416 22,485 $ 0.07
----------------------------------------------------------------------------------------------------
TOTALS:
Total messages: 41
Input tokens: 62
Output tokens: 8,419
Cache creation tokens: 132,742
Cache read tokens: 1,382,835
Total input (incl cache): 1,515,639
Total tokens: 1,524,058
Estimated cost: $4.67
(at $3/$15 per M tokens for input/output)
========================================
Test Summary
========================================
STATUS: PASSED
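The TOTALS block matches the quoted rates exactly if every input-side token, including cache creation and cache reads, is billed at the input rate. Total input is 62 + 132,742 + 1,382,835 = 1,515,639 tokens, so the cost is 1,515,639 × $3/M + 8,419 × $15/M ≈ $4.547 + $0.126 ≈ $4.67. The following is a minimal Python sketch of that arithmetic. The function name is hypothetical. This flat-rate model is a simplification, because cache reads and cache writes are not priced the same as ordinary input tokens.

```python
# Rates quoted in the sample output: $3 per million input tokens, $15 per million output tokens.
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimated_cost(input_tokens: int, output_tokens: int,
                   cache_creation_tokens: int, cache_read_tokens: int) -> float:
    """Flat-rate estimate: every input-side token, cached or not, is billed at the input rate."""
    total_input = input_tokens + cache_creation_tokens + cache_read_tokens
    return (total_input * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


# Totals from the sample run above.
cost = estimated_cost(input_tokens=62, output_tokens=8_419,
                      cache_creation_tokens=132_742, cache_read_tokens=1_382_835)
print(f"${cost:.2f}")  # prints $4.67
```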
Token Analysis Tool
Usage
Analyze token usage from any Claude Code session:
python3 tests/claude-code/analyze-token-usage.py ~/.claude/projects/<project-dir>/<session-id>.jsonl
Finding Session Files
Session transcripts are stored in ~/.claude/projects/ with the working directory path encoded:
# Example for /Users/jesse/Documents/GitHub/superpowers/superpowers
SESSION_DIR="$HOME/.claude/projects/-Users-jesse-Documents-GitHub-superpowers-superpowers"
# Find recent sessions
ls -lt "$SESSION_DIR"/*.jsonl | head -5
What It Shows
- Main session usage: Token usage by the coordinator (user or main Claude instance)
- Per-subagent breakdown: each Task invocation, with:
  - Agent ID
  - Description (extracted from the prompt)
  - Message count
  - Input/output tokens
  - Cache usage
  - Estimated cost
- Totals: Overall token usage and cost estimate
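The per-subagent table is essentially a group-by over usage records. The following is a minimal Python sketch. It assumes that each record has already been reduced to an (agent_id, usage) pair, where usage uses the Claude API usage field names (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens). This guide does not describe how analyze-token-usage.py attributes messages to subagents, so that pairing step is left out. AgentTotals and aggregate_usage are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class AgentTotals:
    messages: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cache_creation_tokens: int = 0
    cache_read_tokens: int = 0


def aggregate_usage(entries: Iterable[Tuple[str, dict]]) -> Dict[str, AgentTotals]:
    """Sum token usage per agent ID (e.g. "main" for the coordinator, by assumption)."""
    totals: Dict[str, AgentTotals] = defaultdict(AgentTotals)
    for agent_id, usage in entries:
        row = totals[agent_id]
        row.messages += 1
        # Missing or null usage fields count as zero.
        row.input_tokens += usage.get("input_tokens") or 0
        row.output_tokens += usage.get("output_tokens") or 0
        row.cache_creation_tokens += usage.get("cache_creation_input_tokens") or 0
        row.cache_read_tokens += usage.get("cache_read_input_tokens") or 0
    return dict(totals)
```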
Understanding the Output
- High cache reads: Good - means prompt caching is working
- High input tokens on main: Expected - coordinator has full context
- Similar costs per subagent: Expected - each gets similar task complexity
- Cost per task: typically up to about $0.15 per subagent, depending on the task (each subagent in the sample run above cost $0.07-$0.09)
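One way to put a number on "high cache reads" is the share of input-side tokens served from the cache. The following is a minimal Python sketch. The function name is hypothetical, and the guide does not define a threshold. Applied to the TOTALS block of the sample run, 1,382,835 of 1,515,639 input-side tokens (about 91%) were cache reads.

```python
def cache_read_share(input_tokens: int, cache_creation_tokens: int, cache_read_tokens: int) -> float:
    """Fraction of input-side tokens that were read from the prompt cache."""
    total = input_tokens + cache_creation_tokens + cache_read_tokens
    return cache_read_tokens / total if total else 0.0


# Totals from the sample run.
share = cache_read_share(62, 132_742, 1_382_835)
print(f"{share:.1%} of input-side tokens were cache reads")  # prints 91.2% of input-side tokens were cache reads
```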
Troubleshooting
Skills Not Loading
Problem: Skill not found when running headless tests
Solutions:
- Ensure you’re running FROM the superpowers directory: cd /path/to/superpowers && tests/...
- Check that ~/.claude/settings.json has "superpowers@superpowers-dev": true in enabledPlugins
- Verify the skill exists in the skills/ directory
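You can script these checks, along with the Requirements listed earlier, as a preflight step before a long run. The following is a minimal Python sketch and is not part of the superpowers repository. The preflight name is hypothetical, and it assumes that each skill lives at skills/<skill-name>/SKILL.md.

```python
import json
import shutil
from pathlib import Path
from typing import List

PLUGIN_KEY = "superpowers@superpowers-dev"  # key named in the Requirements section


def preflight(repo_dir: Path, settings_path: Path, skill_name: str) -> List[str]:
    """Return the problems found; an empty list means the checks passed."""
    problems = []
    # Assumed layout: skills/<skill-name>/SKILL.md inside the plugin checkout.
    if not (repo_dir / "skills" / skill_name / "SKILL.md").is_file():
        problems.append(f"skill {skill_name!r} not found under {repo_dir / 'skills'}")
    # The claude CLI must be on PATH.
    if shutil.which("claude") is None:
        problems.append("claude command not found on PATH")
    # The dev marketplace plugin must be enabled in settings.json.
    try:
        settings = json.loads(settings_path.read_text())
    except (OSError, json.JSONDecodeError) as exc:
        problems.append(f"cannot read {settings_path}: {exc}")
    else:
        enabled = settings.get("enabledPlugins") if isinstance(settings, dict) else None
        if not isinstance(enabled, dict) or enabled.get(PLUGIN_KEY) is not True:
            problems.append(f"{PLUGIN_KEY!r} is not enabled in enabledPlugins")
    return problems
```

Typical use from the plugin directory: preflight(Path.cwd(), Path.home() / ".claude" / "settings.json", "subagent-driven-development").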
Permission Errors
Problem: Claude is blocked from writing files or accessing directories
Solutions:
- Use the --permission-mode bypassPermissions flag
- Use --add-dir /path/to/temp/dir to grant access to test directories
- Check file permissions on test directories
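When a test script launches Claude Code, it can pass both flags on the same command line. The following is a minimal Python sketch that only builds the argument list. It assumes that -p (print mode) is how the test starts a headless session. build_headless_command is a hypothetical helper and is not part of test-helpers.sh.

```python
import shlex
from pathlib import Path
from typing import List


def build_headless_command(prompt: str, test_dir: Path) -> List[str]:
    """Argument list for a non-interactive Claude Code run against a temp project."""
    return [
        "claude",
        "-p", prompt,                              # headless (print) mode, by assumption
        "--permission-mode", "bypassPermissions",  # avoid blocking on permission prompts
        "--add-dir", str(test_dir),                # grant access to the temp test project
    ]


cmd = build_headless_command("Execute the implementation plan", Path("/tmp/test-project"))
print(shlex.join(cmd))  # shell-quoted form, handy for reproducing a run by hand
```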
Test Timeouts
Problem: Test takes too long and times out
Solutions:
- Increase the timeout: timeout 1800 claude ... (30 minutes)
- Check for infinite loops in skill logic
- Review subagent task complexity
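You can enforce the same 30-minute limit from Python with subprocess.run, whose timeout argument kills the child process and raises subprocess.TimeoutExpired. The following is a minimal sketch. run_with_timeout is a hypothetical helper, and returning None for a timed-out run is just one possible convention.

```python
import subprocess
from typing import List, Optional, Tuple

DEFAULT_TIMEOUT_S = 1800  # 30 minutes, matching "timeout 1800" above


def run_with_timeout(cmd: List[str], timeout_s: float = DEFAULT_TIMEOUT_S,
                     cwd: Optional[str] = None) -> Tuple[Optional[int], str]:
    """Run cmd; return (returncode, stdout), or (None, partial stdout) on timeout."""
    try:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child; keep any output captured so far.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return None, partial
    return result.returncode, result.stdout
```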
Session File Not Found
Problem: Can’t find the session transcript after a test run
Solutions:
- Check the correct project directory in ~/.claude/projects/
- Use find ~/.claude/projects -name "*.jsonl" -mmin -60 to find recent sessions
- Verify the test actually ran (check for errors in the test output)
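You can also express the find command above and the directory naming described under Finding Session Files in Python. The following is a minimal sketch, and the helper names are hypothetical. encode_project_dir only replaces slashes. That reproduces the /Users/jesse/... example, but it may not cover every character Claude Code rewrites.

```python
import time
from pathlib import Path
from typing import List


def recent_sessions(projects_dir: Path, max_age_minutes: float = 60) -> List[Path]:
    """Like: find <projects_dir> -name "*.jsonl" -mmin -<max_age_minutes>, newest first."""
    if not projects_dir.is_dir():
        return []
    cutoff = time.time() - max_age_minutes * 60
    recent = []
    for path in projects_dir.rglob("*.jsonl"):
        mtime = path.stat().st_mtime
        if mtime >= cutoff:
            recent.append((mtime, path))
    return [path for _, path in sorted(recent, key=lambda item: item[0], reverse=True)]


def encode_project_dir(cwd: str) -> str:
    """Assumed encoding: each "/" in the working directory becomes "-"."""
    return cwd.replace("/", "-")
```

Typical use: recent_sessions(Path.home() / ".claude" / "projects") lists candidate transcripts to pass to analyze-token-usage.py.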
Writing New Integration Tests
Template
#!/usr/bin/env bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
source "$S