Claude Code Insights

3,166 messages across 149 sessions (188 total) | 2026-03-16 to 2026-05-02

At a Glance

What's working: You've built a sophisticated multi-stage workflow around MR quality — running parallel multi-model reviews, verifying critical findings before posting, and feeding learnings back into reusable skills. The phased approach on MR !111 (resolving 7/7 Critical and 11/12 Important items in discrete, verifiable commits) and your deep research and extensive hands-on use of third-party tools show real production discipline. You also treat skills as living infrastructure, hardening them against real MRs rather than one-off use. Impressive Things You Did →

What's hindering you: On Claude's side: it tends to act before observing — starting code edits during review-only sessions, making claims about IDE buffer state without checking disk, and producing prose-heavy first drafts when you clearly wanted structured tables or critical analysis. On your side: environment assumptions (English locales, @me identity, default fetches) keep biting until they're documented, and ambiguous depth/format expectations lead to rework cycles where you push back and Claude redoes the work. Where Things Go Wrong →

Quick wins to try: Codify your dual-review → verify → post pipeline as a single skill with explicit anti-hallucination guardrails (verify-files-exist gates, observe-before-claim steps). Add a pre-commit or pre-edit hook that blocks code modifications during review-mode sessions so Claude can't pre-act. And capture the JetBrains reformat_file + replace_text_in_file buffer-flush trick as a skill — that empirical discovery shouldn't have to be rediscovered. Features to Try →

Ambitious workflows: As models improve, expect your skill stack to compose into an autonomous spec-to-merge pipeline: hand it an OpenAPI diff or Figma link, and it spins up a worktree, implements, runs three-model review, applies Critical/Important fixes, verifies via headless browser, and posts the aggregated comment — only paging you for true judgment calls. Pair this with a self-healing skills system where every correction you make (placeholder wording, locale, format, premature edits) becomes a structured friction event that proposes AGENTS.md or skill amendments automatically, so the misunderstandings that caused this period's rework cycles don't recur. On the Horizon →

3,166

Messages

+33,751/-4,072

Lines

622

Files

Days

105.5

Msgs/Day

What You Work On

Merge Request Review & Quality Workflows ~18 sessions

Extensive use of Claude Code for multi-model code review workflows (dual-review, parallel review agents) on MRs across staging branches. Claude orchestrated Codex/Gemini/Claude reviews, aggregated findings into structured comments, posted to GitLab, verified critical findings, and tracked follow-up issues. Friction included occasional hallucinated blockers and tool invocation failures requiring pivots to manual agent reviews.

Frontend Feature Development & Bug Fixes ~12 sessions

TypeScript-heavy work on organization management tables, contracts pages, album review tabs, and Taiwan ID validation, often shipped end-to-end via MRs. Claude diagnosed root causes (API type changes, nullable arrays, CDN race conditions), implemented multi-file changes, verified via Chrome DevTools, and opened MRs with proper assignees/reviewers.

OpenAPI Spec Sync & Tooling ~7 sessions

Recurring workflow to sync openapi.yaml from staging via authenticated Chrome tabs, reformat through JetBrains MCP, and commit changes. Claude iteratively debugged IDE buffer flushing tricks, CORS issues, and regex bugs in changelog tooling, eventually codifying the process into a reusable daily-run skill.

Reverse Engineering & Architecture Analysis ~6 sessions

Deep research and extensive hands-on use of third-party apps, paired with comparative evaluation of AI agent repos (multica, hermes-agent, openclaw, open-webui). Claude systematically inspected binaries, network traffic, models, and prompts to produce comprehensive architecture documentation and positioning analyses to inform tooling decisions.

Learning, Explanations & Dev Environment ~7 sessions

Q&A sessions on RAG/chatbot architecture, Ghostty terminal quirks, Neovim adoption tailored to a WebStorm+vim workflow, and conceptual code walk-throughs. Claude produced structured explanations, week-one practice plans saved to notes, and targeted configuration guidance based on version-specific bug research.

What You Wanted

Code Review

Git Commit

Commit And Push

Learning Explanation

Sync Openapi Spec

Create Issues

Top Tools Used

Bash

4447

Read

1531

Edit

1362

TaskUpdate

315

Write

308

Agent

288

Languages

TypeScript

1458

Markdown

938

JSON

265

Shell

107

YAML

Session Types

Multi Task

Single Task

Exploration

Iterative Refinement

Quick Question

How You Use Claude Code

You operate Claude Code as a high-throughput engineering manager running parallel workstreams rather than a hands-on coder. Across 149 sessions and 191 commits, your dominant pattern is invoking structured workflows—`/dual-review`, `/sync-openapi-spec`, `/gitlab-push-mr`—and orchestrating multi-phase work (review → fix → simplify → push → journal) in a single session. You frequently spawn parallel review agents across 3 models, batch-review MRs 124-131 simultaneously, and chain tasks like 'resolve conflicts, archive specs, open MR, update memory.' The 4447 Bash calls vs 1362 Edits confirms you're directing automation, not micro-editing. You also invest heavily in meta-tooling: when a skill misbehaves (designer-dev assignee bug, gitlab-mr defaults, openapi reformat buffer issue), you don't just patch the symptom—you iterate across multiple real MRs until root-caused, then distill the learning into journal/guide/skill files.

You interrupt and redirect aggressively when Claude drifts. The friction log shows you cutting in when Claude edits code during a review-only task ('requiring revert'), when sub-agents hallucinate blockers, when output is too prose-heavy ('這... 不是好看的表格啊～？'), or when surface-level analysis endorses someone else's framing instead of critiquing it. You corrected 'three places' to 'five places' on an alias rename, aborted a hung Codex review, and pushed back when Claude wrote specs in English against your memory rules. This suggests you read outputs critically and verify against ground truth, not skim-and-approve.

Your specs tend to be goal-oriented and trust-the-process: 'ship MR 111 through full quality workflow,' 'research this third-party tool deeply and document it,' 'triage these 3 tasks.' You let Claude run long multi-stage workflows but expect verification at each phase (browser automation checks, tsc/biome green, posted GitLab comments). The 18 dissatisfied vs 227 satisfied/likely-satisfied messages and 37/52 fully-achieved sessions show the model works—but the recurring 'wrong_approach' (28) and 'misunderstood_request' (15) friction points come from your high standards on workflow granularity, language (Chinese vs English), and scope boundaries that aren't always stated upfront.

Key pattern: You run Claude as a parallel workflow orchestrator with high-trust delegation but sharp, critical course-correction whenever outputs drift from your standards or ground truth.

User Response Time Distribution

2-10s

131

10-30s

497

30s-1m

519

1-2m

497

2-5m

398

5-15m

211

>15m

150

Median: 64.4s • Average: 218.7s

Multi-Clauding (Parallel Sessions)

Overlap Events

Sessions Involved

Of Messages

You run multiple Claude Code sessions simultaneously. Multi-clauding is detected when sessions overlap in time, suggesting parallel workflows.

User Messages by Time of Day

Morning (6-12)

1081

Afternoon (12-18)

1301

Evening (18-24)

370

Night (0-6)

414

Tool Errors Encountered

Other

162

Command Failed

157

User Rejected

File Changed

File Not Found

File Too Large

Impressive Things You Did

Across 149 sessions of intensive frontend development, code review, and tooling work, you've built a remarkably disciplined workflow around shipping MRs with confidence.

Multi-Model Parallel Code Review

You've operationalized a /dual-review workflow that runs three models in parallel against your MRs, then aggregates findings into a single posted comment with severity rankings. You consistently verify Critical findings before posting, audit adoption across multiple MRs, and distill recurring patterns back into journals, guides, and skills — turning each review into compounding institutional knowledge.

Worktree-Isolated MR Lifecycle

You routinely set up isolated worktrees for MR review, run staged phases (conflict resolution, simplify pass, multi-model review, openspec backfill, reviewer feedback), and ship with verification at each step. Your phased approach on MR !111 — resolving 7/7 Critical and 11/12 Important items across discrete commits — shows a mature production discipline most teams never achieve.

Skills as Reusable Infrastructure

Rather than treating each task as one-off, you iteratively harden workflows into skills — sync-openapi-spec, gitlab-push-mr, designer-dev — testing them against real MRs and fixing root causes (like the bot-account assignee bug or the IDE buffer flush trick). You then squash commits cleanly and document the empirical findings, building a personal toolkit that gets sharper with every session.

What Helped Most (Claude's Capabilities)

Multi-file Changes

Good Debugging

Good Explanations

Correct Code Edits

Fast/Accurate Search

Proactive Help

Outcomes

Partially Achieved

Mostly Achieved

Fully Achieved

Where Things Go Wrong

Your sessions show high success rates but recurring friction around Claude taking premature action, misjudging scope, and stumbling on environmental quirks before observing actual state.

Premature action before verification

You frequently have to interrupt Claude when it starts editing code, creating tasks, or making claims before reading the relevant state. Asking Claude to explicitly observe-then-act (or telling it 'review only' upfront) would prevent these reverts.

During the MR 120 worktree setup, Claude edited code when you only wanted a review, requiring a revert and explicit clarification.
On the broken staging site, Claude filtered the console and started creating tasks before reading full console output, and incorrectly predicted a 24h CloudFront cache delay that re-running CI immediately disproved.

Surface-level analysis requiring pushback

Claude often delivers prose-heavy or face-value first drafts when you wanted structured tables or critical analysis, forcing you to push back with corrections. Stating the desired format and depth of skepticism upfront would reduce these rework cycles.

Claude endorsed Kyle's spec-check framing at face value until you demanded deeper critical analysis, then interrupted a tool call mid-stream.
The initial mr-comment.md came back prose-heavy, prompting your '這... 不是好看的表格啊～？' pushback before Claude reformatted it as a table.

Environment and tooling assumptions

Many sessions hit avoidable friction from Claude assuming default tool behavior (English locales, @me identity, simple fetches) without checking the actual environment first. Documenting these gotchas in skills/AGENTS.md (which you've started doing) and asking Claude to probe the environment before scripting would help.

The stale-branch cleanup script used English '[gone]' markers but your git output was localized Chinese '[遺失]', requiring mid-script adjustment.
OpenAPI sync repeatedly failed first attempts due to CORS, missing JS context on raw YAML tabs, and IDE buffer-vs-disk confusion before converging on the working approach.

Primary Friction Types

Wrong Approach

Misunderstood Request

User Rejected Action

Buggy Code

Excessive Changes

Missed Observation

Inferred Satisfaction (model-estimated)

Frustrated

Dissatisfied

Likely Satisfied

171

Satisfied

Happy

Existing CC Features to Try

Suggested CLAUDE.md Additions

Just copy this into Claude Code to add it to your CLAUDE.md.

## Review-Only Mode
When user asks for a 'review', 'comparison', 'analysis', or 'verify' — DO NOT edit code. Only produce notes, tables, or comments. Wait for explicit instruction before applying any fixes.

Multiple sessions show Claude prematurely editing code during review tasks (MR 120, MR 109 verification), forcing the user to revert changes.

## Language
Respond in Traditional Chinese (繁體中文) by default unless the user writes in English. This includes spec docs, MR descriptions, and review comments.

Multiple sessions show Claude defaulting to English when user expected Chinese (Codex review, spec writing), requiring corrections.

## Verification Before Claims
- Always observe disk state (git status, ls, cat) after IDE/MCP file operations before claiming success — IDE buffers ≠ disk.
- When counting affected files/places, run a grep across the codebase first; never estimate.
- After running glab/gh commands, do not truncate lists — paginate or filter explicitly.

Recurring friction: 'three places' vs actual five, reformat_file IDE-buffer confusion, truncated commit lists from glab — user repeatedly had to correct unverified claims.

## Git Localization
Git output on this machine is localized in Traditional Chinese. Use '[遺失]' instead of '[gone]' when parsing branch tracking status, and account for translated status keywords in scripts.

Stale-branch cleanup script broke because it assumed English '[gone]' marker; this is a stable environment fact worth recording.

Just copy this into Claude Code and it'll set it up for you.

Custom Skills

Reusable slash-command workflows defined as markdown files.

Why for you: You already use /sync-openapi-spec, /dual-review, and /gitlab-push-mr heavily (165 Skill invocations). Codifying your recurring 'review MR → verify findings → post aggregated comment → update journal' pipeline as /review-mr would eliminate the repeated multi-step orchestration visible in 14+ code-review sessions.

mkdir -p .claude/skills/review-mr && cat > .claude/skills/review-mr/SKILL.md <<'EOF'
# Review MR
1. Set up worktree for the MR branch
2. Run three parallel review agents (Claude, Codex, Gemini) against staging
3. Verify each Critical/Important finding against actual code
4. Aggregate into severity-ranked table (繁體中文)
5. DO NOT edit code — output review only
6. Post merged comment to GitLab via glab
EOF

Hooks

Auto-run shell commands at lifecycle events (pre-commit, post-edit, etc).

Why for you: You hit pre-commit hook failures requiring --no-verify workarounds, and Claude sometimes commits without running tsc/biome. A PostToolUse hook on Edit/Write that runs `tsc --noEmit` and `biome check` would catch the CSS regression and type errors you saw in MR #111 before they reach commit.

// .claude/settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{"type": "command", "command": "npx tsc --noEmit 2>&1 | head -20"}]
    }]
  }
}

Task Agents

Spawn focused sub-agents for parallel exploration.

Why for you: Your dual/triple-review pattern is essentially this — but you also had a sub-agent hallucinate non-existent files in MR 124. Explicitly framing review agents with 'verify file exists via Read before citing' in the agent prompt would prevent those hallucinated blockers, and you could parallelize the multi-file tsc/biome verification you do manually.

Ask: "Use 3 parallel agents to review src/contracts, src/organizations, and src/shared against staging. Each agent must Read every file before citing it as a finding. Output severity-ranked table."

New Ways to Use Claude Code

Just copy this into Claude Code and it'll walk you through it.

Codify the dual-review → verify → post pipeline

Your most common high-value workflow is multi-model MR review with verification and aggregated posting. Make it a single skill with explicit anti-hallucination guardrails.

Across 14 code_review sessions you repeat: spawn reviewers → verify Critical findings against real code → aggregate into Chinese severity table → post via glab → update journal. Variability in invocation (Codex vs agents, English vs Chinese, premature edits) caused friction in 5+ sessions. A locked-down /review-mr skill that enforces 'review-only, verify-before-cite, 繁體中文 output' would remove that drift.

Paste into Claude Code:

Create a /review-mr skill at .claude/skills/review-mr/SKILL.md that: (1) sets up an isolated worktree, (2) runs 3 parallel review agents with instructions to Read every file before citing it, (3) aggregates findings into a 繁體中文 severity-ranked markdown table, (4) NEVER edits code, (5) posts the table as an MR comment via glab and updates notes/journal.md. Reference the existing /dual-review skill for orchestration patterns.

Stop pre-acting on review/verify requests

When you ask for review or verification, Claude sometimes starts fixing things — costing reverts. Add an explicit gate.

MR 120, MR 109 verification, and the spec-check session all show Claude jumping to edits during review-mode requests. Adding a CLAUDE.md rule plus prefacing review prompts with 'OUTPUT ONLY — no edits' would save the revert cycle. You can also use the same pattern when invoking dual-review.

Paste into Claude Code:

Review MR <N> against staging. OUTPUT-ONLY MODE: produce a markdown findings table in 繁體中文. Do not Edit, Write, or run any commands that modify files. After I read your findings, I will tell you which to fix.

Document the IDE-buffer flush trick as a skill

You spent a long session empirically discovering reformat_file + paired replace_text_in_file flushes the JetBrains IDE buffer to disk. Capture it.

The openapi.yaml sync workflow has bitten you repeatedly with IDE buffer vs disk confusion (Claude claiming success when nothing was written, missing manual save warnings). Encode the discovered procedure as a skill so future syncs and any other JetBrains MCP file edits never regress.

Paste into Claude Code:

Update .claude/skills/sync-openapi-spec/SKILL.md to include: 'After reformat_file via JetBrains MCP, immediately call replace_text_in_file with a no-op change to flush the IDE buffer to disk. Then verify with `cat <file> | head -20` from Bash before claiming success. Never trust the IDE buffer state.'

On the Horizon

Your workflow already orchestrates parallel review agents, multi-MR triage, and skill iteration—the next leap is fully autonomous pipelines that ship, verify, and learn without you watching.

Autonomous MR Pipeline From Spec to Merge

Given an OpenAPI diff or Figma link, a single command spins up a worktree, generates the implementation, runs dual-review with three models in parallel, applies Critical/Important fixes itself, verifies via headless browser, and posts the merged review comment—all before you read the notification. Your existing dual-review, sync-openapi-spec, and gitlab-mr skills become composable stages in one supervised pipeline that only pages you for true human-judgment calls.

Getting started: Wrap your current skills in a top-level orchestrator using Claude Agent SDK subagents with explicit phase gates (implement → review → fix → verify → comment), and use hooks to enforce 'observe disk state' and 'verify tsc baseline' before each transition.

Paste into Claude Code:

Build a /ship-feature slash command that takes either a Figma URL or an OpenAPI spec diff and runs end-to-end: (1) create isolated worktree, (2) implement changes with TypeScript + tests passing, (3) launch three parallel sub-agents running Claude/Codex/Gemini reviews against the staging branch, (4) auto-apply all Critical and Important fixes with verification commits, (5) run Chrome DevTools MCP to verify the change in browser, (6) open MR with correct assignee/reviewer defaults from the gitlab-mr skill, (7) post merged review comment as a markdown table (no horizontal rules—they break GitLab). Add explicit checkpoints where it must stop and ask me: ambiguous product decisions, destructive git operations, and any case where staging tsc baseline isn't green. Log every phase to a journal entry I can audit.

Self-Healing Skills That Learn From Friction

Every time you correct Claude (placeholder text wording, '5 places not 3', Chinese vs English, prose vs tables, premature code edits during review), that correction becomes a structured friction event that automatically proposes a skill or AGENTS.md amendment. Over weeks, your 165 Skill invocations compound into a system that pre-empts the misunderstandings that caused 28 'wrong_approach' and 15 'misunderstood_request' incidents this period.

Getting started: Add a SessionEnd hook that scans the transcript for user interruptions and corrections, then runs a meta-agent that proposes diffs to the relevant skill file or AGENTS.md, opening them as a draft MR for your weekly review.

Paste into Claude Code:

Create a 'friction-harvester' subagent that runs at the end of every session. It should: (1) parse the transcript for user interruptions, rejections, and corrections, (2) classify each into categories matching my friction taxonomy (wrong_approach, misunderstood_request, premature_action, language_mismatch, formatting_mistake), (3) trace each back to the skill or AGENTS.md rule that should have prevented it, (4) draft a concrete diff to that file with before/after examples, (5) collect all diffs into a weekly 'skill-evolution' MR I can review Sunday evening. Specifically watch for: review-only requests where Claude edited code, output-language mismatches, table-vs-prose formatting preferences, and assumptions made without observing disk state first.

Test-Driven Reverse Engineering Loop

Point Claude at a binary (like the third-party apps you researched) or a staging API, and it autonomously generates a hypothesis-test-refine loop: write characterization tests that capture current behavior, then iterate implementation against those tests in parallel branches until green. The 6 reverse_engineering sessions and the contracts page Unix-seconds debug become a 30-minute autonomous job instead of a multi-hour exploration.

Getting started: Combine a long-running planning agent with parallel implementation subagents that each pursue a different hypothesis branch, all racing against the same generated test suite using git worktrees and Chrome DevTools MCP for live validation.

Paste into Claude Code:

I want to reverse-engineer or debug a system using a test-driven autonomous loop. Given a target (binary, staging URL, or broken page), do this: (1) generate a characterization test suite that captures current observable behavior—API shapes, DOM states, network calls via Chrome DevTools MCP, (2) form 3-5 distinct hypotheses about what changed or how it works, (3) spawn parallel sub-agents in separate worktrees, each implementing one hypothesis, (4) run all hypotheses against the test suite concurrently, (5) report back which branches pass with a comparison table, (6) for the winning hypothesis, expand tests to lock in behavior and open an MR. Apply this first to: detecting API contract drift between our frontend and the OpenAPI spec, by generating tests from the spec and running them against staging to surface mismatches like the date-string-to-Unix-seconds break.

"User pushes back on a prose-heavy MR comment with a perfectly-judged passive-aggressive Mandarin: '這... 不是好看的表格啊～？'"

During the MR !111 quality workflow, Claude produced a prose-heavy mr-comment.md instead of the table format the user wanted. The user's gentle-but-cutting reply ('This... isn't a nice-looking table~?') prompted Claude to redo it as a proper table.