Here's a question: if you were told you were going to get the biggest, worst codebase that you had to work on, sight-unseen, but you could magically apply one refactoring to it, what would that refactoring be?
I think it's a good question. Slop is the reason that you can ask an agent to build out a full 3D game, but it explodes into a fine mist if you ask it immediately afterward to change the character's shirt color. LLMs need good factoring and code organization just like humans do.
To that end I've been running an agentic refactor skill for Claude Code that I call /defrag. I run it after a large new feature or after a couple of smaller features to keep the code maintainable and to keep the agents effective. It also happens to contain my 14 answers to the original question.
---
name: defrag
description: Analyze code for refactoring opportunities and suggest the top 10 highest-value improvements. Use when the user says /defrag, "refactor", "clean up code", or asks for code improvements.
---
# Defrag - Code Refactoring Analysis
Analyze the specified file(s) or the user's current selection and identify the top 10 highest-value refactoring opportunities.
## Refactoring Types to Look For
1. **shorten_file** - Files >300 lines should be broken into multiple coherent files
2. **shorten_function** - Functions >50 lines should be broken into smaller functions with descriptive names
3. **reduce_nesting** - Replace nested conditionals with guard clauses and early returns
4. **extract_function** - Extract repeated or complex code into well-named functions
5. **rename_for_clarity** - Improve variable/function names that are unclear or too short
6. **simplify_conditionals** - Simplify if/else logic, use early return where possible
7. **extract_constants** - Move magic numbers/strings to local constants (NOT global constants files)
8. **consolidate_duplicates** - Merge duplicate code blocks
9. **modernize_syntax** - Use modern language features (destructuring, optional chaining, etc.)
10. **avoid_globality** - Move global items closer to where they're used; no utils/constants/managers files
11. **optimize_imports** - Clean up unused or poorly organized imports
12. **remove_dead_code** - Remove unused code. In TypeScript, knip can find it.
13. **add_tests** - Identify code lacking test coverage
14. **break_up_hotspots** - Look at the last 30 days of git history for the 5 files with the most changes. Suggest decomposition for any file that's changed more than 30 times.
## Scoring (Value Score 0-100)
Calculate each refactoring's value using these weights:
- **Readability improvement** (35%)
- **Maintainability improvement** (30%)
- **Bug risk reduction** (25%)
- **Performance impact** (5%)
- **Scope size** (5%)
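To make the weighting concrete, here's a minimal sketch of the value score as a plain weighted sum. The field names and the example numbers are my own for illustration; the skill itself just gives the LLM the weights:

```typescript
// Sketch of the 0-100 value score as a weighted sum.
// Each input is itself a 0-100 judgment call; weights match the list above.
interface Scores {
  readability: number;     // 35%
  maintainability: number; // 30%
  bugRisk: number;         // 25%
  performance: number;     // 5%
  scope: number;           // 5%
}

function valueScore(s: Scores): number {
  return Math.round(
    0.35 * s.readability +
    0.30 * s.maintainability +
    0.25 * s.bugRisk +
    0.05 * s.performance +
    0.05 * s.scope
  );
}

// e.g. valueScore({ readability: 80, maintainability: 70,
//                   bugRisk: 60, performance: 20, scope: 50 }) → 68
```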
## Output Format
For each refactoring opportunity, provide:
### {rank}. **{type}** (Value: {score}/100)
- **File:** `{filepath}:{start_line}-{end_line}`
- **Description:** {one-line description}
- **Rationale:** {why this helps}
**Before:**
```{language}
{current code snippet}
```
**After:**
```{language}
{refactored code snippet}
```
---
## Instructions
1. Read the file(s) the user specifies (or use their IDE selection)
2. Analyze for all refactoring types listed above
3. Score each opportunity using the weighted formula
4. Sort by value score (highest first)
5. Present the top 10 refactorings
6. End with: "Found {N} refactoring opportunities. Top 10 shown with average score: {avg}"
## Important Rules
- **Do NOT apply changes** - This is analysis only (plan mode)
- **Sort by value_score** - Highest value first
- **Show before/after code** - For each suggestion
- **Be specific** - Include exact file paths and line numbers
- **Follow YAGNI** - Don't suggest over-engineering or future-proofing
- **No global constants files** - Constants belong near their usage
After presenting, ask: "Would you like me to apply any of these? Say 'apply #1' or 'apply all'."
Here's the thinking behind some of these rules:
I got the numbers by asking the LLM for ideal maximums. It probably doesn't know any better than I do, so feel free to substitute your own choices, but if you're coding agentically for long you'll eventually hit whatever maximum you set. Why this helps: these maximums drive the two primary units of abstraction, files and functions. Just like a human, the LLM can look at a file name or function name and infer what it does instead of having to read all the code inside. That means it reads a lot less code, and it uses a lot less context.
I also asked the LLM "which is more expensive for LLMs: long files or long functions?" It told me that it can grep through long files but has to read a function in its entirety, so long functions are the more expensive of the two. I don't entirely believe the LLM's answers here; I'll need to run some evals on all of this at some point, but I think it's a good starting point.
Reducing the amount of code has to be one of the easiest ways to make a codebase require less context: all else equal, larger codebases are harder for an LLM to manage than smaller ones. To that end, I try to find and remove dead code and deduplicate what remains. This one didn't require more proof for me.
LLMs are better at handling complexity than we are, but they have all the same trouble following convoluted code that we do. For that reason I try to reduce nesting, return early when possible, and avoid making things more global than they need to be.
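As a concrete sketch of the nesting rule (all the names here are invented for illustration):

```typescript
interface User { active: boolean; email?: string }

// Before: three levels of nesting; the reader carries state all the way down.
function notifyNested(user: User | null): string {
  if (user) {
    if (user.active) {
      if (user.email) {
        return `sent to ${user.email}`;
      }
    }
  }
  return "skipped";
}

// After: guard clauses with early returns; each case is disposed of up front.
function notify(user: User | null): string {
  if (!user || !user.active || !user.email) return "skipped";
  return `sent to ${user.email}`;
}
```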
Tests are a no-brainer part of good harness engineering.
I call files that change really frequently "hotspots". The problem with hotspots is that when you're running many agents in parallel, they're an easy source of merge conflicts. Frequent churn is also just another good signal that a file needs to be broken up.
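The hotspot check itself is a one-liner over git history. A sketch, with the 30-day window and top-5 cutoff being my own defaults:

```shell
# List the 5 most-frequently-changed files in the last 30 days.
# The count in the first column is what rule 14 compares against 30.
hotspots() {
  git log --since="30 days ago" --name-only --pretty=format: \
    | grep -v '^$' \
    | sort | uniq -c | sort -rn \
    | head -5
}
```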
I used to have a rule asking it to suggest a third-party library for any repeated pattern it finds. I removed it after an LLM told me that third-party libraries can be a liability when the model isn't trained on them. It's often easier to solve patterns internally, where the source code is available to read.
It actually works pretty well. If you run it often, it only finds issues in code you've worked on recently and haven't yet refactored, so it's really finding and fixing things as it goes, provided you keep the discipline to refactor frequently. I have a codebase approaching 250K LOC (that I've never looked at) that is still workable for agents.
This approach sits a little outside regular harness engineering, where we'd usually try to ensure the LLM produces the code we want in the first place. This is after-the-fact instead, which means it can also be applied to existing terrible codebases. I think that's useful because even if every addition to the codebase is excellent quality, the codebase can still drift into something unmaintainable.
Ultimately, none of this is all that scientific, which irks me a bit. I asked the LLM about things that made sense and incorporated its guidance, but I think running some real evals (a barrage of tests that measure context usage and results across different rules, tasks, and codebases) will produce more reliable answers.
I do plan to do that eventually and post updates. In the meantime, I'd love to hear from you about how /defrag works for you or what other strategies you think I'm missing.