How you do QA is probably terrible.

Of course I don’t know you, but most software development shops that I’ve worked in or chatted with “do QA” in a way that is pretty terrible.

Almost everyone does it, and almost everyone considers it a “best practice”, and yet somehow it never even got a name. I’m going to call it “block-testing” because it’s the kind of testing that unfortunately makes continuous deployment and continuous delivery fundamentally impossible.

So what is it already?!?!

It’s the practice of having humans regression-test an entire product, or some high-value subset of features across the entire product, usually with the goal of finding reasons to stop it from being released. The most extreme (and worst) case is when a specific role exists solely for this purpose (almost always given the misnomer “QA”).

Just to be super-duper clear: I’m aware that almost everyone is doing block-testing. That doesn’t mean it’s a good fit for every organization though. If you actually want to achieve continuous deployment or continuous delivery, this is one of the big-batch operations that you’ll need to forego.

What’s so bad about it?

  • A single manual run-through of a broad test script is slow (compared to existing automation)
  • It’s expensive (compared to good automation, over the long term)
  • It frequently misses all but the most egregious regressions (more often than alternatives)
  • It’s antagonistic, pitting different roles at the company against each other
  • It confuses where the responsibility for quality lies
  • There are many other possible solutions to the same problem, many of which are faster and more comprehensive

Alright, a couple of those are obvious to most people, I think, so let’s dig into the less obvious ones:

It’s antagonistic

Hiring people whose sole purpose is to find reasons to stop a release is counter to the goals of a company that is trying to release software quickly. Of course block-testers are actually tasked with the important goal of making sure each release is high-quality (and that’s a critical goal!), but the end result is that as block-testers, they naturally tend to optimize their role to “trying to find release-blockers”. They’ll log other defects too of course, but those don’t have the same importance to them (or anyone else) as issues that block releases.

Also when you’ve got one group of people responsible for finding fault in another group’s work, you’re going to have conflict between the groups. Engineers are going to start saying that the bug isn’t theirs or the bug isn’t “that bad”. Testers are going to say the opposite. Each group is going to blame the other when management wonders why releases are delayed or infrequent. You’ll get the same finger-pointing when a defect gets through to production as well. The ensuing arguments are a massive waste of time and each “side” becomes less empathetic to the other’s plight over time.

It confuses where the responsibility for quality lies

I’ve heard the words “You’re the tester! Test it!” a few times in my career. The sentiment is that development and testing are distinct phases rather than something that happens iteratively during development, and that “the developers do the developing and the testers do the testing”. “This frees developers up to focus on just development”.

Nothing could be further from the truth, and I have serious doubts about the experience of any developer who thinks they shouldn’t be testing their own work along the way, as if they can hammer out 15 or more LoC and not bother to see whether it works. The fact of the matter is that quality is entirely the responsibility of the developer, and any successful developer is testing constantly as they go. Conversely, the tester’s only role is to assess quality; the tester has no responsibility whatsoever to improve it.

At the same time, the very presence of the tester is a confusing signal. It signals that the developers aren’t wholly responsible for quality (and maybe even that they’re not trusted with it!). It diminishes their skin in the game, and disincentivizes quality. Of course a tester-only role has no ability to actually fix defects, and you can’t sanely hold someone responsible for something they have no control over.

I would wager that this division of responsibility is the single greatest cost of block-testing. The testers and the developers start to engage in a kind of game of ping-pong where the developer tries to see what they can get past the tester, and the tester tries to find ways to block the release for any reason at all. I’ve seen a feature go back and forth from developer to tester more than a half-dozen times before the tester agrees that it’s okay to release. That back-and-forth often plays out over days through an issue tracker for trivially-fixed-but-important issues in high value features. If you’re looking closely for this kind of activity you can see it all over the place, and it can look about as effective as a football team that uses telegrams to communicate mid-play. These hand-offs are all classic waste from a Lean perspective.

There are many other possible solutions to ensuring quality, many of which are faster and more comprehensive.

Once you realize the harm, the cost, and the inadequacy of block-testing, you naturally have to ask, “but is there anything better?”. You can’t really fault an organization for block-testing if there are no alternatives.

Let’s chat about some alternatives:

Move Fast and Avoid False Dichotomies

It’s really unfortunate that the industry has resigned itself to the belief that you must either go slow or break things, and that quality and speed are an inevitable and universal trade-off. This is a really limiting belief.

In order to get creative, let’s forget the “best practices” for a minute and get back to first-principles: Here’s what we’re trying to do:

  • (A) We want to minimize users’ exposure to defects and outages.
  • (B) We want to minimize the time between a developer completing a change on their workstation and that change reaching users, to keep delivery radically fast.
  • (C) We want to catch more issues than a human can.

Preventative Measures

Preventative measures are anything that you might do to prevent a defect from affecting a user. They’re a pretty critical part of achieving goal A.

However, goal B means we have to be careful about which steps we put between the developer and the production environment. Ideally each step is both critically important and fast; each has to pay for itself. Because of this, you have to be really selective about which preventative measures you choose. Some really fast preventative measures: static typing, automated tests, broken-link-checking spiders, linters.

Mitigating Measures

You also have to start thinking about how to achieve goal A with more than just preventative measures, because preventative measures aren’t everything. Even the most ardent block-testers have non-preventative measures: feedback from end-users (even if it goes through 5 levels of support first). I’ll call these post-deployment measures “mitigating measures”. There are 3 main categories of mitigation:

  • Reducing the severity of a defect (# of users, impact per user, etc)
  • Finding defects faster
  • Fixing defects faster

Hopefully those categories get your imagination started. User feedback is a mitigating measure; it’s just the slowest and most expensive one. Here are a few that do better:

  • Fast automated deployment. If you can’t get a fix out quickly, more users will be affected for longer than necessary.
  • Telemetry: logging, metrics, and alerts. Instrument your codebase in as many ways as possible to make it self-report issues it might be having. Alerts will tell you when things are going wrong faster than user-feedback and often you’ll learn about things that humans (manual testers or even end-users) did not, or even could not find.
  • One-click rollback. Make it so that your production environment can be rolled back to the previous known-good deployment as quickly and easily as possible.
  • Staged roll-out. Coupled with great telemetry, a gradual deployment process allows you to halt deployment and roll-back (even automatically) if there is a spike in crashes or logged errors before something is fully deployed.
  • Feature flags / Feature toggles / kill switches. Control (post-deployment) what users get what features and when. If there are any problems, only the allowed users are impacted, and the broken functionality is quickly reverted.
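A feature flag doesn’t have to be a heavyweight system. Here’s a sketch of a percentage-based rollout check (the flag names and percentages are invented):

```typescript
import { createHash } from "node:crypto";

// Each user hashes to a stable bucket in [0, 100); a flag is "on" for a
// user when their bucket falls below the flag's rollout percentage.
const rollouts: Record<string, number> = {
  "new-checkout": 10, // 10% of users see the new checkout
  "dark-mode": 100,   // fully rolled out
};

function bucketFor(userId: string, flag: string): number {
  // Hash flag+user together so each flag ramps over a different user subset.
  const digest = createHash("sha256").update(`${flag}:${userId}`).digest();
  return digest.readUInt32BE(0) % 100;
}

export function isEnabled(flag: string, userId: string): boolean {
  const percent = rollouts[flag] ?? 0; // unknown flags default to off
  return bucketFor(userId, flag) < percent;
}
```

Coupled with telemetry, ramping a flag from 1% to 100% gives you a staged roll-out, and setting it back to 0 is your kill switch.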

Of course, you will probably want to do these things even if you never eliminate block-testing from your release process. The prevailing practice, though, seems to be to forego them and expect that the human block-tester will catch everything. That’s bury-your-head-in-the-sand quality management.

If you try these things out, though, you’re almost certainly going to see that these practices are cheaper, faster, and more effective than block-testers.

“But Automation is Expensive!”

Automation IS expensive! I have no argument against that. I’ve seen numerous well-covered codebases with more test code than application code, so the upfront cost of automated tests is probably close to that of writing the application code itself.

Over the long haul it pays for itself though. A well-covered codebase easily has thousands of tests each getting run thousands of times a year, so we’re talking about millions of behaviour verifications on the conservative end of things. Humans simply cannot compete with that. Humans are slow, and even the best of them are terrible at repetitive detail-oriented tasks. As expensive as automation is, block-testers are even more expensive.

“But developers are terrible at testing!”

They’re really not. Many haven’t had much practice because they’re working in block-testing environments where the responsibility is muddled, but they’re actually quite good at it with a little practice when they have the full responsibility and trust for it. Don’t be surprised when they don’t choose manual-testing for the solution to every quality problem though.

“Do we fire our QA?”

Block-testing is so common in the industry that many people have a really hard time understanding the place of specific testing personnel in the SDLC without it. There’s absolutely a place for specific testing personnel, but they’ve got to start contributing to improving quality beyond block-testing, and that change is understandably difficult.

Compared to computers, humans really are slow and terrible at repetitive detail-oriented tasks. However humans have well-known strengths that computers do not: they’re creative and curious.

So there are still places for manual testing:

  • Post-deployment exploratory testing. This is a great match for curious humans, and doesn’t slow delivery.
  • Feature-by-feature for new features only, with the developers present and actively involved, before code-complete of that feature. It’s probably best if the tester doesn’t actually do the tests themselves at all, but instead talks the developer through what tests to perform. This hands-on approach improves the developer’s testing ability more permanently and doesn’t confuse the fact that the developers own quality.

With that said, for the better part of a decade I’ve been working with multiple teams that have no manual-tester role at all. When they do test manually, the developers are testing things themselves, often as a group (mob testing!). You’ll probably want to at least consider eliminating the role of manual testing entirely.

There’s still a place for quality-minded people beyond manual testing though. In fact, QA and block-testing are really opposites. Actual quality assurance is about making sure that quality is baked into the process from beginning to end. Personnel that just do block-testing are not doing that at all. At best they’re quality control (QC), and that’s a far less valuable role. QA would be involved in a bunch of completely different concerns: How can we prevent defects from ever existing? How can we find them faster? How can we mitigate their impact? How can we recover from them faster?

Here’s a laundry list of possible Quality Assurance tasks:

  • run learning post-mortems
  • measure quality in many different ways (defect rate, MTTR, MTBF, etc.)
  • regularly visit production logs and metrics to look for live quality issues
  • coach devs on how to more aggressively test their work
  • get into test automation
  • liaise with users and customer support about quality issues
  • help establish quality criteria for a task before it gets started, and throughout its development
  • look for patterns in defects
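As a sketch of the “measure quality” item, here’s how MTTR might be computed from incident records (the record shape and units are invented for illustration):

```typescript
// Mean time to recovery (MTTR) over a list of incidents.
// Timestamps are epoch milliseconds.
interface Incident {
  detectedAt: number;
  resolvedAt: number;
}

export function mttrMinutes(incidents: Incident[]): number {
  if (incidents.length === 0) return 0;
  const totalMs = incidents.reduce(
    (sum, i) => sum + (i.resolvedAt - i.detectedAt),
    0,
  );
  return totalMs / incidents.length / 60_000; // average, in minutes
}
```

Tracking a number like this over time tells you whether your mitigating measures are actually improving, which a pass/fail release gate never will.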

“Does it really never make sense?”

The economics of block-testing make more sense if you anticipate very few releases, with increasingly small differences between them. Agencies write this kind of software, but I haven’t personally done that kind of work in over a decade, so I could be convinced of the economics either way by someone with more recent and extensive experience. Here I’m specifically talking about teams working on a software product that exists over a long period of time.

“But my situation is different because…”

Okay! I believe you! There are rarely one-size-fits-all practices in software development. I’m simply submitting this counter-argument for consideration. I’ve certainly worked at and heard from many organizations that should strongly consider stopping block-testing because the value proposition for them is just not there.

I’ve seen this pattern a number of times in my years as a software developer, and it’s always hard to watch:

(1) “We don’t need automated tests on this stuff. It’s way too simple.”

( The codebase grows more and more out of control. We’re getting scared to make changes, and manual testing is taking forever. )

(2) “It turns out that we need tests, but this stuff isn’t very testable, so the most focused tests we can write are integration or e2e tests.”

( The test suite starts off well, but quickly becomes very slow and very flakey. When tests fail, it takes a lot of debugging to figure out why. They also require all kinds of complicated setup, making them too hard to write. )

(3) “Automated testing is really expensive compared to the benefit! We’re probably not going to bother to write tests for simple things.”

It’s obvious why this pattern keeps occurring: Every next step makes sense given the current state.

It’s hard to break out of the cycle too, because it’s self-reinforcing. Working around the damage seems to create more damage. The things that seem like obvious solutions from an “in the spiral” point-of-view only make the problem worse.

Undoing the damage takes a lot of expertise and investment, and the longer it goes on, the more expensive it gets. You waste time:

  • avoiding necessary changes/improvements that are too scary
  • writing hard-to-write tests
  • debugging huge sweeping e2e tests
  • maintaining overly complex code (“You shouldn’t have to change the code just to test it!”)
  • doing more manual testing
  • rerunning flakey tests
  • investigating more obscure testing tools and strategies
  • waiting for test runs to finish
  • waiting for releases to be approved by humans
  • dealing with defects in areas “that aren’t worth the effort to test” or are “too simple to test”

Over the long run, cutting quality measures almost always costs you more than you gain.

I think the most important thing I’ve learned while trying to improve my abilities as an organizational change agent (read: a programmer that likes to get his way) is to never, ever present an idea as a best practice. If you can’t explain an idea with its underlying reasoning alone, you don’t understand the problem or the practice well enough to suggest it.

Escaping Solution-first Thinking

One of the most powerful techniques you can use for convincing people is to stop coming to them with improvement ideas when the problems they solve aren’t already obvious to them. I’ve been on both sides of these kinds of attempts to convince people and they rarely work. When I’m on the receiving end and I hear “Hey Gregg, I think we should do X”, I immediately need to know:

  • What’s the problem you’re trying to solve?

  • Why are you prioritizing that problem ahead of our other problems?

  • What other solutions did you consider for solving that problem?

  • What are the pros and cons of each possible solution?

  • What made you consider this solution the best of them all for your situation?

The first question is the foundation of it all though: What’s the problem you’re trying to solve? When I’m thinking of suggesting a practice and I can’t immediately and clearly answer that, I know I’m stuck in solution-first thinking. No one will be convinced by someone with that approach except the very few that happen to share your biases. I generally drop any idea like that entirely at that point instead of inventing an after-the-fact rationale. If my idea wasn’t conceived “problem-first”, it’s not coming from the right place.

(I once had a junior engineer bravely ask “Why do we use version control?”. I had been using version control for almost 2 decades and still really needed to force myself through those 5 questions. It was illuminating for both of us.)

“Best Practices” imply a Fixed Mindset

Is your best practice really “best”? Will there never be a better one? Did you give your team a chance to find their own solution that is better for their particular situation?

In general, if you think you’ve found a best practice, you’re probably not going to look for a better one. You’ll be standing in the way of your team improving, and in fact of the craft of software development improving as a whole. You’ll be the last to adopt a better practice if someone presents one.

What you call a “best practice” today came from an open-minded team that decided they wanted to try for something better than the current state of practice. It didn’t come from a team that was happy with the status quo.

There’s Value in Emulation

With all of that said, for things that are inconsequential, just using a “best practice” is often the smartest thing you can do. One real-life example for me was when some engineers on my team decided to use HTTP’s DELETE method to update a record in a new way, after already using PUT for the usual scenario. They tried it locally and it worked. Why they didn’t simply use a POST to a new URL was beyond me, and the reinvention of that proverbial wheel came back to bite us when our reverse proxy didn’t allow bodies on DELETE requests. The lesson here is that if there’s a standard way to do something and you’re working in an area that doesn’t really matter to your team/company, I’d always recommend just using the standard way. Sometimes you do just need a proverbial wheel, and there’s no reason to reinvent it.

Emulating others is a necessary step in learning too. Definitely keep looking at practices used elsewhere and considering reapplying what has worked; it would be wasteful to reinvent every wheel by working only from first principles.

What I’m saying though is that for practices that really matter to your team, the next level is to additionally take the time to deeply understand:

  • What problem is being solved by that practice
  • At least a few other possible solutions
  • The downsides of that practice (there are almost always downsides)
  • If the practice is still the best one among all the others for your situation, and why

Being rigorous in your understanding of a practice and its rationale ensures that you don’t apply the practice unnecessarily, or to the wrong problem, or incorrectly. Without this, you’re just emulating a mature team and not actually maturing; you’re cargo-culting.

Proxy Goals as Best Practices

The anti-pattern of “best practices” promotion includes arguing for hand-wavey proxy goals too. I’m talking about reasoning like “Let’s do X because X is…”

  • RESTful
  • functional
  • object-oriented
  • what Linus Torvalds does
  • what I did before on my previous team
  • DRY
  • Agile
  • Lean
  • trending on hacker news
  • what Jira expects us to do
  • what GitHub expects us to do
  • what the Scrum Guide expects us to do
  • what Facebook/Spotify/[insert famous company here] does
  • what a great blog post I just read says to do (I’m pretending there’s no irony here)

You may think some or all of these are great proxy goals to pursue (and maybe some are), but I personally think a few of them are terrible proxy goals for almost any situation, and I can guarantee that other people do too (some of whom are probably on your team). Ultimately there are two problems with arguing for proxy goals:

(1) Proxy goals can become misaligned with your actual goals.

(2) When you’re trying to convince someone, you very rarely know whether they’re aligned with your proxy goals. They’re much more likely to be aligned with the team’s or the company’s actual goals.

I’ve seen examples of all of these types of arguments for proxy goals (and made a couple of them myself) and though they can feel like shortcuts to a convincing argument, they really just make your job as a convincer much harder. First you have to convince somebody that Agile, for example, is a good goal, and then you have to convince them that your solution is indeed the most Agile. Usually they’re just trying to get work done, so these unnecessary leaps of logic are extremely taxing on them. You’re not going to win the hearts and minds this way. I’ve been through a few Agile transformations now, and the only successful one was the one where I never used the word “Agile” (It turns out that in general, jargon alienates people! Who would have thought?). Suggest real solutions to real problems in plain English and people will get behind you.

Results are what matter

Certainly your customer and/or the company’s stakeholders generally won’t care about what you think is a best practice if it isn’t visibly changing actual results. I’ve personally seen attempts at many of the indirect goals I listed serve one team’s goals and be detrimental to another’s. When they’re detrimental, you’ve blown all your social capital for future influence, so it’s important to get them right. And any practice is hard to get right if you don’t understand why it exists.

Throwing out your “best practice” mindset and stopping solution-first thinking forces you to be working on solving actual problems. Solving actual problems gives you a lot of clout within your organization, and allows you to more easily push for more ideas in the future.

I’m part of a team that’s just finishing up a Javascript-to-Typescript conversion of a web service of approximately 60 K LOC (lines of code) of Javascript (resulting in about 86 K LOC of Typescript), so I wanted to throw some anecdata at the conversation around the costs and benefits of static-typing.

A few caveats worth mentioning right away:

  • We continued adding features while converting. +
  • We spent a bunch of time additionally working on making the service’s input types definable in Typescript with runtime validation (we convert typescript to json-schema during the compile step). This expanded the effort significantly, but it also allowed us to trust that type safety would hold at run-time.
  • We additionally converted from Promises to async/await as we went.
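To picture what “type safety holds at run-time” means: an edge validator enforces on untrusted input the same shape the compiler checks statically. In our setup that check is generated from the types via json-schema; the hand-rolled guard and request shape below are invented stand-ins for illustration:

```typescript
// The compiler checks this shape at build time...
interface CreateUserRequest {
  name: string;
  email: string;
}

// ...and this guard enforces the same shape on raw input at run time,
// standing in for the generated json-schema validation.
export function parseCreateUserRequest(input: unknown): CreateUserRequest {
  if (typeof input !== "object" || input === null) {
    throw new Error("request body must be an object");
  }
  const body = input as Record<string, unknown>;
  if (typeof body.name !== "string" || typeof body.email !== "string") {
    throw new Error("name and email must be strings");
  }
  return { name: body.name, email: body.email };
}
```

Past this boundary, the rest of the codebase can trust the `CreateUserRequest` type without re-checking anything.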

Alright, let’s do some math

In calendar time the conversion took about a year, but I estimate the actual developer effort was about 6 engineer-months. ++

85,000 LOC over 6 developer-months is around 14K LOC per month, or 475 LOC per day. From a naive line-by-line viewpoint that rate could seem slow, but there were many things that are easy in Javascript that are just infeasible in Typescript and had to be reworked entirely. +++ We also spent a chunk of time improving performance of compilation.
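Spelled out (the per-day figure depends on how many days you count in a month; 30-day months land a touch under the 475 above):

```typescript
// Back-of-the-envelope conversion rate:
const loc = 85_000;
const engineerMonths = 6;

export const locPerMonth = loc / engineerMonths; // ~14,167 LOC per month
export const locPerDay = locPerMonth / 30;       // ~472 LOC per calendar day
```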

While I’m giving LOC stats, it’s going to be worth noting that this codebase additionally has ~100K LOC worth of tests (that we left in Javascript), giving us a little over 80% coverage.

The Importance of our Test Suite

During the conversion, we found surprisingly few defects in the existing code. It was common for a single developer to be working on conversion full time and not find any defects at all in a week.

I probably converted over 10% of the code myself and never personally ran up against any defect that was user-facing. We ultimately did find a handful of user-facing defects in the pre-conversion code though (maybe 5?) and most had to do with the input validation not being strict enough. Typescript was useful for helping us find those cases in a way that tests probably wouldn’t be (It’s rare for our automated tests to be testing bad input extensively).

The conversion process did in fact introduce test breakages quite regularly though (breakages that would otherwise have been bugs). We used the “noImplicitAny” compiler flag to help us enforce using types everywhere. At the file level, this is an all-or-nothing conversion strategy, so on a large or complicated file, conversion could take hours before you could run the tests again, and I continually found that even with a successful build, the tests would fail in dozens of unexpected ways, all indicating new defects.

Let me repeat that for emphasis: the act of changing existing working code to add types introduced defects that only the tests caught. This probably happened in more than half the files that I converted.
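For anyone who hasn’t used the flag: “noImplicitAny” rejects any parameter whose type the compiler can’t infer, which is what forces the explicit annotations. A small invented example of the before and after:

```typescript
// Under "noImplicitAny", this (previously valid Javascript) won't compile:
//
//   function total(items) { ... }
//   // error TS7006: Parameter 'items' implicitly has an 'any' type.
//
// The converted version has to spell the types out:
interface LineItem {
  price: number;
  quantity: number;
}

export function total(items: LineItem[]): number {
  return items.reduce((sum, item) => sum + item.price * item.quantity, 0);
}
```

It’s that all-or-nothing annotation burden, multiplied across every function in a file, that made large files take hours before the tests could run again.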

Conversely, what we’ve accomplished with tests (and no type-checking) on this codebase over the last 5 years is pretty mind-boggling to me. We converted:

  • From callbacks to promises
  • From express 2.0 to express 3.0 to koa.
  • From mongodb to Amazon Aurora for core data.
  • From coffeescript to javascript

And throughout each of these conversions, we deployed multiple times per day. None of the conversions were “big bang”.

What I like (because I like Typescript):

Well, I’m using VS Code now instead of just vim, as one does when switching to Typescript. It’s more sluggish of course, but it comes with better in-editor support for Typescript, and that means:

  • I can see a bunch of potential defects in-editor without running the tests at all. That’s fast feedback.
  • Autocomplete is a lot smarter than what I’ve seen in any vim plugins.
  • Documentation about different function interfaces or object shapes is available right in the editor.

That last one is really key for me, and it ties into a larger improvement than just the in-editor ones. Typescript encourages us toward standardizing object “shapes”. With Typescript interfaces, we can say things like “Alright, a ‘user’ will always look like this. It will always have these fields.” If some other representation has fewer fields, or more fields, and we really need that, we’ll have to face the pressure of adding an additional type. That’s pretty powerful and I really appreciate it. Before, just reading application code, I’d have to consult tests to see exactly what shapes and interfaces were supported or returned, and there was no pressure to simplify and consolidate on just one shape. That hurts learnability and maintainability.
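As an invented illustration of what that consolidation looks like in code:

```typescript
// One canonical shape for "user" across the codebase:
export interface User {
  id: string;
  name: string;
  createdAt: Date;
}

// A variant that genuinely needs more fields has to declare itself as a
// distinct, named type, which makes the divergence visible and reviewable:
export interface AdminUser extends User {
  permissions: string[];
}

// Any function taking a User documents its expectations in its signature:
export function displayName(user: User): string {
  return user.name.trim() || user.id;
}
```

The pressure works because adding `AdminUser` is a deliberate, greppable act, instead of one more anonymous shape quietly flowing through the system.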

There are of course a bunch of places where defining types is a productivity drag. It’s certainly more code to write and more code to read, and for certain types of work, it can amount to just noise. When writing new code though, I do really appreciate how easily I can call on file-external classes/methods/functions/properties without having to jump around from file-to-file and figure out the proper way to do that.

Static Types and Quality

It’s pretty clear to me that we’re not catching many more defects with Typescript, relative to the effort we’ve put into it.

We’re catching defects faster though with faster feedback, right? Well I don’t know about that either. We definitely are for certain classes of bugs. If a function that adds 1 and 1 together returns the string “2”, that will get caught much faster. There are definitely lots of typos and misspellings that are caught more quickly too. I’m enjoying that immensely and it’s a real productivity boost.

It’s not free though. Compile time is around 40 seconds for me (and worse for others). So if I want to actually test runtime behaviour I’d say I’m now at a disadvantage. Firing the server up and manually testing locally now takes almost a minute when it used to take less than 10 seconds. Even with incremental compilation and a warmed cache, there’s a ~10 second compile time, which makes doing TDD (my usual way of writing code) a lot slower because of the slower feedback cycle. So, many types of defects are faster to detect, but many types are also slower.++++

But I don’t have to write as many tests, right? I have no idea why anyone would think static-typing replaces certain kinds of tests. I was never testing types. Does anyone ever do that? I still write exactly the same number and types of tests. Testing hasn’t changed a bit.

If it sounds like you’re working in a scenario that is close to ours (a multi-year SaaS product), my recommendation is that if you have to choose between static-typing and tests, choose tests. To forego tests in this type of codebase is not only lazy but lazy in a way that actually leads to more work. It’s hard to watch developers avoid the tiny upfront investment of tests when I can see what it’s given us over the lifetime of this codebase.

You don’t have to choose either, so if you’re sure you want a statically-typed codebase, feel free to do that too. Typescript is not going to be a silver bullet though. The if-it-compiles-it-works approach is a recipe for long-term pain.

Additionally, if you don’t have tests and you want to do a conversion from Javascript to Typescript on a large codebase, I’m fairly certain that you will go much slower than we did, and you will have a tonne of bugs. Of course our conversion is just one data-point, so your mileage may vary.

+ …so it’s not strictly true that 60 K LOC of Javascript converts to 85 K LOC of Typescript. I’m sure there are better sources of information for how many LOC conversion adds.
++ I’m pretty confident in that estimate because around 80% of the conversion was done by one engineer fully tasked with the effort.
+++ e.g. most middleware stacks in node.js web frameworks rely heavily on monkey-patching, which does not easily translate to Typescript. Also, we were doing a bunch of sequential mutations of objects in Javascript that are really tedious to translate to Typescript because of the necessary proliferation of types.
++++ I can’t really totally blame Typescript for this. Our Typescript-to-JsonSchema converter is included in this compile stage, and it takes a significant chunk of the time. We really want to ensure that type-safety holds at run-time as well though. Additionally, I do think we should be relying on incremental compilation (with a watcher) more. It’s possible with more effort we can fix most of the delay.

I’ve always worked most heavily with dynamically-typed languages. They’re a great choice for me and the teams that I’ve worked on because of the kinds of work that we’re doing. It’s pretty normal though for these teams to add team-members that are completely unaccustomed to working with dynamically-typed languages because they come from backgrounds where they worked mainly in statically-typed languages.

There are a few things about working with dynamically-typed languages that generally seem to blow the minds of newcomers accustomed to statically-typed languages, and I’ll talk about a bunch of those things here. There’s a huge amount of hype these days around type-systems too, so I’d like to dispel any notion that they’re necessary in all cases, or that they’re sufficient in any case.

Get a Linter

Just because you’re disconnected from static types doesn’t mean there aren’t static analysis tools. Every dynamic language I know has a linter that can be integrated with most common editors or IDEs, and the linters are quite good at telling you in real-time about a bunch of common typos and mistakes you might be making. Make them part of your CI build too, in place of the compile step you’re used to, to keep those mistakes out of your codebase.
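For example, in a Javascript codebase you might wire ESLint into your editor and CI with a config along these lines (a minimal sketch; the exact rules are up to your team):

```javascript
// .eslintrc.js — a minimal, hypothetical ESLint setup.
// `eslint:recommended` catches common typos like references to undefined variables.
module.exports = {
  root: true,
  env: { node: true, es2022: true },
  extends: ['eslint:recommended'],
  rules: {
    'no-unused-vars': 'error', // flags dead code and misspelled references
    eqeqeq: 'error',           // require === to avoid coercion surprises
  },
};
```

Running `eslint .` as a CI step then fails the build much like the compile step you’re used to.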

Write a Tonne of Tests

I know a bunch of people in statically-typed languages already write a tonne of tests, but if you’re one of those who thinks that the static type-checker negates the need for tests, you’re reeeeeeeeally going to have to get over that and start writing some tests if you’re going to be successful with dynamically-typed languages (and it’d almost certainly improve your statically-typed codebases as well).

It’s pretty normal to write more test code than application code in dynamically-typed systems. There are no quality efforts that I’ve ever seen that have a higher pay-off than an extensive test suite though.

I’ve sometimes heard the argument that static-typing can mitigate the need for a whole tonne of test-writing but I’ve never found this to be remotely true. If you’re testing output types and not the actual output values, you’re doing it wrong. Just test the values; it’s much easier and much more valuable.

I’ll be brutally honest with you: If you find yourself on your first project that uses a dynamically-typed language and it doesn’t have an extensive test-suite, you’re almost certainly in for a very rough time as the codebase or the team gets larger. I generally have the same opinion about statically-typed codebases as well though.

Don’t try to validate types at runtime on every internal method or function

Of course you need runtime “type” validations. Just like in statically-typed codebases, you still need to verify user input, or that your data sources or 3rd-party services are giving you data in the format that you expect. You should absolutely do these validations that I’m going to call “edge validations” because they’re on the boundary between your application and the outside world.

With that said, there’s usually no good reason to sprinkle input-type validations on every internal function and method to try to replicate what you’ve had in the past with a statically-typed codebase. I agree that it’s nice to assure quality internally (and I heavily use micro-tests for that), but 90% of the time a function that gets an argument type it doesn’t expect will throw appropriately as soon as it tries to use it. Dynamically-typed languages employ duck-typing: problems with input arguments are detected when and only when a function or method tries to interact with an argument in a way that the argument doesn’t support. It happens at runtime, and the stack trace comes from somewhere deeper in the function, but it generally has all the information you need to debug the problem. On its own, debugging a problem like this would be much easier in a statically-typed language, but you’ve got a huge suite of tests not only to mitigate the problem of runtime-only detection but also to add validation of the actual values.
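Here’s a small sketch of what that looks like in practice (the function is hypothetical):

```javascript
// No input-type checks: anything with a .reduce of string-ish elements works.
function totalLength(items) {
  return items.reduce((sum, s) => sum + s.length, 0);
}

totalLength(['ab', 'c']); // 3 — works for arrays of strings
// totalLength(42) would throw "items.reduce is not a function" at the
// point of use, with a stack trace pointing right here.
```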

And with that said, I’ll still sometimes (though very seldom) add some assertions at the top of a function or method to ensure that the input arguments are valid. Often it’s to check things about the input that most type-systems aren’t much help with either, e.g. “Is the number a positive one?” or “Does the string start with ‘http://’?”
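As a hypothetical sketch, such value-assertions might look like this:

```javascript
// Made-up function: checks that most type-systems wouldn't express anyway.
function poll(url, intervalMs) {
  if (!url.startsWith('http://') && !url.startsWith('https://')) {
    throw new Error(`poll: url must start with http(s)://, got ${url}`);
  }
  if (!(Number.isFinite(intervalMs) && intervalMs > 0)) {
    throw new Error(`poll: intervalMs must be a positive number, got ${intervalMs}`);
  }
  // ... actual polling would go here ...
  return { url, intervalMs };
}
```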

Use the tests as (one form of) documentation

I’ve always loved the way that statically-typed systems enforce a certain type of code documentation. The type annotations make it easier for me to understand how to consume the functions and methods in the codebase and also what to expect inside the function or method if I’m making internal changes. The great thing about the type annotations too is that they’re guaranteed by the type-checker to be correct, whereas regular old source-code comments seem to always quickly go out of date (not that you shouldn’t still try to use comments appropriately).

I’m probably going to sound like a broken record here, but again the extensive test suite can be quite useful for this. A test shows you how to call those functions and methods and what they’ll return. I often find the tests more valuable than types too when I have both, because the tests will show me argument and return values and not just the types (i.e. it’s more valuable to know that add1(1) returns 2, not just that it returns a number).

Microtesting is specifically useful here too, because if you only do large black-box-style tests, your tests won’t be very useful for internal documentation. It’s also important to organize your tests in such a way that the relevant ones are very easy to find. For a microtest suite, you’ll probably want to mirror your application file structure as much as possible. I’ll certainly admit that statically-typed application code is better at putting that documentation right in your face as you’re reading source code and that’s really powerful, so at least try to make sure relevant tests are as easy to find as possible. (I haven’t got any experience with Python’s doctest, but it sure looks cool too.)

But most of all, when looking for valuable sources of code documentation, get used to considering tests and try to read them when you have questions.

Get the Right Attitude

Sometimes the most important thing that’s required is an attitude adjustment. You’re working with a different programming paradigm; you’re going to have to do some things differently.

The fact of the matter is that static types are not necessary to…

  • …build certain types of software, and in some cases they’re even a poor choice.
  • …avoid defects; they’re just one possible tool, and not always the best one. (Try more automated testing!)
  • …build large systems or work with a large team. There are in fact many large successful companies built largely on dynamically-typed languages, and many large open-source software projects built on dynamically-typed languages.

These dynamic languages have accomplished a hell of a lot in computing. You shouldn’t need much more evidence than that.

One of my favourite ways to tackle tech debt is to fix it as I work through product requirements. There are some great advantages to it:

  • It automatically prioritizes tech debt burndown in areas that you’re probably going to touch again soon, which is where it’s most valuable.
  • It doesn’t require blocking any product development flow. It just slows it down (probably imperceptibly).
  • It doesn’t even require conversations outside of engineering.

I’d venture to say this is probably considered a best-practice, so people making technical improvements this way are in good company. I call this “the organic method” because improvements are happening naturally, or “organically”, as other changes occur.

There are some downsides though (especially with large codebases with many affected developers):

  • It hides the actual cost. Is it really better for product development to be imperceptibly slower? Wouldn’t it be nicer if costs were more explicit and obvious?
  • It’s a lot easier to do tech improvement and product development separately. Doing two things at once is almost always more complicated.
  • It’s easier to find patterns, better solutions and shortcuts for even fairly mechanical technical improvement work if you’re focusing only on that technical improvement work for some period of time.

Usually the biggest downside is that it’s slower.

In practice, I always find that it’s much much slower than you’d think. Here’s a graph of a Javascript to Typescript conversion effort that I’ve been tracking for the past 11 months:


There are 2 small steep declines here that show the efforts of single individuals for short periods of time, but otherwise this graph (spanning almost a year) is a 64-file improvement out of 223 files in 11 months. At that rate, the effort will take 3.5 years.

I’ve tracked a number of similar efforts over the last year and the results are similar. My previous experience with organic improvement in large codebases feels pretty similar too: Without specific mechanisms to keep the conversion going, it naturally slows (or worse, stops).

Why does it matter if it’s slower?

Maintaining the incomplete state of the conversion is sneakily expensive:

  • It’s harder for newcomers to learn the right patterns when multiple exist
  • Engineers need to remember all the ongoing efforts that are underway and always be vigilant in their own work and in their code reviews of others’ work
  • Diffs that try to do too many things at once are harder to understand
  • Copy/pasta programming, forgetfulness, and uneducated newcomers lead to contagion: propagation of the wrong pattern instead of the right one
  • When you’re really slow, you’re even more likely to have multiple of these efforts underway at once, compounding the complexity of fixing them
  • If you’re slow enough and not actually doing cost-benefit analysis, patterns can be found that are “even better” than the ones underway. This is how you end up with even more ways to do the same thing and a smaller number of engineers that find joy in working in that codebase.

Most importantly though, if there’s really value in paying for that technical improvement, why not pay it sooner rather than later? Ironically, most of the least productive (and least fun) codebases I’ve seen are because of people making numerous actual improvements but then leaving them only partially applied. Good intentions without successful follow-through can easily make the code worse.

For larger technical improvements (ones that affect too many files to pull off in a week or less) you want to make sure that:

  • You have a vague idea of the cost and you’re actually making an improvement that you think will be worth the cost.
  • The timing for doing it now is right (and there isn’t something higher value you could do instead)
  • You actually have a plan that converges on total conversion in a reasonable amount of time instead of something that just leaves the codebase in an inconsistent state for an extended period of time.
  • The goal, the timing and the plan are generally approved by your teammates (even if unanimity is impossible)

Once you’ve got those 4 factors in place, you’re probably better off in the long run if you capitalize on the improvement as quickly as possible. You probably don’t want to cease all product development for a huge amount of time to do it, or send one developer hero off to fix it all, but you’ll probably want to come up with something better than organic improvement too, if you really care about that improvement.

In my experience, cross-functional teams align people to business goals best, and so they can get to real results much faster and much easier than teams made up of a single function. They really don’t seem to be that popular, so I thought I’d talk about them a bit.

Here’s some common chatter across mono-functional teams:

The Engineering team:

  • “We should never be taking on tech debt. Tech debt slows us down!”
  • “We should stop everything and clean up all our tech debt, regardless of cost or current business goals”
  • “We should convert all our code to use this new framework everywhere because it has [INSERT TODAY’S LATEST DEVELOPMENT FAD]”
  • “It’s the testers’ job to test, not mine”
  • “Works on my machine!”
  • “Let ops know that I used the latest version of the database client library, so they’ll have to upgrade all the databases”

The Testing team:

  • “Let me see if I can find a reason to stop this release”
  • “We need X more days to test before the release”

The Frontend/Mobile/iOS/Android team:

  • “That bug is on the backend.”

The Backend Team

  • “That bug is on the frontend.”

The Operations Team

  • “We haven’t got time to do the release today. Let’s schedule something for early next week.”
  • “Engineering doesn’t need access to that. Just tell us what you need.”

The Design Team

  • “We don’t want to help with a quick and dirty design for that feature experiment. It doesn’t fit into our vision”
  • “We’ve got the new total redesign for this month specced out.”

The Product Management Team

  • “That technical debt burndown can wait, right?”
  • “We should do this the fastest way possible.”
  • “Here are the detailed specs of what I want you to build. Don’t worry about what problem we’re trying to solve.”
  • “I’ve finally finished our detailed roadmap for the year.”

Do you see the patterns?

These teams…

  • optimize for the areas of their specialization, not for the business’ goals or for other teams’ goals.
  • defend their area of specialization by hoarding power and information
  • constantly try to expand their area of specialization at the expense of the business’ goals
  • focus more on looking busy than getting real business results
  • push blame to others
  • too willingly take on expensive projects where others pay the majority of the costs or where the value isn’t aligned with company goals.

So what to do instead?

Well you get these mono-functional teams because someone talking about a specialty or discipline once said something like “X is important. We should have a team for X.”

My suggestion instead is simply to start saying “X is important. We should have X on every team.”

This leads to a team with a bunch of different but cooperating specialties. The only thing they all have in common is their team’s portion of the business’ goals.

Think of it this way:

  • If the members of a team don’t share their goals, can they really even be called a team?
  • Why would you give goals to a team without also empowering them with all the specialist skillsets they need to deliver on those goals?

In general I’ve found that only a cross-functional team can make the proper trade-offs on its own, react quickly to changes in the world, and execute with minimal communication overhead. Once it has all the specialties it needs to autonomously deliver on its goals, you’re set up for a whole new level of speed of execution.

I’m not saying that cross-functional teams solve all the issues above, but they make the conversations happen on-team where it’s much cheaper than across teams, and the conversations are much easier because people don’t have to guess each other’s motives nearly as much.

It’s not an easy transition either if you’re currently on mono-functional teams. In my experience though, cross-functional teams can really make mono-functional teams look like a morass of endless disagreements and easily avoidable meetings.

Too often when I see a team trying to replace a bad/old/deprecated pattern that is widespread in a codebase, they default to what I call The Hero Solution: One person on the team goes through and fixes every case of it themselves.

This can work for very small efforts, but it’s almost always a terrible solution in larger efforts for a few reasons:

  • When the bad pattern is widespread this is the slowest way to fix it and the slowest way to get to value from the new pattern.
  • There’s nothing in this policy that stops other people from continuing to add the bad pattern. Indeed there will often be code with the bad pattern that they want to copy/paste/modify, making the bad pattern almost contagious.
  • Teammates may be working in the same areas of the codebase causing merge conflicts that slow both the teammate and the hero down.

Here are a few tips that will see better results:

Track the bad pattern

Find a way to track instances of that bad pattern over time. Often a simple git grep "whatever" | wc -l will tell you how many cases of it you have in the codebase. Check often and record the values. Whatever your strategy is, if it’s not trending toward 0 in a reasonable timeframe, your strategy is not good enough. Come up with something else.

I can’t tell you how many cases I’ve seen of efforts trending toward multiple years (sometimes as much as 10 years) as soon as I started measuring over time, determining the rate of change, and extrapolating the completion date.

If you do nothing else, do this. You’ll quickly be able to see the cost (in time) and be able to reassess the value proposition.

Stop the spreading!

Agree with the team on a policy that no new instances of the bad pattern will be added without a team-wide discussion. Add pre-commit hooks that look for new cases (git grep is awesome again) and reject the commit. Look for new cases in Pull Requests. Get creative! Without this, you’re basically bailing a leaking boat without patching the leak and you will have:

  • People that want to copy/paste/modify some existing code that has the bad pattern
  • People that don’t even know the pattern is now considered bad
  • People that knowingly implement the bad pattern because they don’t know about the good pattern.
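A pre-commit check along those lines can be sketched as a small Node script (the pattern and helper names are hypothetical):

```javascript
// Pure helper: pull the newly-added lines out of a unified diff and
// flag the ones that contain the bad pattern.
function newBadLines(diffText, badPattern) {
  return diffText
    .split('\n')
    .filter((l) => l.startsWith('+') && !l.startsWith('+++'))
    .filter((l) => l.includes(badPattern));
}

// In the actual hook (e.g. .git/hooks/pre-commit) you'd run it against
// the staged changes and reject the commit on a match:
//   const { execSync } = require('node:child_process');
//   const diff = execSync('git diff --cached', { encoding: 'utf8' });
//   if (newBadLines(diff, 'oldHttpClient(').length > 0) process.exit(1);
```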

If you can’t get your teammates on board, your effort is doomed. I’ve seen bad patterns actually trend upwards toward infinity when no efforts were made to get consensus or stop the bad pattern.

NB: Because of the regular human effort involved in maintaining consensus and educating and re-educating people about the goals, one of the best ways to stop the spreading is to concentrate on faster total conversion to the new pattern. Having bad examples all over your codebase works against you on a daily basis. Bad patterns are contagious.

Get your teammates involved in the conversion!

Here are a few ideas:

  • Have a rule where no modified files (or functions or modules or whatever doesn’t feel too aggressive for your team) can contain the bad pattern anymore. Figure out ways to enforce this automatically if possible, or in code reviews if not.
  • Break the work into chunks and schedule those pieces within other product work on a regular basis. This is sort of a nuclear option, but if the chunks are small enough (yet still encompass the entire scope of the conversion), you can show regular and reliable progress without stopping production work for any extended period of time.
  • Get other people to help with the conversion! If people are bought into it, there’s no reason one person should be doing it alone. Multiple people working on it (in a coordinated fashion) will reduce merge conflicts with product work, and increase knowledge sharing about the proper pattern. You may even get better ideas about how to convert.

Don’t do things that don’t work.

Stuff that doesn’t work:

  • Efforts that the team as a whole doesn’t find valuable / worth the cost.
  • Efforts that are ill-timed. Should you really do it now? Is this really the most important thing?
  • Efforts that are not tracking toward 0 in a reasonable amount of time. Partial conversions are really hard to manage. They may not be a strictly technical concern, but they are a concern for on-boarding, managing, complexity/mental-overhead, knowledge-sharing, etc. Come up with a strategy that doesn’t prolong them.

Big problems need smarter solutions!

I always try to think about what exactly it is about experience that makes a software developer better. Are there aspects of experience that are teachable, but we don’t understand them yet well enough to teach them? Can we do better to pass on “experience” rather than have every developer suffer through the same mistakes that we suffered through?

I know I’ve had a couple of hard-won lessons over the years that really helped me be more successful in software engineering. I’ve been able to see the warning signs of mistakes to avoid for a long time, but I think only recently did I figure out the reasons behind those warning signs. And I think some of those reasons can be explained mostly in terms of probability math, which can then be taught, right?

Before I go into this, I’d like to preface this by saying I’ve failed many many math classes. I hardly ever use any advanced math in my work, and doubt many other programmers do. I wish I had more patience and learned more math (especially how to apply it) too. So with that said, here goes…

Lesson: The Certainty of Possibility at Scale or over Time

The first piece of experience I’d like to pass on is that your “flawless” solution to a problem, at scale, over time, will fail.

I don’t know how many times I’ve heard an engineer say something like “This solution is basically bulletproof. There’s only a 1 in a million chance that that corner case you just mentioned will occur”, and then promptly put the solution into an environment that does 1 billion transactions a month.

Here’s how the math looks:

1B transactions/month * 1/1M probability of failure per transaction
= 1B / 1M failures per month
= 1000 failures per month.

Yes, the mathematics of probability are telling us that that particular solution will fail roughly 1000 times a month. This is probably a “duh” conclusion for anyone with a computer science degree, but I see developers (yes even ones with computer science degrees) failing to apply this to their work all the time.

At scale and over time, pretty much everything converges on failure. My favourite thing to tell people is: “At this scale, our users could find themselves in an if (false) { statement.”

So what does this mean? Well it doesn’t mean everything you do has to be perfect. The downsides of failure could be extremely small or completely acceptable (In fact actually pursuing perfection can get really expensive really quickly). What this tells you though is how often to expect failure. For any successful piece of software, you should expect it often. Failure is not an exceptional case. You have to build observable systems to make sure you know about the failures. You have to build systems that are resilient to failure.

Often I hear people talking about solution designs that have high downsides in the case of failure with no back-up plan and defending them with “But what could go wrong?”. This is a tough one for an experienced developer to answer, because having experience doesn’t mean that you can see the future. In this case all the experienced developer knows is that something will go wrong. Indeed when I’m in these conversations, I can sometimes even find one or two things that can go wrong that the other developer hadn’t considered and their reaction is usually something like “Yeah, that’s true, but now that you mentioned those cases I can solve for them. I guess we’re bulletproof now, right?”.

The tough thing about unknown unknowns is that they’re unknown. You’re never bulletproof. Design better solutions by expecting failure.

Understanding this lesson is where ideas like chaos monkey, crash-only architecture, and blameless post mortems come from. You can learn it from the math above, or you can learn it the hard way like I did.

Lesson: Failure Probability Compounds

Here’s the second piece of mathematically-based wisdom that I also learned the hard way: if you have a system with multiple parts relying on one another (basically the definition of a system, and of any computer program ever written), then the success rate of the system is the product of the success rates of the individual components, so failure compounds as components are added.

Here’s an almost-believable example: let’s pretend that you’ve just released a mobile app and you’re seeing a 0.5% crash rate (let’s be wildly unrealistic for simplicity and pretend that all the bugs manifest as crashes). That means you’re 99.5% problem-free, right? Well what if I told you the backend also has a 0.5% error rate, and it’s either not reporting errors properly or your mobile app is not checking for those error scenarios properly?

Probability says that you compute the total probability of success by multiplying the individual success probabilities, i.e.:

99.5% × 99.5% = 99%

What if you’re running that backend on an ISP that’s got 99.5% uptime? And your load balancer has a 99.5% success rate? And the user’s wifi has 99.5% uptime? And your database has a 99.5% success rate?

99.5% × 99.5% × 99.5% × 99.5% × 99.5% × 99.5% = 97%

Now you’ve got a 3% error rate! 3 out of every 100 requests fails now. If one of your users makes 50 requests in their session, there’s a good chance that at least one of them will fail. You had all these 99’s and still an unhappy user because you’ve got multiple components and their combined rate of error is the product of each component’s error rate.
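The compounding arithmetic above is easy to sketch in code:

```javascript
// Overall success rate of a system whose components each succeed
// independently: the product of the components' success rates.
function systemSuccessRate(componentRates) {
  return componentRates.reduce((total, rate) => total * rate, 1);
}

const sixComponents = Array(6).fill(0.995); // six parts, each "99.5% reliable"
console.log((systemSuccessRate(sixComponents) * 100).toFixed(1)); // 97.0
```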

This is why it’s so important when ISPs and Platforms-as-a-service talk about uptime with “5 nines” or 99.999%. They know they’re just one part of your entire system, and every component you have in addition has a failure rate that compounds with their baseline failure rate. Your user doesn’t care about the success rate of the components of the system — the user cares about the success rate of the system as a whole.

If there’s anything at all that you should take from this, it’s that the more parts your system has, the harder it is to keep a high rate of success. My experience bears this out too; I don’t know how many times I’ve simplified a system (by removing superfluous components) only to see the error rate reduce for free with no specific additional effort.

Lesson: When faced with high uncertainty and/or few data, the past is the best predictor of the future.

We saw in the last lesson one example of how surprising complexity can be. The inexperienced developer is generally unaware of that and will naively try to predict the future based on their best understanding of a system, but the stuff we work on is often complex and so frequently defies prediction. Often it’s just better to use past data to predict the future instead of trying to reason about it.

Let’s say you’re working on a really tough bug that only reproduces in production. You’ve made 3 attempts (and deployments) to fix it, all of which you thought would work, but either didn’t or just resulted in a new issue. You could think on your next attempt that you’ve absolutely got it this time and give yourself a 100% chance of success like you did the other 3 times. Or you could call into question your command of the problem entirely and assume you’ve got a 50/50 chance, because there are two possible outcomes: success or heartache.

I personally treat high-complexity situations like low-data situations though. If we treat the situation like we don’t have enough information to answer the problem (and we’ve got 3 failed attempts proving that that’s the case), we can use Bayesian inference to get a more realistic probability. Bayesian inference, and more specifically Laplace’s Law, tells us that we should consider the past. Laplace’s Law says that since we’ve had 0 successes so far in 3 attempts, the probability for the next deployment to be a success is:

  = (successes + 1) / (attempts + 2)
  = (0 + 1) / (3 + 2)
  = 1 / 5
  = 20 %

This is the statistical approach I use to predict too. With no additional information, if I’ve failed at something a number of times, the chances of me succeeding in the future are reduced. I don’t use this data in a depressingly fatalistic way though — I use it to tell myself when it’s time to make more drastic changes in my approach. Something with a 20% chance of success probably needs a much different, more powerful approach. I also stop telling people “I’m sure I’ve definitely got it this time”.
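The same calculation as a tiny hypothetical helper:

```javascript
// Laplace's rule of succession: estimated probability that the next
// attempt succeeds, given past successes and total attempts.
function ruleOfSuccession(successes, attempts) {
  return (successes + 1) / (attempts + 2);
}

console.log(ruleOfSuccession(0, 3)); // 0.2 — three failed fixes so far
console.log(ruleOfSuccession(2, 2)); // 0.75 — two clean deploys in two tries
```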

Similarly, if there’s an area of the codebase that has received a comparatively large number of bug-fixes, I’m going to lean towards it being more likely to have bugs than the areas that have had fewer bugfixes. This may now seem obvious, but if you’re the person that did all those bugfixes, you may be biased towards believing that after all those fixes, it must be less likely to still be buggy. I’d happily bet even money against that, and I’ve actually won a lot of money that way.

I think it’s fair to say that this is extremely counter-intuitive and may on its face look like a form of the Gambler’s fallacy, but remember it’s for use in low-information scenarios, and I would say that numerous defects are clear evidence you didn’t have enough information about the system to predict this bug in the first place.

Relatedly, Extreme Programming has a simplified statistical method of estimating how much work a team can handle in a sprint called “Yesterday’s Weather”. Instead of having a huge complicated formula for how much work a team can handle in a given timeframe, they simply look at what the team was able to accomplish last sprint. It’s going to be wrong a lot of course, but so is whatever huge complicated formula you devise.

If there’s a more generalized lesson that you should take from this, it’s that we work with complex systems, they’re notoriously difficult for humans to predict, and predicting statistically can often get you a clearer picture of the reality of a situation. Resist prediction from first principles or through reasoning. You don’t have enough data, and software systems are normally elusively complex.

With all this said, I know it’s pretty hard to really get a feel for the impact of probabilities until you’ve had the prerequisite defects, failed deployments, and product outages. I sure wish someone had at least taken a shot on telling me in advance though. It also makes me wonder what I’ve still got left to learn.

The 4th criterion of the Joel Test of quality on a software team is:

  4. Do you have a bug database?

You probably do. The test was written almost 20 years ago and if I recall correctly almost everyone had one anyway. Joel was even selling one.

On the surface it seems like common sense. Quality is something you want to manage. So naturally you’ll also tend to want to log things… measure things… track things.

I want to propose a better way for most situations though. It’s simpler, your long-term speed will be faster, and your product quality will be higher:

The Zero Defect Policy

Simply put: Prioritize every defect above feature work or close the issue as a WONTFIX. I’m not suggesting you interrupt any work already in progress, but once that’s done, burn your defect list down to 0.

Why in the world?

  • Bugs are much cheaper to fix immediately. You know the product requirements most clearly at that time. You were touching that code most recently at that time and it’s still fresh in your mind. You might even remember a related commit that could have been the cause. The person who wrote it is still on the team.

  • Most things we’d classify as bugs are really some of the most obvious improvements you could make. You probably don’t need to A/B test a typo fix. If your Android app is crashing on the latest Samsung phone, you don’t need a focus group.

  • Managing bugs is a huge effort. If you’re not going to immediately fix them, you have to tag them, categorize them, prioritize them, deduplicate them, work around them, do preliminary investigations on them, revisit them, and have meetings about them.

  • The development team needs the proper immediate feedback and back-pressure to know when to speed up and when to slow down.

Say What now?

Defects are perfectly normal. If you don’t have them, you’re either NASA or you’re developing too slowly. However, if you have too many, or if they’re particularly costly, you absolutely need to slow down and turn them into learning opportunities. In those times, the fix isn’t enough. The fix with an automated test isn’t even enough. You’ll want to look into:

  • other prevention measures
  • faster detection measures
  • faster ways to fix
  • better ways to limit impact

This is the REAL definition of Quality Assurance. If you do this thoughtfully, and don’t settle for an army of manual testers as the best solution you could come up with, over the long term you’ll be much, much faster. You’ll be the kind of product development team that the company actually believes when you say something is “done”.

What about the low value, high cost bugs?

Delete them. If they become worthwhile later, you’ll hear about them again. If you can’t bring yourself to delete it, you probably value it too much to not fix it. Just fix it.

What about when my team ends up with a week’s worth of defects and can’t get any feature work through?

There will definitely be dark times. Slow down and learn from them. It’s the best time to talk about speed improvements, because speed and quality are interdependent. You can move much faster in mistake-proof environments. In most cases, nobody cares how fast you ship broken software.

Sounds great, but what about our existing bug database of hundreds of bugs?

Many are probably not even bugs anymore if you’ve been collecting long enough to get hundreds. How many of those old things can you even reproduce? How many have clear enough descriptions that you can even still understand the problem? Is the original submitter still around? Here’s my solution: delete all but the highest priority ones and immediately schedule the high priority ones above other features. Allow people to file reports again if they really care about a particular defect and the defect still exists.

The bug database is great in theory, but in practice, it’s often an aging garbage heap of ignored customer frustrations and a /dev/null for opportunities for improvement.