I’ve been working on production-quality node.js web applications for a couple of years now, and I thought it’d be worth writing down some of the more interesting tricks that I’ve learned along the way.

I’m mostly going to talk about maintaining a low-defect rate and high availability, rather than get into the details about scaling that are covered in a lot of other places. In particular, I’ll be talking about load-balancing, process management, logging, and metrics, and the hows and whys of each.

Balance the Load

I’m going to assume that you’re already load-balancing on a given server with cluster or some higher-level abstraction (I use cluster-master), as well as between servers with a load-balancer like HAProxy.

Performance considerations aside, your service will have much better availability, quality, and uptime if you’ve got multiple processes running on multiple machines. More specifically, you get:

  • the ability to immediately failover in the case of single process or single machine failure
  • reduced overall service failure in the case of single process or single machine failure
  • the ability to gracefully deploy with the load-balancer

Gracefully Deploy

Gracefully deploying means that you have enough servers running to handle the load at all times throughout the deployment process, and that none of the servers are taken off-line while outstanding requests are still in progress. This means:

  • Your clustering solution must be able to stop accepting new connections and not exit the master process until all workers have finished processing their existing requests (see the sketch after this list). The solution I personally use is cluster-master, but there are a bunch of suitable alternatives.

  • You need a rolling deployment, meaning that servers are restarted one-at-a-time, or in small groups, rather than all at once. The easiest way to do this is probably to write a nice deploy script that takes restarting servers out of the load-balancer until they’re running again.
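
A bare-bones sketch of the graceful-shutdown half of this (plain node.js http, not cluster-master itself; the signal and the timeout value are assumptions about your particular deploy tooling):

  var http = require('http');

  var server = http.createServer(function(req, res){
    // ... handle the request ...
    res.end('ok');
  });
  server.listen(3000);

  process.on('SIGTERM', function(){
    // stop accepting new connections; the callback fires only once the
    // requests already in flight have finished
    server.close(function(){
      process.exit(0);
    });
    // safety net so one stuck connection can't hold up the deploy forever
    setTimeout(function(){ process.exit(1); }, 30 * 1000);
  });

A rolling deploy script can then send that signal to one machine (or small group) at a time, waiting for the process to exit cleanly before moving on.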

If you don’t have a graceful deploy solution, every deployment of new code will lose requests in progress, and your users will have a terrible experience.

Also note: I’ve seen a few clustering solutions that use some sort of hot-deploy (hot-deploy loads a new version of code without taking the server down) functionality. If you’ve got a rolling deploy via your load balancer though, you probably don’t need any sort of hot-deploy functionality. I’d personally avoid solutions that involve the complexity of hot-deploying.

Run as a Service

You’re also going to want to be running your app as a service that the OS knows to restart, with some service manager like upstart. A service manager like this is going to be absolutely essential for when your node.js app crashes, or when you spin up new machines.

It’s probably worth noting that you won’t really want to use something like forever or nodemon in production: they don’t survive reboots, and they’re redundant once you’ve added service management that does. (This is a case where you don’t want redundancy, because these types of process managers can end up fighting with each other to restart the app, never really allowing it to start.)

Log to Standard Output

Logging to standard output (using console.log()) and standard error (using console.error()) is the simplest and most powerful way to log. Here’s how:

pipe it, don’t write it

In the config file for running your app, you want something like this to specify a log file:

node server.js >> /var/log/myserver.log 2>&1

The >> tells the shell to append your node process’s output to the specified log file, and the 2>&1 tells it that both standard out and standard error should go to that same log file. You don’t want to be writing to the logs programmatically from within the node process, because you will miss any output that you don’t specifically log, like standard error output from node.js itself, which shows up any time your server crashes. That kind of information is too critical to miss.

console is the only “logging library” you need

With this kind of logging set up, I just have to console.log() for any debug output that I need (usually just temporarily for a specific issue that I’m trying to solve), or console.error() for any errors I encounter.

Additionally, one of the first things I do on a new web service is to set up a console.log() for each request (this should work in any express or connect app):

  // log one line per request once the response has finished
  app.use(function(req, res, next){
    res.on('finish', function(){
      console.log(req.method, req.url, res.statusCode);
    });
    next();
  });

This chunk of code gives me nice simple logs for every request that look like this:

...
POST /api/session 400
POST /api/session 401
POST /api/session 200
GET /api/status 200
GET /api/status 200
...
rotating the logs

The missing piece of infrastructure needed to support this is a way to rotate the logs, like logrotate. Once that’s set up properly, your logs will rotate nicely and won’t fill up your disk.

Tools to Help Detect Problems at Runtime

There are two key ways that I like to instrument an application to detect problems that occur at runtime: aggregated logging and metrics.

Aggregated Logging

One of the most important things you can do for error and defect detection is to have aggregated logging — some service that brings all your web servers’ logs together into one large searchable log. There are a few products for this: The stand-out open source one for me seemed to be the logstash/kibana combination, though these days I’m lazy and generally use the papertrail service instead.

I would highly recommend that you set a service like this up immediately, because the small amount of friction involved in sshing into servers to tail logs is enough to seriously reduce how often you and your teammates will actually look at the logs. The sooner you set this up, the sooner you can benefit from being able to learn about your application through the logs that you have it write.

Metrics

When I say “metrics” I really mean time-series metrics that allow me to see the frequency of different types of events over time. These are invaluable because they

  • tell you when something unusual is happening
  • aggregate certain types of data in ways that logs can’t
  • help you rally the team or company around specific high-value goals (be careful with this one!)

The stand-out open source metrics/graphing product is probably graphite. I’ve generally been using Librato’s metrics service though, because it’s easy to set up and looks great, so that’s where I’ll pull my screenshots from for time-series data. I’ve also had a pretty good experience with DataDog’s service. Both come with the ability to raise alerts when a metric surpasses a threshold, which can be a nice way to know when something is going on that you should investigate.

Basic Metrics

There are a bunch of basic metrics that you can track to see what’s going on in your server.

Here’s an example of some very high-level metrics for us over a week:

  • in blue: number of requests
  • in green: average duration of requests

(Note that at this resolution, the y-values are averages over such long durations that the values aren’t really that useful at face value; it’s the visualization of general trends that is useful.)

There are a few obvious things that should jump out right away in this example:

  • We have daily traffic spikes with 5 peaks really standing out and 2 being almost flat (those happen to be Saturday and Sunday).
  • We had a spike in average request duration (on Friday morning — the third peak). This was caused by some performance issues with our database, and resulted in service outage for a number of our users during that time.

I can basically put metrics.increment("someEventName"); anywhere in my codebase to tell my metrics service when a particular event occurred.

Also consider OS metrics like:

  • disk space usage
  • cpu utilization
  • memory consumption
  • etc, etc

I’ve got my codebase set up so that metrics.gauge("someMetricName", value); will allow me to graph specific values like these over time as well.
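
Neither metrics.increment() nor metrics.gauge() above refers to any particular library; a wrapper like that can be tiny. Here’s a minimal sketch assuming a StatsD-compatible collector (the host, port, and file name are assumptions):

  // metrics.js: a minimal, fire-and-forget metrics wrapper (illustrative only)
  var dgram = require('dgram');
  var socket = dgram.createSocket('udp4');
  var HOST = '127.0.0.1';  // assumed: a statsd-compatible relay on each box
  var PORT = 8125;         // statsd's conventional port

  function send(line) {
    var buf = new Buffer(line);
    // metrics should never be able to take the app down, so errors are ignored
    socket.send(buf, 0, buf.length, PORT, HOST, function(){});
  }

  // metrics.increment("someEventName");
  exports.increment = function(name) {
    send(name + ':1|c');              // statsd counter format
  };

  // metrics.gauge("someMetricName", value);
  exports.gauge = function(name, value) {
    send(name + ':' + value + '|g');  // statsd gauge format
  };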

If you’re not already doing monitoring like this, you should be wondering what your servers would tell you if you were. When it’s this easy, you tend to do it all the time, and you end up with all kinds of interesting metrics, including more business-specific ones.

What next?

These are really just the basics. If you’re not doing these things, and you care about running a production-quality web service, you’re probably putting yourself at a disadvantage.

There is a lot more that you can do though, and I’ll get into that in my next post, which will be Part II to this one.

I’ve been reading the 37signals book Remote lately and it’s got me thinking a lot about the advantages and disadvantages of working remotely. While they clearly acknowledge that there are trade-offs to working remotely, I personally don’t think they paint a very clear picture of just how expensive those trade-offs are. I can only really speak from a software development perspective, but I think this might apply to other types of businesses where a high degree of collaboration is necessary and the concept of “team” is more than just “a group of people”.

After 7 or 8 years of being a home-based contract-worker, and close to the same time spent working in various office environments, I’ve found that with the right people and the right environment, the office can be a lot more productive (and actually enjoyable) than a home environment, for a number of reasons.

Skype / Google Hangouts / Webex pale in comparison to actually being in the same room as someone else. I can’t explain why, because it seems to me like it should be good enough (and I really wish it were sometimes), but the fact of the matter is that it’s not. You miss nuances in communication that can add up to big differences in productivity. I’ve been in hours and hours of pair programming sessions over webex: it doesn’t match the high-bandwidth of shoulder-to-shoulder pairing. I’ve done the shared whiteboard thing: it doesn’t match the speed and simplicity of an actual whiteboard.

With communication over today’s technological mediums, there’s also a natural tendency to end the interaction as soon as possible and “get back to work”, since people don’t actually leave Skype / Google Hangouts / Webex running all day long. The result is that people are reluctant to start new conversations like that because they’re worried about interrupting the other person. So people naturally fall back on less intrusive means of communication (text chat) and often even further back to more asynchronous forms of communication (email).

And asynchronous communication is not the boon to productivity that people think it is. Email is today’s definitive means of asynchronous communication, but I think it’s pretty obvious that there are few methods of communication that are less efficient. Imagine a sports team that can only communicate by email. That would be ridiculous. Email is amazing in its ability to cut down on interruptions of course, but it comes at the obvious cost of timeliness and immediacy (not to mention the very real human aspects, including general tone and emotional nuances!).

The ideal solution would be to solve the problem of interruptions without losing timeliness.

Explaining this requires actually defining an “interruption”: when you’re on a team and someone wants to talk to you about the main goal of the team, that’s decidedly NOT an interruption. Imagine a quarterback being annoyed by the “interruption” of someone yelling that they’re open for a pass. Imagine a soldier in a firefight being annoyed at another soldier for requesting cover fire prior to some forward manoeuvring. For something to really be an interruption, you have to be working on a different priority than the other person, one that you think is of higher priority.

In a collaborative environment, interruptions should often be viewed as a symptom, and not the actual cause, of the problem. To solve the problem of interruptions at the root, you’ve got to clearly define the priorities of the team, align everyone on them, and concentrate as many people as possible on the top goal, rather than going the easy route (management-wise) and doing something like giving each of the X members of the team one of the top X priorities, effectively “team-multi-tasking”. Team-multi-tasking is a common approach because:

  • It’s the easiest thing to do management-wise (Why bother prioritizing when I can have X tasks worked on at once?).

  • It feels like you’re getting more done, because so many things are in progress at once.

  • It ensures everyone is 100% utilized.

But it’s also pretty obviously the absolute slowest way to get your top priority shipped and to start getting value from it (and often the slowest way to get them all done, believe it or not!).

Not only that, but the more you do this, the more people tend to specialize into roles like “the back-end guy” or the “ops guy”, etc, and individuals lose the ability to be cross-functional, and to practice collective code ownership. It’s a vicious cycle too: the more an individual tends toward deep specialization, the more we’re tempted to give them that kind of work because they’re the best at it. Not only does your bus factor plummet, but you get back into these scenarios where anytime someone engages someone else for help, it’s an interruption to that other person, so people tend to not ask for help when they really should, or they use dog-slow asynchronous methods (like email). Breaking this cycle means constantly trying to break down these specialty silos by having more experienced people collaborate with (and often mentor) less experienced people. The end result is a team that makes full-stack engineers and not just one that hires them.

I find that the right mindset for the team is to create an environment that would be best for a team working on only 1 super-important thing that you want to start getting value from as soon as possible. A general rule of thumb is: If you’re having a text conversation with someone in the same room, you’re doing it wrong. I know that’s common, but if you think about it, it’s pretty absurd.

How would that environment look? It would be a “warroom” for radically collocating the team. Everyone that needs to be there would be there, arranged in a manner that’s most effective for verbal communication. There would be nearby whiteboards for collaborative design sessions, and information radiators to show the progress of the efforts and various other metrics of the current state of affairs by just glancing up.

How would the team behave? They would be a “tiger team”. They would all be helping to get that one thing out the door. They would almost never use any electronic means to communicate (except obviously, when it’s more efficient, like sharing chunks of code, or urls), and you’d never hear someone say “that’s not my job”. If someone is the only person that knows how to do something, the team identifies them as a process choke-point and quickly tries to get others up to speed on that skill or responsibility (and not by giving them links to docs or occasional hints, but by super-effective methods like pair programming, in-person demonstration, and actual verbal discussion). If one member of the team appears to be stuck, he asks for help, and if he doesn’t, the other members notice, and jump in to help, unprompted. There are no heroes, and everyone takes responsibility for everything that the team is tasked with. This can and should be taken to extremes too: members should drop their egos and make themselves available to do grunt work or testing for other members — whatever gets the top priority shipped as soon as reasonably (and sustainably) possible. This includes making yourself vulnerable enough to ask for help from others, not only for tricky aspects of your work, but also just to divide your work up better. If you’re taking longer than expected on a given task, you should be conscious enough of that to be openly discussing it with the team, including possible solutions, and not trying to be a hero or a martyr. Conversely, you should never be waiting for extended periods of time for your teammate to do some work that you depend on for something else. If a teammate is taking longer than usual, jump in and help.

How would management work then? The team should self-organize. With clear priorities set, the team can and should, for the most part, self-direct. A manager should not try to manage the avalanche of interactions that happen during free collaboration throughout the day. Trying to manage who works on what will simply make that manager a choke-point and slow the process down (and he/she will inevitably be wrong more than right, compared to the wisdom of the entire team). Often a non-technical manager can have the advantage over more technical managers, because he’s forced to trust the team’s decisions and doesn’t become an “approval choke-point” for the team’s engineering decisions.

That doesn’t mean a management role is unnecessary though; it’s actually quite demanding to manage a team like this. Priorities must always be crystal clear and, for the most part, the top few priorities being worked on should not change very often. If they do change often, you usually have either a management problem or a product quality problem. Those problems should be fixed at the root as well, rather than making developers feel like management or customers are just a constant stream of interruptions. Keep in mind that if you do change the top priority often enough, you can completely prevent the team from making any progress at all (I’ve spent months on a team like that before — it’s not fun for anyone). (For a more complete description of the ideal responsibilities for this style of management, see “The Management” below.)

Does this work for all types of people? No. But then again, you’ll never get a fast team of developers without hiring the right people. If you have people on the team that are incapable of being team-players, it’s certainly not going to work (and you’ll likely have a number of problems).

What if we have TWO top priorities though? Flip a coin and pick one; it’s that simple. Just arbitrarily choosing one will get one of them shipped and returning value as soon as possible.

What about team member X that has nothing to do? The team should recognize this and:

  • try to break down the tasks better so they can help (it’s often worth a 30-minute session to bring other people on board — any given task can almost always be broken down further with additional design discussion, even if it’s not worth dividing up the work).

  • try to get them pair-programming so they can learn to help better.

  • try to get them testing the parts that are done.

Obviously in some scenarios it might make sense for them to start on the second priority, but they should be ready to drop that work at any time to help get priority #1 out the door faster. Remember: the goal is to ship finished features faster. It’s not to keep the people on the team as busy as possible.

Isn’t that really tough on the developers, to rarely have quiet time to concentrate? Yes, at first it often is, because focusing on the top priority requires discipline when it’s different from the status quo, and people simply aren’t used to the buzz in the room. The longer you spend working with headphones on, trying to carve out your own niche, separate from the rest of the team, the longer it will take you to transition and the harder that transition will be. But soon you learn to ignore the irrelevant buzz in the room and to tune back in quickly when it’s important and relevant. And since you’re all working on the same thing, it’s often relevant, and a lot less difficult to get back into the flow state than you’d expect (especially if pair programming). So the pay-off for developers is:

  • huge productivity improvements

  • less waiting for someone else to do their part

  • longer sessions in “flow state” and easier returns to flow state

  • less solitude without adding interruptions

  • fewer “all the weight is on my shoulders” scenarios

  • more knowledge sharing and personal development

It helps a lot if developers take breaks more often, because you actually end up spending a lot more time in the flow state in a given day. You end up learning more from other developers and being vastly more productive as well, which is exhausting and energizing at the same time. In my opinion, when done right, it’s just a lot more fun than isolated development.

Of course it also helps to have break-out rooms for smaller, prolonged conversations when desired.

But what if you have dozens of engineers? It’s just not reasonable to try to make teams that contain dozens of engineers. Brooks’ law has seemed to hold fairly well over time and he explains that one reason is that the more people you add to a group, the more communication overhead there is. I’ve personally seen this as well, with team-size starting to get diminishing returns between 8 and 12 people, with little value (if any) to adding members beyond that. When you get beyond 8 people on a given team, you need to start thinking about creating a second team. Conway’s Law seems to dictate that a service-oriented architecture is the best solution we’re going to get for scaling up an organization’s developer head-count, but I’ve personally found that smaller codebases with distinct responsibilities are generally better codebases anyway.

With all that said, of course I really wish I could work remotely sometimes and get all the same benefits. I’ve simply never seen a telepresence technology set-up that matches the fidelity of actual face-to-face and shoulder-to-shoulder collaboration. Maybe it exists. I’d love to hear about it. But my experience simply doesn’t indicate that the current status quo technologies (skype, webex, etc), like Remote recommends, are up to snuff. The way it stands today, I’ll almost always put my money on a collocated group of “team-players” over a geographically disparate team of “gurus”.

So you’re thinking about versioning your web API somehow. Versioning is how we annotate and manage changes in software libraries, so it seems natural to re-apply this solution to changes in web APIs. Most web APIs are versioned in order to…

( A ) “freeze” an older version somehow so that it doesn’t change for older clients, or at least it is subjected to much less change than newer versions, so you can still move forward.

( B ) be able to deprecate older APIs that you no longer wish to support.

( C ) indicate that changes have occurred and represent a large batch of changes at once when communicating with others.

However…

( A ) Versions don’t actually solve the problem of moving forward despite older clients.

You can safely break backward compatibility with or without the concept of versions: just create a new endpoint and do what you want there. It doesn’t need to have any impact on the old endpoint at all. You don’t need the concept of “versions” to do that.

( B ) Versions don’t actually solve the problem of deprecating old endpoints.

At best, versions just defer the problem, while forcing you to pay the price of maintaining two sets of endpoints until you actually work out a real change management strategy. If you have an endpoint that you want to retire, you’re going to have to communicate to all your clients that you’re deprecating it, regardless of whether or not you’ve “versioned” it.

( C ) Versions DO help you communicate about changes in a bunch of endpoints at once. But you shouldn’t do this anyway.

Of course, if you’re retiring 20 endpoints at once, it’s easier to say “we’re retiring version 1” than to say “we’re retiring the following endpoints…” (and then list all 20), but why would you want to drop 20 breaking-change notices on your clients all at once? Tell your clients as breaking changes come up, and tell them what sort of grace period they get on those changes. This gives you fine-grained control over your entire API, and you don’t need to think about a new version for the entire API every time you have a breaking change. Your users can deal with changes as they occur, and not be surprised with many changes at once (your new version).

“So how the @#$% do I manage change in my API??”

With a little imagination, it’s totally possible. Facebook is one of the largest API providers in the world, and it doesn’t version its API.

Take Preventative measures: Spend more time on design, prototyping, and beta-testing your API before you release it. If you just throw together the quickest thing that could possibly work, you’re going to be paying the price for that for a long time to come. An API is a user interface, and as such it needs to actually be carefully considered.

Consider NOT making non-essential backwards-incompatible changes: If there are changes you’d like to make in order to make your API more aesthetically-pleasing, and you can’t do so in an additive fashion, try to resist those changes entirely. A good public-facing API should value backward compatibility very highly.

Make additive changes when possible: If you want to add a new property to some resource, just add it. Old clients will just ignore it. If you want to make breaking changes to some resource, consider just creating a new resource.

Use sensible defaults where possible: If you want a client to send a new property, make it optional, and give it a sensible default.

Keep a changelog, roadmap and change-schedule for your API: If you want to solve the problem of keeping your users abreast of changes in your API, you need to publish those changes somehow.

Use hypermedia: Tell clients what uris to use for the requests they need to make, and give them uri templates instead of forcing them to construct their own uris. This will make your clients much more resilient to change because it frees you up to change uris however you want. Clients constructing their own uris are tomorrow’s version of parsing json with regular expressions.
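
As a rough illustration (the resource names and the link format here are made up; use whatever convention your API standardizes on), a response that hands the client its next steps might look like this:

  {
    "orders": [
      { "id": 17, "status": "shipped", "_links": { "self": { "href": "/api/orders/17" } } }
    ],
    "_links": {
      "self":   { "href": "/api/orders" },
      "next":   { "href": "/api/orders?page=2" },
      "search": { "href": "/api/orders{?status}", "templated": true }
    }
  }

The client only ever follows hrefs it was given (or fills in a published template), so the server stays free to reorganize its uris without breaking anyone.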

NB!

Notice that almost all of these are things that you should do regardless of whether or not you version your API, so they’re not really even extra work.

Notice also that I never said “Never make changes”. I’m not trying to make an argument against changes. I’m just saying that versioning doesn’t help you make changes at all. At best it just lets you put a bunch of changes in a “bucket” to dump on your API users all at once.

If you’ve ever seen a laundry list of the types of non-functional requirements like testability, accessibility, reliability, etc. (often called “ilities” because of their common suffix), you probably haven’t seen “rewritability”, but I really want to add it to the pile.

I’ve often found in software engineering that, counterintuitively, if something is difficult or scary there’s a lot to be gained by doing it more often. I think we can probably agree that software rewrites are among those difficult and scary tasks.

Now that we’re starting to get better at service-oriented architecture, where networked applications are better divided into disparate services that individually have narrow responsibilities, a natural side effect is that it’s getting easier to write services that are replaceable later.

So what makes a piece of software rewritable?

Clearly defined requirements: not just what it needs to do, but also what it won’t do, so that feature-creep creeps into another more appropriate service. This one ends up requiring some discipline when the best place and the easiest place for a given responsibility don’t end up being the same service, but it’s almost always worth it.

Clearly defined interface: If you want to be able to just drop a new replacement in place of a previous one, it’s got to support all the integration points of the previous one. When interfaces are large and complicated this gets a lot harder and a lot riskier. In my opinion, this makes a lot of applications with GUIs very difficult to rewrite from scratch without anyone noticing (Maybe it’s because UX is hard. Maybe that’s because I’m terrible at GUIs.).

Small: Even with clearly defined requirements and interface, a rewrite is still going to be expensive and hard to estimate if there’s a lot to rewrite.

Any software engineers that have found themselves locked into a monolithic ball of mud application can see the obvious upsides of having a viable escape route though.

And once we’re not scared of rewrites, what new possibilities emerge? Real experimentation? Throw-away prototypes that we actually get to throw away?

How much easier is it to reason about a piece of software when it also meets all the criteria that make it easy to rewrite?

I once asked an interviewee “Ever work on a high-performing team?” and he answered “Yeah…” and then went on to explain how his team profiled and optimized their code-base all the time to give it very high performance.

That’s not what I meant, but in his defence, I really don’t think many developers have had the opportunity to work on high-performing teams and so they don’t know what they don’t know.

I’ve been developing software professionally for around 15 years and I’ve only worked on a high-performing team once. It was a couple of years ago, and it was awesome. I haven’t been able to find another since, but now I’m a true-believer in the concept of a 10X team.

I suspect that there might be groups out there somewhere who have experience with even higher-performing teams, but I’ve been on enough teams now to be certain that high-performing teams are extremely few and far between. I’m not sure I even know how to create one, but I know one when I see one. I’ve definitely been on teams since that have come close to being high-performing.

The high-performing team is the ultimate win-win in the software development industry because the developers on the team are thrilled to be going into work on a daily basis and the business is thrilled because they’re producing results much faster and more reliably than other teams. Those are really the two essential criteria, in my opinion:

  1. Is everybody happy to be part of the team?
  2. Is the team producing results quickly?

The first criterion is absolutely necessary for the second to be sustainable. Any business that neglects developer enjoyment is taking a short-sighted path that will eventually negatively impact performance (due to burnout, employee turnover, etc).

So all of this sounds great, but the real mystery is in how to make it happen. I’m not 100% sure that I know. I don’t think I’ve ever even heard of people talking about how to achieve it. This is my shot at describing the recipe though.

The Management

The craziest part of my experience is that the only time I’ve seen a high performance team happen was when the manager of the team had very little software development experience. That seems maddeningly counter-intuitive to me because from a developer’s perspective, one of the most frustrating aspects of software development is having a clueless person telling you what to do and how to do it. It turns out that being clueless in software development can be a huge benefit to being brilliant at software development management though, if the manager is already brilliant with management in general. This manager was definitely brilliant on the management side. Here’s why:

  • He hired for attitude over aptitude. He would hire based on peoples’ potential, their attitudes, and their willingness to work hard instead of whether their precise skillset filled a niche that we needed.
  • He would cull developers who weren’t team players. It’s not a job anyone wants to do, but it’s sometimes necessary to let people go when they’re not working out. Not doing this results in the rest of the team being miserable.
  • He kept the team small, and didn’t move people in or out of it very often. This gives the team the maximum opportunity to form into a cohesive unit.
  • He would set clear goals and priorities. No two things would share the same priority (even though I’m sure he was probably secretly flipping a coin when more than one thing was extremely important). That would give the team a clear direction and an ability to focus on what’s most important.
  • He never explained a requirement with “because I said so”. This encouraged a culture of questions and feedback which enhanced learning both on the manager’s part and on the team’s part.
  • Even when priorities would change, he’d let us complete our current work where possible, so that we wouldn’t have a mess of incomplete work lying around to try to come back to later. I’ve been on teams since where the team has been repeatedly interrupted to the point of not really delivering anything for months.
  • He assigned work to the team, and not to individuals, so the team would have to own it collectively. This kept the manager from being the choke-point for “what should I work on next?” questions.
  • He didn’t make decisions or solve problems for the team that the team was able to handle themselves (even when individuals on the team would ask).
  • He’d bring us the requirements, but leave the solutioning up to us. The result was that the team was largely autonomous and in control of how it accomplished its goals, so we immediately had a newfound sense of responsibility for our work and even pride in it.
  • He’d get estimates from us, and not the reverse (ie. deadlines). This way he could inform the business about what the estimated cost and schedule for a given feature was so they could make informed decisions about what features to pursue, rather than just delivering what was finished when the clock ran out.
  • He’d actively look for scenarios where the team was not “set up for success” and rectify them. Nothing is worse for team morale than a death march project, even a mini-one.
  • He held the team responsible for the performance of individuals. If someone on the team was struggling on a task (daily stand-up meetings make this pretty obvious) without any help from the team, he’d ask the team what they were doing to help.
All of these things created the perfect climate for a high-performing team, but no one person (even a great manager) can create a high-performing team on their own.

The Team Member

The individuals on the team were also really impressive to me and no doubt absolutely essential to making the team what it was. Here’s a loose laundry-list of the qualities I witnessed and tried to adhere to myself:

  • Members of the team were open, honest, and respectful. They would give each other feedback respectfully and truthfully and would never talk behind each others’ backs. On other teams that I’ve worked on where this wasn’t the practice, team-crushing sub-cliques would form.
  • Once a decision had been made by the team, every member of the team would support it (even if he/she was originally opposed) until new evidence was presented. This actually worked well because we would always first take the time to listen to and consider everyone’s opinions and ideas.
  • Members of the team were results-oriented. “Effort” was never involved in our measure of progress. Software actually delivered according to the team’s standards was our only measure of progress.
  • The team would continuously evaluate itself and its practices and try to continuously improve them. We’d have an hour-long meeting every week to talk about how we were doing, and how we could improve (and how we were doing implementing the previous week’s improvements).
  • The team would practice collective code ownership, and collective responsibility over all the team’s responsibilities. This is actually a pretty huge deal; if people working on a top priority needed help, they’d ask for it immediately and anyone that could help would. If someone was sick, someone else would take on their current responsibilities automatically. If a build was broken, everyone would drop everything until there was a clear plan to fix it. The morning stand-up meeting would happen at exactly 9 am regardless of whether the manager was there.
  • The team would foster cross-functionality as a means of achieving collective code ownership and collective responsibility. What that meant was that members would actively try to teach each other about any particular unique skill-sets they had so that they would be redundant and team members would be better able to take on any task in the work queue.
  • The team would be as collaborative as possible. We all sat in a bullpen-like environment where we’d torn down the dividers that were originally erected between us. We’d often pair-program, not only for skill-sharing and mentoring, but also for normal coding tasks, because the boring tasks are less mind-numbing that way and the complicated tasks get better solutions that way.
  • Of the 5 or 6 teams in our organization at the time, I think our team was by far the fastest, and the most capable of handling the more technically diverse and challenging work. No one on the team was a rock-star programmer, but somehow despite that, the team as a whole was really outstanding.

Eventually when this manager moved on, I took the management role and ultimately failed to maintain that environment (partially for a number of reasons that were out of my control, and partially because at the time I didn’t realize the necessary ingredients). I’ve only really learned them since then by watching dysfunctional teams.

It’s a cliche, but in the right environment, the collective can indeed be greater than the sum of its parts.

EDIT (June 2nd, 2014):

I just moved my blog over from wordpress, but I don’t want to lose the comments that I had there, because they’re great and they’re from the team that I was writing about. I’m just going to copy/paste them in here for posterity.


Todd said on May 8, 2013 at 8:12 am:

You nailed it! I’m glad my simple ‘I miss working with you…’ email triggered you to write this thoughtful post. I admire your choice to spend time sharing your thoughts with the world.

I was trying to come up with some clever addition to your post but I keep typing and erasing. The one constant thought I whole heatedly believe had the most impact on our teams success is ‘Jaret’. I sure hope he doesn’t read this post, they’ll be no living with him! :)


Tom said on May 10, 2013 at 7:50 pm:

Gregg and Todd, I also miss working with you guys. I do, however, have some pertinent hymns to add to the post. Though, I must first agree that a highly productive team is not formed from a cookie-cutter, but many ingredients do add to the potential success; kudos to Gregg for identifying most.

Management: I’d add that the “ideal” manager is an advocate for the team’s decisions, approach and methodology even if he does not fully understand them. Specifically, if the team can convince, through reason, that they were not smoking crack and that their solution would be suitable and economical (perhaps not in the short-run, but in the long-run), then he would go to bat for them. He’d do this no matter how much counter-pressure was exerted by upper management; clearly JW did this.

Secondly, a culture of experimentation, collaboration, and trust must trickle down from the upper echelons of management down to the lowest levels. Think of this as an approach to lowering entropy. Without this, all middle managers have their hands tied.

Team: Diversity in cultural and technological background is highly advantageous. The simple reason is that varied backgrounds allow differing opinions, analysis and approaches to a solution. I’d be bold enough to state that a homogeneous team is useless at being highly-productive because there is not enough diversity in the “knowledge gene pool” to forge great solutions. On the flip side, too much diversity may be a hindrance as agreement may be hard to come by for the final solution, but somebody should play devils advocate during the initial discussions; this only comes from varied experiences.

As well, team members should feel as if they are creating something of value. Perhaps this is more of a company or management issue, but I believe if people, regardless of their position in a company, feel like they are contributing to a greater good will naturally help their teammates and this will improve the productivity of the team as well as the firm’s bottom line.

I’m sure there are more comments to add, but I need to go have another glass of wine and reminisce :)


Gregg Caines said on May 11, 2013 at 8:12 am:

Ha Good to hear from you Sykora :) I agree with your points. I think JW was set up for success when the “upper echelons” were supportive, and we were all destined for failure once those winds changed and upper management suddenly cared very deeply about the team’s every move on a day-to-day basis. Something makes me think that even if they had a reasonable ratio of right/wrong in their technical decisions, we’d still be irritated to be completely out of control of our own destinies, especially given our history of self-organization. Once the team gets a taste of autonomy, it’s hard for them to go back to command-and-control management. I suspect you of all people would agree. ;)

I agree with you on diversity too. We had a pretty diverse team but I think it worked well because we were respectful and trusted each other. I’ve seen teams since that were much less diverse and would argue disrespectfully about tiny things compared to some of the things that we would (calmly) discuss and (quickly) resolve. I’ll tell you one thing that I know for certain now: egos don’t make good team-mates. ;)

I agree with the “creating something of value” point as well, but I’m less about the value to the customer and more about the engineering craftsmanship. It’s the only way I’ve found myself able to reconcile product decisions I can’t agree with. Maybe one day I’ll have my greasy hands in the product side as well and get a chance to care all the way through though. ;)


tibi said on May 11, 2013 at 9:11 am:

Well that was a nice post. And now I miss the team even more. I think you hit it right on the head (I can’t correctly remember the nail euphemism :–) )…

I’ve been looking for the same dynamics in a team ever since, and I think I’m getting close, but there is always something missing. Perhaps it’s the Tim Horton’s cup arch…


Jeff said on May 11, 2013 at 3:53 pm:

Some great thoughts here Gregg – on the money pretty much across the board. A team is definitely only as good as it’s leadership allows. I’ve been in situations where the core team was incredible and worked together well but ultimately failed for the various reasons you any Tom both mention.

p.s. Hiya boys – hope everyone’s doing well. Did someone say Tim’s?


Gregg Caines said on May 13, 2013 at 9:01 am:

Tibz and Jonesy! Good to see you two are still alive. :)


Chris said on May 14, 2013 at 7:06 am:

It was something special to watch and interact with.

You do not really know what you’ve got until it’s gone, eh? There was a sense of what was happening but probably also a sense that it would fairly easily be recreated.

My current co-workers are tired of hearing how “we would have did it at my last place.” ;) Somehow it is not fair to hold other teams to that yardstick.


JW said on June 10, 2013 at 8:56 am:

So many great things have been said above – and I have to agree with all of it. From my perspective, it really was a dream team. The trust, collaboration and commitment levels in the entire office were exceptional.

I have read numerous books on leadership and when I reflect on that team, we had all the characteristics of a high performing team.

For me, I loved the level of achievement, craftsmanship, and fun that we had. It wasn’t work – even a few of those late nights weren’t too bad because we were doing it together.

Is there a secret sauce to recreate that environment? Not that I have found. However, I have been lucky enough to be part of 4 different high performing teams over my career, and I believe there are things we can all do to help create an environment that could support a high performing team. From there – it is up to the team to choose to be high performing.

I identify well with the Patrick Lencioni’s book “The 5 Dysfunctions of Team”. Although I will not rewrite each of the characteristics from the book, I feel that we need to provide special mention to trust. It is the foundation of a high performing team. Nothing can be built without it. That means we need to know our team mates and their goals. How can we help each other? The more we help and recognize each other, the quicker the team will want to collaborate and achieve greater things.

We are all lucky to have been part of something great. I learned a ton from everyone there and I share those insights in my classes everyday.

Gregg – many thanks for starting this thread.


First a little background

I’ve always been a big fan of REST the way Roy intended it for two major reasons:

  • It uses HTTP the way HTTP was intended, playing on its strengths and not wasting any time re-inventing what HTTP already has. The state of HTTP APIs these days is really ridiculously bad, given what we should have learned from the unfathomable success of the WWW.

  • I was also envious as hell of the great framework Erlang developers had in Web Machine. If you write HTTP APIs for a living and you haven’t seen HTTP mapped to a state machine, you really owe it to yourself to check it out. It’s brilliant.

The thing I really wanted to do though was make web APIs not so terrible. There are a few reasons that I think they’re terrible:

  • You have to read reams of documentation to do the simplest things, like finding the URL for something.
  • You have to hard-code urls all over your client app.
  • When you do something wrong, there’s no consistency in error output.
  • Knowing HTTP doesn’t make it any easier to predict how something will work.

If you rely on someone else’s API in your day-to-day work, you probably know what I mean. I’m not going to name any names (but here’s a clue: one of the worst offenders starts with an F and rhymes with Facebook).

So the APIs I wanted to make and to use would be:

  • semantically consistent with plain old HTTP
  • discoverable, like other pages and controls on normal HTML web sites are discoverable.

Luckily HTTP is also the vanguard of API discoverability, most often in the form of hypertext (so that’s what that H stands for!). I’m a bit blown away that this wasn’t obvious to me earlier, but I eventually stumbled into Roy Fielding talking about hypermedia and realized that that’s precisely the correct way to solve the discoverability problem. He was saying real REST APIs should minimize out-of-band knowledge by leaning on what HTTP’s already got: links! [1]

Then I saw Jon Moore make it real with an XHTML API. XHTML forms are a method of discoverability as well?!? My mind was blown. Think about that: XHTML documents tell you what kind of write-possibilities the API has. You don’t need to read docs to find out!

And then I saw Mike Amundsen bring it all back to json and I knew it might be something I could actually use in the real world and people trying to Get Stuff Done wouldn’t want to hunt me down and murder me in my bed (especially since I’m also a card-carrying member of the People Who Want To Get Stuff Done Liberation Front). I think most of us are of the belief that XHTML isn’t going to unseat json as the preferred media-type for APIs anytime soon.

Armed with all this inspiration, I wrote a web-framework. I ended up picking node.js mostly because the HTTP line-protocol stuff was already covered there without any extra cruft to get in my way, so I was free to try to aim for “Web Machine with json with links” [2], which I call Percolator. The framework isn’t the interesting part though; it’s the stuff I learned along the way, which can be done in any framework/language/platform, even if it’s not baked into the framework.

The First Interesting Part: Hypermedia apis are indeed awesome

I started by just putting links in my json to other endpoints that make sense. That made my json APIs so much easier to use, even when the user is me. I had one endpoint for my entire API and every other endpoint was discoverable by following links. I used JSONView for Chrome and for Firefox and I could click around my entire API testing the GET method on every endpoint just by clicking. Here’s how it looks:

Those purple links are visited links that you can actually click on and get a new API response. I showed some co-workers that were using my API and eventually they stopped asking “Hey what’s the URL for getting X again???” They could always easily re-find it themselves.

Adding links also pays off when humans aren’t involved. In many cases it means that the client doesn’t need to construct its own links. For example: when one of my APIs was being used for replication by a client app, the client developer asked how we’d handle pagination, because he wanted to page through all my results to replicate my API’s data elsewhere (sort of like how couchdb replication works, which is simple yet amazing). I had a pagination scheme working, but it didn’t make use of the database indices as well as I wanted, so I told him to just follow the ‘next’ link, not to worry about the parameters for constructing the URL at all, and to treat the absence of a ‘next’ link as the end of the results. What this meant for me is that when I changed the pagination scheme to something more friendly to my database, I didn’t even have to tell him. What it meant for him is that he never had to change anything on his side to support the new form of pagination, and he didn’t have to construct any urls.
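
Concretely, the replication client in that story only ever had to look for a ‘next’ link in each page; the shape below is illustrative rather than my API’s actual output:

  {
    "items": [
      { "id": 101, "name": "first item on this page" }
    ],
    "_links": {
      "self": { "href": "/api/items?since=1370044800" },
      "next": { "href": "/api/items?since=1370048400" }
    }
  }

When a page comes back without a ‘next’ link, the client knows it has reached the end; how the server chooses to build that href is nobody’s business but the server’s.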

It blows my mind that pushing this simple technology further could end up with clients ONLY needing to store the root URL of the API, and not a who’s-who of possible disparate endpoints that practically mirrors the routing table on the server. People are actually starting to push the envelope this way. Wouldn’t it be great if at some point we could stop writing client software that’s hopelessly service-dependent? Wouldn’t it be great if one day client-side url construction was considered as silly as using regexes to get data from json?

I should also add that even if you’re writing both the client and the server, which happens a lot with single-page apps and mobile apps, you might be tempted to think this is a waste of time, but in my experience it makes testing and debugging much easier and helps me remember what my API does after a month without using it despite my terrible memory. That minor extra effort of adding links pays for itself over and over again.

Interesting Part #2: JsonSchema is surprisingly awesome

Of course once I could surf around my API, it didn’t take long for me to bemoan the fact that I could only peruse GETs this way. I had seen Jon Moore’s XHTML API demo (This is my third link to it here, so you know it’s worth a watch!) , and I was jealous of how XHTML forms could describe the POSTs as well. Originally I decided that json needed forms as well, but the big brains in the Hypermedia-Web Google Group convinced me that I could just expand my link concept to know about other HTTP methods. And I had already stumbled upon JsonSchema as a way to specify what was expected in a PUT or POST payload, just like XHTML forms can do (but still using json of course).

JsonSchema is a particularly elegant solution to the problem, because I can write all my validations in JsonSchema, publish the validations as json to the api (JsonSchema is just json after all), and then re-use them on the client side for validations before I even bother with an http request to the api. This way I can use the same validations on the client and server, AND communicate them naturally as part of the json payload. The end result was that my json APIs could now fully communicate everything that was possible via the json API itself. Here’s how it looks:

I added baked-in JsonSchema support to Percolator right away of course, but you can do it easily in any of the most popular languages (see the validators tab). Currently I haven’t implemented any clients that use it for validations yet, but I do use it for my server-side validations. More importantly still, it communicates to humans everything that’s necessary for a successful PUT or POST. [3]
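
For a rough sense of what that buys you (this shape is illustrative, not a verbatim dump of Percolator’s output), a collection resource can advertise its own create operation, schema included:

  {
    "widgets": [],
    "_links": {
      "create": {
        "href": "/api/widgets",
        "method": "POST",
        "schema": {
          "type": "object",
          "properties": {
            "name":  { "type": "string" },
            "color": { "type": "string", "enum": ["red", "green", "blue"] }
          },
          "required": ["name"]
        }
      }
    }
  }

The same schema object can be fed to a JsonSchema validator on the server when the POST arrives, and to one on the client before the request is ever made.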

Ultimately the addition of JsonSchema led me to add a static frontend to Percolator that, after abstracting it away, I published separately as the Hyper+JSON Browser. You don’t need it, but you can use it in conjunction with Percolator apps to ‘browse’ them even more easily than JSONView does (and without installing a plugin). Any app (using any language/framework) that spits out the same sort of output as Percolator could use the Hyper+Json Browser just the same, because it’s a static single-page app that works over any API that works similarly. Here’s how it looks in use:

When you click on that ‘create’ link to POST, it does this:

Browser after clicking the POST link

That red json logo indicates that you haven’t entered valid json (yet). Pressing OK will actually POST the json that you enter and show you the result.

Here’s a view of another resource (an individual item in the list from the previous collection resource, which can be found by clicking that item’s ‘self’ link). It shows delete and update links for this particular resource.

People have likened this to the Django admin interface, which is a huge compliment, not only because the Django admin interface is pretty legendary, but also because this frontend doesn’t know a single thing about the API other than the requirement that the API uses the aforementioned link format. This is an area where I’m really excited: In what other ways can other programs use these APIs without foreknowledge? (I’m not really talking about intelligent agents – they’d need to understand a little more than the API provides — but dumber things that still use human interaction eventually, like automatic HTML form generation, are mostly possible.)

(Somewhat less) Interesting Part #3: Object-oriented routing leads to less stupid stuff

I hate to sound old-school, but sometimes the object-oriented paradigm maps to a problem better than a functional one. In this case, I’m talking about routing. I don’t know why the HTTP methods GET, POST, PUT, etc are named “methods” but it sure does make perfect sense, because each one is tied to a particular “resource”: that “thing” you’re interacting with by using the HTTP methods for a given URL. It should be obvious then that HTTP resources should be the thing we route to, and not disparate methods. REST APIs are supposed to be the antithesis of RPC, but when routing involves the method as well, developers tend to make less RESTful APIs and framework developers tend to mess up some of the finer points of HTTP. Here’s a quick test:

  • Grab your favorite web framework and create a resource with a single GET handler.
  • POST to it. Did you get the proper 405 response?
  • Send an OPTIONS request. Did it tell you that only GET was available?

Probably not. Web-machine gets this right because it routes to the resource, instead of disparate functions that when imagined together form a resource. I wrote Percolator to work like web machine in this way, because I wanted my APIs to look like something other than a haphazard collection of RPC calls without requiring a lot of self-discipline. The result is that Percolator responds with 405s when a client tries a method that that resource doesn’t support, and also responds to OPTIONS requests with a list of all the allowed methods. The result again is better discoverability without reams of documentation, and better feedback for client developers when they’ve done something the API doesn’t support.
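
Here’s a stripped-down sketch of the principle (this is not Percolator’s actual API): route to a resource object, and derive the 405 and OPTIONS behaviour from whichever methods that object actually implements.

  var http = require('http');

  // a resource is just an object with handlers named after HTTP methods
  var widgets = {
    GET: function(req, res){
      res.writeHead(200, {'Content-Type': 'application/json'});
      res.end(JSON.stringify({ widgets: [] }));
    }
    // no POST/PUT/DELETE handlers defined
  };

  function dispatch(resource, req, res){
    var allowed = Object.keys(resource).concat('OPTIONS').join(', ');
    if (req.method === 'OPTIONS'){
      res.writeHead(204, { 'Allow': allowed });  // advertise what's supported
      return res.end();
    }
    if (typeof resource[req.method] !== 'function'){
      res.writeHead(405, { 'Allow': allowed });  // valid HTTP method, but not on this resource
      return res.end();
    }
    resource[req.method](req, res);
  }

  http.createServer(function(req, res){
    // a real router would pick the resource based on req.url;
    // a single resource keeps the sketch short
    dispatch(widgets, req, res);
  }).listen(3000);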

There are advantages to OO routes for application developers too though, organization-wise:

  • All routes for a particular resource are connected to that resource, so there’s no chance of sloppy intermingling in the source code.
  • A bunch of code that you have to write in each method can be written just once for the resource.
  • You start to see patterns in what resources do, which can lead to certain classes of resources that can be re-used over and over, sometimes wholesale and sometimes with particular strategies that change them slightly. This is a bit of a big deal, in my opinion because it’s extremely rare that I see this kind of re-use in web applications in the real world.

Anyway: Percolator is indeed ready for production use (at least for the use-cases for which I’ve used it in production!). It’s extremely fast and extensible, and I hate it a lot less than anything else I’ve used for json REST APIs.

Hopefully it helps someone make a nice API, or it gives some other framework developer some new ideas. The APIs we have today cannot possibly be the best that we can do.


  1. plus a pile of other conventions

  2. Ultimately I diverged quite a bit from Web Machine, but it was always a huge inspiration.

  3. Alright not everything, but a heck of a lot. Linked documentation on my RELs would be another huge boost to discoverability.

I’ve seen these anti-patterns in a lot of web APIs, so I thought I’d detail why they’re bad and what to do instead.

Using a 200 status code for every response:

This anti-pattern is common for SOAP-like RPC APIs. Using a 200 status code for all your responses is silly because you end up having to reinvent an error-indication in your response body instead, and nobody in the world knows your new standard (or wants to have to learn it).

It’s also extremely confusing in the cases where an error has occurred, because a status code of 200 means everything is A-OK. There are already a bunch of error codes in the 400 and 500 ranges that are fully capable of indicating all sorts of common problems. It’s still great to put an error message in the response body, but there should be a 400 or 500-level status code so people know that an error even occurred and don’t try to interpret the response body as normal or otherwise assume success.
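
As a quick sketch of the difference (plain node, with a made-up validation check): the status code itself says whether something went wrong, and the body is only used for the specifics.

    var http = require('http');

    http.createServer(function (req, res) {
      var isJson = (req.headers['content-type'] || '').indexOf('application/json') !== -1;

      if (!isJson) {
        // Anti-pattern: writeHead(200) here with an invented
        // { success: false, errorCode: ... } envelope in the body.
        // Better: a real status code, plus a human-readable reason.
        res.writeHead(415, { 'Content-Type': 'application/json' });
        return res.end(JSON.stringify({ error: 'expected a JSON request body' }));
      }

      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'ok' }));
    }).listen(8080);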

Using a 500 status code for all errors:

This anti-pattern is common because 500 is what most frameworks will do when a web application throws an uncaught exception, and so it becomes convenient to just throw exceptions and not worry about the repercussions. This is mildly better than using a 200 status code for literally everything, because at least it shows that an error occurred, but it still causes a lot of confusion for client developers.

When you 500 for an error (even if your web framework did it for you), you’re signalling that the problem with the request was the fault of your application (the server), and that retrying the request may well succeed. Because 500 errors indicate an internal server error, they shouldn’t really ever happen though1: in general, if they do, you’ve got a bug in your application, and the web server will tell everyone about it. The right thing to do, if there’s something wrong with the request itself, is to respond with the appropriate 400-level status code.

Using a 500 error when a 400-level status code is more appropriate is obviously really confusing to the person making the requests, because it doesn’t even let them know that their request needs to be fixed. Even if that person is also you, it forces you to start digging into your back-end for what could possibly just be a bad client request. If you don’t know which 400-level code to use, at the very least use 400 (Bad Request) itself. This will at least give your client developer an indication of whether the blame is on the server or the client, and cut your debugging in half. A few words about the error in the response body can go a long way as well.
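
Here’s a hedged sketch of how that decision might look inside a handler (saveToDatabase is a made-up helper, and the exact 400-level codes are just examples): client mistakes get a 400-level response with an explanation, and only genuine application failures fall through to a 500.

    function handleUpdate(req, res, body) {
      var doc;
      try {
        doc = JSON.parse(body);
      } catch (e) {
        // The client sent something unparseable: their problem, not ours.
        res.writeHead(400, { 'Content-Type': 'application/json' });
        return res.end(JSON.stringify({ error: 'request body is not valid JSON' }));
      }
      if (!doc.name) {
        // Parseable but invalid: still a client error.
        res.writeHead(422, { 'Content-Type': 'application/json' });
        return res.end(JSON.stringify({ error: "'name' is a required property" }));
      }
      saveToDatabase(doc, function (err) {    // saveToDatabase: hypothetical persistence helper
        if (err) {
          // This one really is our fault (or our database's), so 500 is appropriate.
          res.writeHead(500);
          return res.end();
        }
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(doc));
      });
    }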

Logging 400-level codes at ‘error’ level:

Logging is pretty important for measuring the quality of implementation of an API, and for learning about specific defects. The ‘error’ level in your logs should be a way to indicate “things that should not have happened that are within this application’s control”, so that you actually have a hope of attempting a 0-error policy. 400-level codes are errors of course, but they’re client errors, not your server application’s errors, so they should not be logged at ‘error’ level alongside your actual 500-level errors (you might still log them at ‘info’ level to keep track of what kinds of crazy things your clients are doing).
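
In practice that can be as simple as branching on the status code when you log a response (the logger here is just a stand-in for whatever logging library you use):

    // Keep 'error' meaningful: 500s are our problem, 400s are the client's.
    function logResponse(log, req, statusCode) {
      var line = req.method + ' ' + req.url + ' -> ' + statusCode;
      if (statusCode >= 500) {
        log.error(line);                        // our bug or outage; should trend toward zero
      } else if (statusCode >= 400) {
        log.info('client error: ' + line);      // their mistake; track it, don't page on it
      } else {
        log.info(line);
      }
    }

    // usage with a trivial console-backed logger:
    var log = { error: console.error, info: console.log };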

Why do all these details matter?

When you have an interface between two applications (the client and the server in this case) like HTTP, it REALLY simplifies things if they communicate whether messages are acceptable or not (and why). Even if you control both ends of the interaction, you can save yourself a lot of server-side debugging by having the server indicate what exactly went wrong with a given request, instead of repeatedly reading stack traces every time some JSON you POSTed is missing a property. Compared to how little extra work is involved in communicating the problem correctly, it’s pretty short-sighted to just let all exceptions bubble up, or to catch all exceptions silently (and respond with a 200).

Keep in mind too that if you’re now convinced that communicating errors is important, then HTTP already has a system for just that: HTTP status codes (plus whatever specifics you want to additionally communicate in the response body). Status codes are part of the greater HTTP standard that a lot of developers, libraries, proxies, and caches already understand. Re-inventing your own error mechanism is generally a waste of everyone’s time (with you at the top of the list).

Getting this right is a real time-saver over the long-haul even if you’re the only consumer of the API writing both client-side and server-side. The time-savings only multiplies as you add more clients.


  1. Sometimes 500 is appropriate of course: If your application requires a database and it’s lost connectivity to it, then a 500 makes total sense. It signals that there’s a problem in your application and the client can possibly try the request again later.

If you consider the number of packages on pypi, npm, and rubygems, and the release dates of python (1991), ruby (1995) and node.js (2009), it looks a lot like those communities are getting new packages at the following rates, per year:

python: 29,720 packages / 22 years = 1351 packages per year

ruby: 54,385 packages / 18 years = 3022 packages per year

node.js: 26,966 packages / 4 years = 6742 packages per year

It’s important to note that I’m only showing the other languages as a measuring stick. There are probably a lot of reasons for the disparity here (including my imprecise math), and I’m not trying to say anything like “node.js > ruby > python” but obviously new node.js packages are being published to npm at a staggering rate. I just checked npm (on a Sunday night) and 20 new versions of packages have been published to npm in the last hour alone.

This is a pretty amazing feat for node.js. How is this possible?

Batteries NOT included

If you’ve needed an HTTP client in Python, you’ve probably asked yourself “should I use urllib, urllib2, or httplib for this?” like so many people before you. Well, the answer is probably that you should use requests. It’s just a really simple, intuitive HTTP client library that wraps up a lot of weirdness and bugs from the standard libraries, but it’s not a standard library like the others.

The “Batteries included” philosophy of Python was definitely the right approach during the mid ’90s, and it’s one of the reasons that I loved Python so much; this was a time before modern package management, and before it was easy to find and install community-created libraries. Nowadays, though, I think it’s counter-productive. Developers in the community rarely want to bother trying to compete with the standard library, so people are less likely to try to write libraries that improve upon it.

Developers are less likely to use non-standard libraries in their projects too. I’ve heard many instances of people suffering through using an inferior standard library just to have “no dependencies”. But in this day and age of cheap massive storage and modern package management, there are very few reasons for a policy of “no dependencies”.

Conversely, the node.js core developers seem to actually want to minimize the standard library as much as possible. They have repeatedly removed features from the standard library to free them up to be implemented by the community instead. This allows for the most variety and lets the best implementation win (but only until someone makes a better one, of course!).

Imagine how liberating this is for standard library maintainers too. The node.js standard library is comparatively tiny, freeing the core developers to just deal with the core necessities.

The Tiny Module Aesthetic

Just as the 140-character limit on twitter gets people to “blog” more, the node.js community’s culture of publishing tiny modules makes people more comfortable publishing smaller packages, and so it happens vastly more often.

While I don’t really consider myself “a node.js developer”, I’ve found myself publishing over a dozen packages in the last year, just because I’m working on a node.js project at work, and I publish any reusable package that I create from that work. My project ends up factored better, and I end up with a bunch of building blocks I can re-use elsewhere (and then I do!). Sometimes random people from the interwebs even fix my bugs before I notice them!
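
Publishing one of those tiny modules really is cheap. A hypothetical example: a one-function module plus a minimal package.json, and npm publish does the rest.

    // index.js -- the entire module
    module.exports = function slugify(str) {
      return str.toLowerCase().trim()
        .replace(/[^a-z0-9]+/g, '-')
        .replace(/^-|-$/g, '');
    };

    // package.json -- just enough metadata for npm
    {
      "name": "tiny-slugify",
      "version": "0.0.1",
      "description": "turn a string into a url-friendly slug",
      "main": "index.js",
      "license": "BSD"
    }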

This approach is not unusual at all either, at least in the node.js community. The community is led by developers who have each published literally hundreds of packages. I don’t even know how this is possible.

Of course there’s absolutely nothing to stop other languages from following suit, so it seems to me like community culture is the deciding factor.

Massive monolithic frameworks like Ruby on Rails tend not to take hold in the node.js community’s mindshare, I think due in part to this “tiny module aesthetic”, and the community is all the better for it. It seems to me that monolithic frameworks like Rails don’t have a clear notion of what does not belong in Rails, and so they tend to expand indefinitely to include everything an application could need. This leads to people asserting that Rails is the best solution for all problems, and it has a cooling effect on innovation within the web application space in the Ruby community.

The tiny module aesthetic also leads to an ecosystem of extreme re-use. When a package doesn’t do more than is absolutely necessary, it’s very easy to understand and to integrate into other applications. Conversely, while massive monolithic frameworks tend to appear in a lot of end-user applications, they rarely end up being dependencies in other packages themselves. It seems unfortunate to me that so much great software should be so difficult to re-use.

git and github.com monoculture

To be honest, few things are more boring to me in software development than version control, and I find the complexity of git to be a bit silly and mostly unnecessary for me, but I’ve never heard of a node.js developer that doesn’t use it, and there’s huge value in that kind of monoculture. It means that they all speak with a common language and there’s no impedance when you want to contribute to someone else’s projects.

Github.com has a similar effect in lowering the barriers for cross-pollination between developers. I’ve only very rarely seen a node.js project that wasn’t on github. This means I immediately know exactly where to find source code if I want to contribute.

The pluses and minuses of a github monoculture go far beyond this, but there’s probably enough fodder there for an entirely different post.

Permissive Licensing

Node.js packages tend to use extremely permissive licensing like MIT and BSD licensing. In fact, the default license for a package created by npm init is BSD. I think this is another sign of the times.

Very few people care anymore if someone else forks and doesn’t contribute back. There’s very little to gain in not contributing back anyway because of the cost of maintaining your own fork.

This is important because no one wants to have to deal with dependencies that make them think about legal repercussions, or give them extra responsibilities. Licenses like the GPL end up having a cooling effect on software-reuse for this exact reason.


EDIT Jun 3rd, 2014: Comments below are basically copy/pasted from my old blog format for posterity.


30 thoughts on “The node.js Community is Quietly Changing the Face of Open Source”

  1. max ogden said on :

    Nice writeup! I do think that the term ‘node community’ is a little bit too general since some of the most popular node modules are in fact bigger than they should be.

    For example, the 4 most popular modules from https://npmjs.org/: underscore, async, request and express could all be split up into smaller pieces:

    Underscore isn’t terribly large but it has things tucked in it like a templating function that don’t belong.

    Async is thematically consistent, offering many variations on a theme, but from the empirical usage patterns that I’ve seen most people use a small percentage of the offered functionality.

    Request actually went through the modularization process over the last month or two and went from having 1 or 2 dependencies to having 11.

    Express is perhaps the greediest module on NPM (IMHO) in that it invented its own naive APIs early on which means that now you can only use modules designed for express with express since the extensibility is stuck in a relatively pre-cambrian era. Classic framework lock-in, albeit to a lesser extent than other, larger web frameworks. I’ve thought a lot about why this happened and believe that if you try to make a land-grab early on and abstract too much too early then you get stuck with your abstractions due to backwards compatibility. Good software grows slowly.

  2. Yeah I totally agree, and you’ve probably explained it even better than I did.

    Express is actually a bit of a conundrum for me for a few reasons, especially since in trying to create a mini API framework myself ( http://percolatorjs.com ), I’ve felt forced into duplicating some of the things I dislike about the modularity (or lack of modularity) in express, to the point that I eventually gave up and added ‘connect’ middleware support. There are a few competing pressures that make web frameworks like express (and rails for that matter) popular:

    (1) Some people just want to “get shit done” ™ and don’t want to have to search for multiple pieces of a puzzle to hobble together a solution. I think there’s probably a gap where finding the best modules for a given problem is still a too hard. Filling that gap (somehow) will promote collaboration even more. The current npm search (on its own) doesn’t solve the problem IMO.

    (2) HTTP is actually really complicated and very hard to abstract effectively. For example, how many frameworks 404 when you try an invalid method on an existing resource instead of the proper 405? This is mostly because of the way routing is abstracted; These frameworks use the http method as the first ‘key’ in the routing table, when really the url makes more sense as a key and the method should be secondary. There are a number of cross-cutting aspects like this including logging, error-handling, and authentication where some sort of centralized pre-processing makes a lot of sense. Not all “aspects” should be handled with serialized pre-processing middleware though, and unfortunately that’s what we’re seeing a proliferation of. I’m disappointed that I can’t build a mini-framework without integrating connect, or rewriting/repackaging all those otherwise great connect middleware modules out there, but I don’t know a better way. To be honest, I don’t even know at this point what the better abstraction would be. Doing everything in pre-processing middleware and monkey-patching the request/response objects doesn’t seem like the answer though.

  3. Good stuff. I haven’t felt this excited about a platform community in a long time.

  4. Good read: spot on.

    I would add a couple additional factors to that trend:

    - Proper package management: `npm` was and still is what got me hooked up. Package management is often an afterthought that is rarely properly addressed.

    - Flat community: which is partly illustrated with the extreme culture of re-use you mentioned. I feel there is much lower friction in creating and contributing to modules in the node.js community, with no gate keeper in sight.

  5. For a similar system: Perl’s CPAN (the package repository) has been online since 1995 and presently has 120,447 modules for a yearly rate of 6691 modules for 18 years now.

    It’s been providing the same functionality you mention: an encouraged license (“the same terms as Perl”), a monoculture of software (CPAN for package distribution/documentation), and an emphasis on tiny modules. Little packages are encouraged and really the way to go when you’re building Perl modules.

    That said, it’s great to see those practices being adopted successfully by Node and seeing the success of the groups using it. We live in very exciting times for open source. :)

  6. The “The Tiny Module Aesthetic” reminds me that, the emerging of the “Single-purpose app” in recent years, and the design philology of the Unix utilities. And I think those are all about “simplicity”, which I think and care about the more and more in recent years, but I always have to remind myself about when writing code for LIVEditor – my code editor with “Firebug” embedded.

  7. Mansur said on :

    There are some frameworks in Node.JS which are just copied, not unique, like for example https://github.com/biggora/caminte – is just a copy of Juggling DB https://github.com/1602/jugglingdb.
    I would strongly recommend to use genuine frameworks which have been tested by the community and better supported.

  8. Why don’t you consider Perl? 120,460 Perl modules in 15 years, that makes it 8030 modules per year and it’s increasing exponentially now.

  9. PeterisP said on :

    “Batteries not included” approach puts a significant cost on application developers.
    If I need to do a simple thing X, I’d rather have a single good library for X rather than twenty libraries with varying focus and quality. Even if one of those twenty libraries would be far better for my needs, the task of properly comparing and choosing the lib is often higher than doing X with a random library.

    It also reduces maintainability – if one project uses lib1 for X, and another project uses lib2 with a different API for the exact same thing, then it makes stuff harder to maintain.

  10. @ Chankey Pathak, Mark Smith: For the record, I didn’t include perl because I haven’t got any perl experience to draw from, while I’ve spent years writing reams of code in ruby, python and node.js. I can honestly say though that CPAN is nothing short of legendary even outside the perl community, and npmjs.org would do well to follow its lead. It would be interesting to know too how CPAN maintains that rate of growth, and if it’s due to the same things I’ve mentioned.

    I should also add that node.js isn’t even the language/platform that’s nearest and dearest to my heart. I’d really just love my other favorite languages’ ecosystems to pick up some of these qualities.

  11. Su-Shee said on :

    From a CPAN’s lover’s perspective NPM already is _very_ well done and pretty smooth to use.

    From a Perl lover’s perspective, JavaScript doesn’t have a choice anyways, because you really NEED a lot of modules to work around well let’s politely say “language quirks”. ;)

    You simply need things like underscore or node, because it gives you things which aren’t builtin in the language. (*cough* npm install printf *cough* ;)

    Similarily, Perl has gotten its Lisp-style MOP OO module Moose, because there’s no such thing as a “class” in Perl.

    Our equivalent of underscore-alike modules are for example List::Utils, Lists::MoreUtils etc. etc. – and you can’t possibly shove ALL those modules into what we call “core” – set aside that half of the people want Foo and the other half Bar and then there’s our “problem” we’re really proud of: Backwards compatibility is _amazing_ in Perl and there’s always someone who runs a 12 year old Perl and still needs his Really::Ancient::Module::BlahBlah. ;)

    So far I encountered exactly ONE npm module which I couldn’t install blindly with npm install -g – and that was due to an architecture check in its requirements (which isn’t necessary, module very high level) and my Linux flavor returns a different architecture string – so one lesson might be to encourage a best practise “don’t overdo requirements for your module, it just complicates things”

    In Perl, it’s common that you simply ask “the community” (IRC for example) “what do I want for XY?” and get a variety of 2-5 modules and a short explanation why you want A and not B. Imagine for example you have more than one XML/HTML parser module (which we do) – do you want XPath support or do you prefer DOM-style methods? Do you need a strict parser or do you want to work on dirty HTML?

    Most module authors mention these kinds of use cases and “flavors” to do things and refer graciously to the alternatives. Plus, there’s the entire commenting and rating system for modules and yes, you of course look if Perl gurus Tom, Dick and Harry Superperl ranks module Foo highly or not – it’s not that we don’t have our own little rockstars and divas. ;)

    Due to variety and so many choices, there’s also bad choices and we put A LOT of work into discouraging certain modules these days which are simply not really appropriate to use in 2013 or some well known badly written ones. But you can’t just throw them out, too – because Perl is installed everywhere – what is the browser for JavaScript is pretty much any Linux/Unix for Perl. And all the shitty JavaScript quirks you’ve gotten from browser vendors, we get from distributors which come with a Perl from 1999 in 2013.

    Sooner rather than later (there already are a couple) there will be a class of modules on NPM which are dual use – browser and Node-based server-side and you will have to keep that straight and compatible and that will be a lot of work. Also, what if people start writing JavaScript modules which are able to run in browser, on Node AND on e.g. spidermonkey? (what about Rhino/Nashorn?) I’d encourage that a lot, not everything needs Node and NPM is the de facto thing to put a module up on by now.

    Another advantage of CPAN is: It’s _extremely_ centralized. You look for a module on CPAN. Period. There’s no “download from here” or “fetch from there” – it’s on CPAN or it doesn’t exist. As CPAN is tested and tested and retested by volunteers, using modules from CPAN pretty much ensures been tested on a dozen platforms. If you use the plain “cpan Foo::Bar” installer, you can always watch all tests live while running the installation of a module.

    And Perl modules have tests LIKE THERE’S NO TOMORROW – as Perl’s testing flavor TAP is extremely easy and simply and cheaply to do, modules actually _do_ have a LOT of tests.

    Last but not least: On average, the commonly used Perl modules are _extremely_ well documented and the docs can be read on commandline and via web. The documentation style is probably rather old-school from a Node lover’s perspective – it’s more on the dry, informative side and less about being cute and funny – but some docs literally go into the amount of a book written.

    Also, no matter how big a project – I simply expect it to be installable via CPAN. There’s a couple of Perl things which are more applications but “modules” and are not installable via CPAN, but that’s pretty rare. Entire frameworks are installed via CPAN – think like the difference of how you install Derby.js and Meteor.js.

    And: I run deliberately large dependency-chains for the fun of it – CPAN has a flag for “resolve all dependencies yourself and say auto-yes to everything you’re asked during installation” and I manage a couple of hundred modules in ONE session WITHOUT any error while running ALL tests during installation – possibly on half a dozen Perl versions (Made with perlbrew which is our nvm/rvm/virtualenv).

    As the Perl mantra “there’s more than one way to do it” is done by the JavaScript community on the same level of excess (a dozen web frameworks anyone? ;) you’ve made your choice already anyways to go down the road of extreme flexibility and variety. It already happened, no going back here.

    The Perl community for example is very forceful to really push people into using modules from CPAN exactly BECAUSE they’re tested and have solved hundreds of problems for you – you don’t split a CSV file by hand, you use Text::CSV which is proven to work for a decade. You won’t do better with your meager little split. :)

    You don’t parse HTML with regex, you use a module. You don’t validate email addresses by hand, you use a module. We have modules like Regex::Common for example which contain a range of common usecases people usally throw (bad) regex on or modules which pretty much read any stupid format known to mankind – it’s all put into modules and more modules.

    Luckily, Perl also has most of the necessary modules I usally call “boring shit” modules – Excel reading and writing anyone? Those are the true heroes in my opinion – people who sit down and really write this kind of ungrateful, boring stuff you suddenly need there’s *gasp* lurking an “enterprise” around the corner – not to mention that nobody in their right mind is going to implement SSL bindings three times. (One lesson for JavaScript/Node might be to maybe not add another web framework but deliberately go for the boring stuff sooner rather than later and get it over with..)

    For my part, I can’t rave enough about CPAN and some of the people poured their hearts and time and knowledge into some modules. Billion euro/dollar businesses run on it – some for over a decade straight.

    So, hack, upload and enjoy. :)

    Yours truely – a Perl lover. :)

    CPAN’s main website: http://metacpan.org

  12. subdigit said on :

    Maven is also something that’s begun to pick up pace in the Java community. Stats seem to be here: http://search.maven.org/#stats

    But not sure about exact growth rate. A pretty picture is available here: http://mvnrepository.com/

    While having a small standard library is great for essentially encouraging reinventing the wheel to make it better, it can at times be a detriment for “just getting things done”. It’s called a “standard” for a reason, which is that you can rely on it to be present and available for use, and more importantly, for understanding by the developers that come in and out of a project.

    That being said, popular extensions will solidify and essentially become a 3rd party standard for use and becomes indispensable and tied into the core project, whether it’s just the common-knowledge standard du jour, or it actually gets merged in.

    It certainly is a more interesting way to get a diversity of development of what people consider to be core though. For that, I find that model interesting to watch and see how it grows and if anything I favor “wins” :).

  13. Su-Shee: thanks for the education! I agree, CPAN sounds great. Since we’re getting along so well, I won’t say anything about the language itself. :)

    A few minor clarifications on javascript in 2013: Don’t `npm install printf`, use util.format: http://nodejs.org/api/util.html#util_util_format_format . Underscore is much less necessary nowadays (especially in node.js) as a result of ecmascript 1.6 improvements (See “iteration methods” at https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Array ). Also using jslint will keep you 99% quirk-free (I have it integrated into my builds and vim).

  14. There’s no need to slander the GPL in this post, it’s just as permissive and would allow the same type of collaboration. The trouble is that most developers also want to use libraries as part of proprietary code bases. I don’t understand why they don’t just dual-license; GPL for non-profits and devs, MIT/BSD for commercial/proprietary software.

    There really should be a bigger push to license internal code under the GPL. 99% of the time it doesn’t offer a competitive edge. It’s the data and marketing and network effects that have more to do with it than anything.

  15. Andrew de Andrade said on :

    Great post. I’d also add to that list the benefits of local by default packages that are in your project folder. The approach of tools like easy_install, pip, bundler, etc is to hide all your depemdencies in some dot-prefixed folder outside your project directory. This promotes an out of sight, out of mind mentality. The npm approach on the other hand puts them in your project and makes it much more likely that you are going to explore the source of your dependencies, more likely that you bug fix those dependencies instead of work around bugs and most importantly you’ll treat that folder like part of your own project. This last point means that you are likely to develop one of your own generic modules to the point where it’s good enough to submit back to npm as a public module.

  16. Su-Shee said on :

    Oh, go ahead and say something about the language and I will probably tell you the exact same thing back about JavaScript – or something quite similar. If anything, both have A LOT of history and baggage – but they both also have A LOT of flexibility, otherwise the range of modules existing as of today wouldn’t even be possible and people wouldn’t deliver at that speed. And let’s face it: We can happily unite (Come along, you too Java, don’t be shy…) in Python’s and Ruby’s dislike for {} and ; – and we use a lot of them. ;)

    Sadly, format.util supported too few options when I looked last. (In the end, I did it in DB anyways..) – but it’ll evolve, I’m sure.

    Underscore will probably move to other list-supporting functions over the years I would expect, the map-grep-filter-foreach range is just the basics anyways. (At least, Perl didn’t need map-grep-foreach-shift-unshift-push-pop-splice as a module, that really is core-core…)

    And before someone is tempted to bitch about Java-and-Maven: _THAT_ community delivered literally EVERYTHING I can think up of “boring shit” and “enterprisey stuff” you can imagine. And if you ever looked for “good PDF manipulation” for example: Java did deliver. Kudos for that.

    If anything the lesson simply is: Yes, you want TONS of modules, yes, the environment IS important, no, it doesn’t matter if you’re a less-than-perfect language, as long as you support a certain flexibility, people will work around it.

    And style evolves a lot – nobody writes JavaScript like in 2000 anymore and you get discouraged a lot to do so – same goes for Perl. But if you already had a web business in 2000, you probably just can’t throw out your entire carefully crafted JavaScript or your backend of the day (in 2000) and that is something EVERYBODY faces. Perl did and we paid dearly, Java does, Ruby will, Python will do too.

    JavaScript showed – thanks to Douglas Crockford and jQuery among others and yes, thanks to Node.js, too – how to evolve nicely into a new style – but also Node.js setups will at some point be really huge and not that easily updated anymore and then you face the very same thing everybody else does.

    And god do I wish this petty bickering among languages would stop – we owe each other so many things – all of us – and all of us owe Lisp and Smalltalk and C.

    Let me all remind you that there was a time when “real programmers” laughed about using a “scripting language” on this cheap little Linux hack with this meh little HTML thingie to put up a business on and now look where we all are today…

  17. Rudolph: “Slander” is a legal term and a pretty ridiculous one for what I said. More legalese isn’t going to help the GPL’s case. You already agree that the GPL is more restrictive for people mixing software with other proprietary software, so how can it *possibly* be slanderous for me to say it’s not as permissive?

  18. Hisham said on :

    Well, counting the age of the Python and Ruby repos by the age of the language (and not of the package management projects, such as pypi and rubygems) while considering node.js by the age of the project and not the language (JavaScript) skews the numbers quite a bit!

    Let’s compare by language age:

    python: 29,720 packages / 22 years (1991) = 1351 packages per year

    ruby: 54,385 packages / 18 years (1995) = 3022 packages per year

    node.js 26,966 packages / 18 years (1995) = 1498 packages per year

    And by package manager age:

    python: 29,720 packages / 11 years (2002?) = 2701 packages per year

    ruby: 54,385 packages / 10 years (2003) = 5438 packages per year

    node.js 26,966 packages / 4 years (2009) = 6742 packages per year

    Except that, of course, these numbers are meaningless, because each language/tools has its own submission evaluation process (including “no evaluation”), and also different granularities in how to count “packages” (by module? by a set of modules that compose a project?).

    Also, language-specific package managers are more or less a new thing — counting for example the number of packages submitted last year would be a more meaningful metric (apart from the problems I raised above.)

    Still, it’s nice to see that language-specifc package managers are thriving, and I wish all the best to all of them. :)

  19. TJ said on :

    The reason Express is the size it is, is because all these things are what we would be duplicating each time anyway. Yes a lot of it could/should/will live in npm, but an aggregate is not harmful. Look at npm-www for example, this is what happens when you re-create things every single time you write an app, it becomes are large unwieldy mess of stitching together random things and re-inventing within the app itself instead of an aggregate “framework”. Most middleware should absolutely be abstracted to a lower level and then “wrapped” as middleware, there’s no question about that, but should they cease to exist? absolute not.

  20. TJ: Yeah, I actually agree that the core http module isn’t a useful enough abstraction to be used for many (most?) real world web applications. It also doesn’t offer a reasonable way to add functionality other than wrapping it or monkey-patching the request/response objects. Cross-cutting concerns are the obvious problem (security, logging, routing, etc), but also: what if you want to just add functionality that’s more easily accessible than constantly requiring the same 20 modules that are all very HTTP relevant? And I agree that larger aggregates are useful and valid (and I bet you’d also agree conversely that putting an orm in a web server abstraction is a superfluous aggregation, and so there are limits to what aggregates make sense).

    I don’t actually have a better solution than what express does. I hope that was obvious from my comments. I certainly tried, but ended up coming back to connect. At the same time, I find it unfortunate that connect middleware have non-obvious interdependencies and are difficult to keep clash-free (basically all the problems of monkey-patching). But like I say, I haven’t got a better proposal.

    I personally don’t think node.js has the HTTP abstraction problem solved very well (yet).

    > Most middleware should absolutely be abstracted to a lower level and then “wrapped” as middleware, there’s no question about that

    Not a bad start!

  21. @Su-Shee: If there would be an upvote button on comments here I would’ve hit it so hard that it would break my mouse! Well written, +1 to you.

  22. Some times I get confused when people try to convince other’s programmers that easily re-usuable modules are awesome and you should try them. Then I remember that not every was into perl 10 years a go… Even if you’re not a fan of the perl syntax (beauty), CPAN is the shit and so is NPM.

  23. Pingback: This Week's Links: _why Edition - NYTimes.com

  24. Adam said on :

    I’m sure this was already said.. but you can’t compare node.js to php/ruby. I am a node developer.. but node is a new FRAMEWORK. It would be more accurate to compare JAVASCRIPT to php/ruby.

    Just FYI

  25. Adam: The comparisons I make are comparisons in the customs of the communities surrounding those technologies and not the technologies themselves. I think at that level, it’s apples-to-apples.

  26. Juzer Ali said on :

    @PeterisP:
    > “Batteries not included” approach puts a significant cost on application developers.
    I agree, it sure wastes a hell lot of time.

    @TJ: Is express code too succinct for what it does? For example, its very difficult to figure out what is happening inside a router https://github.com/visionmedia/express/blob/master/lib/router/index.js#L46

  27. Pingback: To node or not to node – overview of Node.js | Developing Scalable Applications 

  28. Simon Boudrias said on :

    @Adam I have to disagree here. JavaScript is only a language (it is syntax), PHP/Ruby are more than syntax, it is syntax + all about the runtime and functions hooks to the environment. Node.js is JavaScript + runtime/hooks stuff. That’s why Node.js is comparable to PHP and Ruby, JavaScript isn’t.

  29. Pingback: ~ The Truth about JavaScript. « Clint Nash 

If you’ve developed SaaS in a large organization (or even a small organization that acts like one), you’ve probably seen some of the symptoms of fear of releasing software:

  • Off-hours deployments and weekend deployments
  • Infrequent releases
  • Extended code-freeze and testing cycles
  • Reluctance to refactor
  • Feature branches for the purpose of ‘cherry-picking’
  • Release managers
  • Many non-customer-facing environments before production

These symptoms usually happen only in larger/older organizations, because of two factors:

  • They take a lot more personnel / time to implement than smaller organizations have.
  • They are usually processes and attitudes that are collected as a reaction to problems that occur in the course of a company’s lifetime.

“But you can’t be too careful!”

One problem is that you CAN indeed be too careful. It’s absolutely imperative that the costs be considered as well, if you want software development to still go reasonably fast. Otherwise these manual preventative processes will slowly creep in over time and your team will be releasing at a snail’s pace.

The other problem is that these policies alone are in some ways not careful enough. Where are the techniques for mitigating defects that inevitably still slip through? Where are the techniques to detect them, limit the customers exposed, and limit the time that customers are exposed?

All of these symptoms have a few things in common: They tend toward heavy manual and preventative process added generously over time, usually with little thought to the cost/benefit ratio of such process. It’s all too common for every production issue to result in a new process to prevent it, without regard to how infrequently that problem might occur, how damaging the result is, or how easily it is rectified.

So how do we keep bugs from getting to our customers?!?

Non-preventative measures (I call them “Reactive Measures”) do indeed exist, and can help you safely maintain a fast and steady flow of new features to customers.

  • Feature-Flags allow you to commit incomplete work to master and push it to production safely. Customer-specific feature flags allow you to test in production or enroll customers in a beta program. With a good feature-flag system, it’s even possible to release a feature to only a certain percentage of customers, and to watch your metrics to see the effects. (There’s a small sketch of this just after the list.)
  • Easy/fast (read “Automated”) deployment. If deployment takes a long time, then you can’t afford to deploy often. Also, if you can’t deploy quickly, you can’t redeploy quickly (in the event of a problem).
  • Easy-undo. Rolling back from a defective release to a known-good release is the simplest and fastest way to recover from defects, if the process is both quick and easy. Defects will happen no matter how much prevention you use: Easy-undo can mean the difference between light customer impact and heavy customer impact.
  • Smaller releases. Smaller releases have less code, and therefore generally fewer bugs. Because they don’t contain many changes, they’re easier to test and a rollback is less likely to negatively impact a customer. Also, when a production defect appears in the latest release you have much less code to sift through to determine the cause.
  • Practice Continuous Integration (which necessarily involves always committing to master, and only incidentally has to do with using a CI server). Feature branches, infrequent commits, and committing non-production-ready work are all practices that need to be avoided, because they are all time-consuming delaying mechanisms.
  • “Done” == “In Production”. If the team moves on to new features before the last one is in production, then that last feature is obviously unnecessarily being held back from release. They will be faster if they deliver to production and ensure it functions there properly before moving on to new features because it avoids unnecessary and time-consuming multi-tasking. This goes hand-in-hand with smaller, more frequent releases.
  • Comprehensive Automated testing. When you deploy more often, you need to test more often. The cheapest way to do this is with comprehensive automated tests. While they’re a heavy-upfront cost, if you run them as often as possible, they’ll pay for themselves very shortly.
  • Continuous Production testing. Create automated black-box integration tests for some of your most important features and figure out a way to run them repeatedly and continuously against your production deployment so that you know instantly when those features are broken. There’s no need to wait for customer feedback to learn about a faulty release.
  • Searchable, aggregated logging with proper log-levels. Making your logs easily searchable in one place makes them something that developers will actually use for defect detection and for debugging. They can go a long way to making your application more transparent.
  • Runtime monitoring, runtime metrics and information radiators (most importantly alerts). Instrument the things you care about most, and have alerts sent out when the measurements are abnormal.
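
Here’s a tiny sketch of what a percentage-based feature-flag check might look like (the flags object and customer ids are made up for illustration; a real system would store flags somewhere editable at runtime).

    var crypto = require('crypto');

    // A flag can be on for specific customers and/or for a percentage of all customers.
    var flags = {
      'new-checkout': { enabledFor: ['customer-42'], percentage: 10 }
    };

    function isEnabled(flagName, customerId) {
      var flag = flags[flagName];
      if (!flag) return false;
      if (flag.enabledFor && flag.enabledFor.indexOf(customerId) !== -1) return true;
      // Hash the customer id into a stable bucket from 0-99, so each customer
      // consistently sees the feature either on or off.
      var hash = crypto.createHash('md5').update(flagName + customerId).digest('hex');
      var bucket = parseInt(hash.slice(0, 8), 16) % 100;
      return bucket < (flag.percentage || 0);
    }

    // in a request handler:
    // if (isEnabled('new-checkout', req.customerId)) { /* new code path */ }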

Note: Of course some preventative processes are necessary, but when they are, automation is almost always preferable to time-consuming manual human processes. Either way, you should consider the cost of the process and determine whether the fall-out from the type of defect it prevents is actually worth that cost.

In the end, a fear of releasing is a fear of change, and a fear of change is a fear of improvement. This is a crucial lesson that Big Company software teams can learn from the small, scrappy start-ups that are quickly trying to disrupt them. The more often you release, the more often you can test hypotheses and get customer feedback, and the more often you can course-correct. Companies that don’t realize this are susceptible to being leap-frogged by competitors that do.

Version control branches come with costs. I’m at the point now where I don’t think it’s possible to get away without spending a couple of days every month managing issues with version control and merging if you’re using feature branches liberally, as is common among DVCS (like git) users. I’ve spent days on this myself in the past months, so I felt compelled to write this, because those were days I could’ve spent actually doing something that adds value instead.

The purpose of branching is to delay integration of some code into the main code line (for various reasons). There’s a heavy cost that comes from this in that the less often you merge, the harder it is to merge. I’m blown away by the effectiveness of automatic conflict resolution in git, but the bottom line is that if two developers are working in the same area of the codebase, semantic conflicts will emerge that need the intervention of a human to resolve. These can be complicated and tedious and involve multiple developers. Yes, branching is easier with DVCS, but this actually exacerbates a fundamental merging problem that DVCS doesn’t solve.

“What if I keep my branches short-lived?”

There are some better defenses for branching, like “What if I keep my branches very short-lived?”, and while it does mitigate most of the problems with merging, it also mitigates any advantage you think you might get from branching. If you’re only going to branch for a day, what’s the point?

“Then how can I pick and choose what gets added/released?”

Merging early and often leads to simpler merges, but what about the case where you get partway through a feature and want to scrap it because you’ve come up with a better way? There are a number of ways to mitigate this problem, but if your need to “unmerge” occurs more often than your need to merge, you’ve probably got larger organizational issues to deal with. If those issues are purely technical, try more prototyping to uncover them earlier, and use feature-toggles for better control over what gets released. If those organizational issues are non-technical, like frequently changing requirements mid-feature, then you simply need to get better (and probably smaller) user stories before commencing.

“What if I do daily merges from the mainline into my feature branch?”

You’d think it shouldn’t matter which way you merge if you merge often, but that’s only fine if you’re the only one branching. As soon as someone else on the team comes along and starts up their own feature branch and follows your same best practice, the two of you are diverging from each other and creating merge-debt.

Using a continuous integration server is NOT sufficient for actually practicing continuous integration.

I’ve heard of some teams using a bot to auto-add all branches to the team’s CI server, making sure that each branch gets put through the proper rigor of the CI server. This is actually cargo-cult continuous integration, though, because those teams are not continuously integrating at all. Merging early and often is a prerequisite of continuous integration, by definition. Instead, running multiple branches in CI supports maintaining those multiple distinct branches for even longer, so you’re actually using the CI server to undermine the ideals of CI. Even if you run a special build that tries to auto-merge all branches (yeah right!), that build is not the mainline and is not intended for release, so you still lose the ability to regularly deliver work to a staging environment for manual regression testing.

A small non-distributed team doesn’t have the same problems as Linus

If your team is in the same room and practices collective code ownership (instead of having a central maintainer) you should just do in-person peer code reviews (or better yet, pair programming) instead of relying on pull requests. It’s wasteful to force every aspect of intra-team communication to go through the computers in the room. I would even go so far as to say that if you’re in the same room, you don’t actually need a DVCS.

Most of the reasons behind feature branches seem to be a cure that’s worse than the disease. It’s vastly simpler to commit to the mainline at all times, early and often, and to deal with deactivating or removing particular pieces of code in other ways (like feature-toggles) as those problems arise (which, in my experience, is not very often compared to how often I’m merging in new work that stays in).

I don’t mean to be negative about DVCS either. I actually use git exclusively now, I think github has been a huge boon to open source, and I love anything that makes Linus more effective at maintaining my favourite OS kernel. It just really seems like feature-branches have never solved any real problem that I have, while they have caused a lot of pain.