One of the engineers in my startup asked some thought-provoking questions in a hallway conversation today:
Should engineers own end-to-end implementation of features that cut across multiple layers of our technology stack? Or should engineers focus/specialize on specific layers of the stack and collaborate with each other to develop each feature?
I’ve repeatedly observed that small teams of developers run circles around large teams in every company I’ve ever been in. [Caveat: I focus on consumer internet software, so my observations will be biased towards that industry.] I have a strong preference for company cultures that encourage individual engineers to develop as much of a featureset as they can on their own. If there’s a front-end user interface to be created as well as a backend API required to support a particular feature, I’d like to see the same engineer coding it all. Obviously this isn’t always possible for either technical or personnel constraints, but it is the ideal in my mind.
Seeing that reminded me of the three main benefits of getting engineers to own entire featuresets, even (or maybe especially) if they cut across multiple technologies…
When an individual engineer works on a feature that they know is going to impact large numbers of users, they viscerally feel the huge contribution they’re making to their company (and society!). Nothing is more motivating than feeling like your work really makes a difference.
An engineer who really understands an entire featureset front-t0-back is able to grok critical dependencies between technology layers more effectively than a team of multiple individuals. I assert that the ability to see the whole picture enables the single engineer to more effectively write test cases, identify likely breaking points, and ultimately deliver higher overall quality code that is defensively coded against future commits.
This is a key advantage for individual engineers: the engineer who designed and coded an entire featureset on their own will be more likely and much faster to modify/iterate their designs when presented with market feedback from real consumers. In contrast, a team of engineers who each only coded parts of a feature (and therefore relied on some other person, usually a project manager or a product manager, to provide a holistic view and integration guidance) will be much slower to collectively process market feedback and iterate.
Now, this may all be much easier said than done… For one thing, it’s hard enough to recruit great engineers in any given technical area, much less a team of cross-technology superstars. And in practice, there are significant friction points to the lone-ranger style of development. E.g., if every engineer is a gun slinger they may unwittingly produce a lot of conflicting or redundant code that at some point needs to refactored. Furthermore, even if an engineer is capable of contributing code across many different levels of a tech stack, that doesn’t mean they will always *want* to do so…
Have you ever worked in a company that had a policy of “single engineer owns an entire feature”?
I’m fascinated by the way Facebook operates. It’s a very unique environment, not easily replicated (nor would their system work for all companies, even if they tried). These are notes gathered from talking with many friends at Facebook about how the company develops and releases software.
Seems like others are also interested in Facebook… The company’s developer-driven culture is coming under greater public scrutiny and other companies are grappling with if/how to implement developer-driven culture. The company is pretty secretive about its internal processes, though. Facebook’s Engineering team releases public Notes on new features and some internal systems, but these are mostly “what” kinds of articles, not “how”… So it’s not easy for outsiders to see how Facebook is able to innovate and optimize their service so much more effectively than other companies. In my own attempt as an outsider to understand more about how Facebook operates, I assembled these observations over a period of months. Out of respect for the privacy of my sources, I’ve removed all names and mention of specific features/products. And I’ve also waited for over six months to publish these notes, so they’re surely a bit out-of-date. I hope that releasing these notes will help shed some light on how Facebook has managed to push decision-making “down” in its organization without descending into chaos… It’s hard to argue with Facebook’s results or the coherence of Facebook’s product offerings. I think and hope that many consumer internet companies can learn from Facebook’s example.
- as of June 2010, the company has nearly 2000 employees, up from roughly 1100 employees 10 months ago. Nearly doubling staff in under a year!
- the two largest teams are Engineering and Ops, with roughly 400-500 team members each. Between the two they make up about 50% of the company.
- product manager to engineer ratio is roughly 1-to-7 or 1-to-10
- all engineers go through 4 to 6 week “Boot Camp” training where they learn the Facebook system by fixing bugs and listening to lectures given by more senior/tenured engineers. estimate 10% of each boot camp’s trainee class don’t make it and are counseled out of the organization.
- after boot camp, all engineers get access to live DB (comes with standard lecture about “with great power comes great responsibility” and a clear list of “fire-able offenses”, e.g., sharing private user data)
- [EDIT thx fryfrog] “There are also very good safe guards in place to prevent anyone at the company from doing the horrible sorts of things you can imagine people have the power to do being on the inside. If you have to “become” someone who is asking for support, this is logged along with a reason and closely reviewed. Straying here is not tolerated, period.”
- any engineer can modify any part of FB’s code base and check-in at-will
- very engineering driven culture. “product managers are essentially useless here.” is a quote from an engineer. engineers can modify specs mid-process, re-order work projects, and inject new feature ideas anytime. [EDITORIAL] The author of this blog post is a product manager, so this sentiment really caught my attention. As you’ll see in the rest of these notes, though, it’s apparent that Facebook’s culture has really embraced product management practices so it’s not as though the role of product management is somehow ignored or omitted. Rather, the culture of the company seems to be set so that *everyone* feels responsibility for the product.
- during monthly cross-team meetings, the engineers are the ones who present progress reports. product marketing and product management attend these meetings, but if they are particularly outspoken, there is actually feedback to the leadership that “product spoke too much at the last meeting.” they really want engineers to publicly own products and be the main point of contact for the things they built.
- resourcing for projects is purely voluntary.Engineers decide which ones sound interesting to work on. a PM lobbies group of engineers, tries to get them excited about their ideas. Engineer talks to their manager, says “I’d like to work on these 5 things this week.” Engineering Manager mostly leaves engineers’ preferences alone, may sometimes ask that certain tasks get done first.
- arguments about whether or not a feature idea is worth doing or not generally get resolved by just spending a week implementing it and then testing it on a sample of users, e.g., 1% of Nevada users.
- engineers generally want to work on infrastructure, scalability and “hard problems” — that’s where all the prestige is. can be hard to get engineers excited about working on front-end projects and user interfaces. this is the opposite of what you find in some consumer businesses where everyone wants to work on stuff that customers touch so you can point to a particular user experience and say “I built that.” At facebook, the back-end stuff like news feed algorithms, ad-targeting algorithms, memcache optimizations, etc. are the juicy projects that engineers want.
commits that affect certain high-priority features (e.g., news feed) get code reviewed before merge.News Feed is important enough that Zuckerberg reviews any changes to it, but that’s an exceptional case.
- [CORRECTION — thx epriest] “There is mandatory code review for all changes (i.e., by one or more engineers). I think the article is just saying that Zuck doesn’t look at every change personally.”
- [CORRECTION thx fryfrog] “All changes are reviewed by at least one person, and the system is easy for anyone else to look at and review your code even if you don’t invite them to. It would take intentionally malicious behavior to get un-reviewed code in.”
no QA at all, zero. engineers responsible for testing, bug fixes, and post-launch maintenance of their own work. there are some unit-testing and integration-testing frameworks available, but only sporadically used.
- [CORRECTION thx fryfrog] “I would also add that we do have QA, just not an official QA group. Every employee at an office or connected via VPN is using a version of the site that includes all the changes that are next in line to go out. This version is updated frequently and is usually 1-12 hours ahead of what the world sees. All employees are strongly encouraged to report any bugs they see and these are very quickly actioned upon.”
- re: surprise at lack of QA or automated unit tests — “most engineers are capable of writing bug-free code. it’s just that they don’t have an incentive to do so at most companies. when there’s a QA department, it’s easy to just throw it over to them to find the errors.” [EDIT: please note that this was subjective opinion, I chose to include it in this post because of the stark contrast that this draws with standard development practice at other companies]
- [CORRECTION thx epriest] “We have automated testing, including “push-blocking” tests which must pass before the release goes out. We absolutely do not believe “most engineers are capable of writing bug-free code”, much less that this is a reasonable notion to base a business upon.”
- re: surprise at lack of PM influence/control — product managers have a lot of independence and freedom. The key to being influential is to have really good relationships with engineering managers. Need to be technical enough not to suggest stupid ideas. Aside from that, there’s no need to ask for any permission or pass any reviews when establishing roadmaps/backlogs. There are relatively few PMs, but they all feel like they have responsibility for a really important and personally-interesting area of the company.
- by default all code commits get packaged into weekly releases (tuesdays)
- with extra effort, changes can go out same day
- tuesday code releases require all engineers who committed code in that week’s release candidate to be on-site
- engineers must be present in a specific IRC channel for “roll call” before the release begins or else suffer a public “shaming”
- ops team runs code releases by gradually rolling code outthere are 9
concentriclevels for rolling out new code
- facebook has around 60,000 servers
- [CORRECTION thx epriest] “The nine push phases are not concentric. There are three concentric phases (p1 = internal release, p2 = small external release, p3 = full external release). The other six phases are auxiliary tiers like our internal tools, video upload hosts, etc.”
- the smallest level is only 6 servers
- e.g., new tuesday release is rolled out to 6 servers (level 1), ops team then observes those 6 servers and make sure that they are behaving correctly before rolling forward to the next level.
- if a release is causing any issues (e.g., throwing errors, etc.) then push is halted. the engineer who committed the offending changeset is paged to fix the problem. and then the release starts over again at level 1.
- so a release may go thru levels repeatedly: 1-2-3-fix. back to 1. 1-2-3-4-5-fix. back to 1. 1-2-3-4-5-6-7-8-9.
- ops team is really well-trained, well-respected, and very business-aware. their server metrics go beyond the usual error logs, load & memory utilization stats — also include user behavior. E.g., if a new release changes the percentage of users who engage with Facebook features, the ops team will see that in their metrics and may stop a release for that reason so they can investigate.
- during the release process, ops team uses an IRC-based paging system that can ping individual engineers via Facebook, email, IRC, IM, and SMS if needed to get their attention. not responding to ops team results in public shaming.
- once code has rolled out to level 9 and is stable, then done with weekly push.
- if a feature doesn’t get coded in time for a particular weekly push, it’s not that big a deal (unless there are hard external dependencies) — features will just generally get shipped whenever they’re completed.
- getting svn-blamed, publicly shamed, or slipping projects too often will result in an engineer getting fired. “it’s a very high performance culture”. people that aren’t productive or aren’t super talented really stick out. Managers will literally take poor performers aside within 6 months of hiring and say “this just isn’t working out, you’re not a good culture fit”. this actually applies at every level of the company, even C-level and VP-level hires have been quickly dismissed if they aren’t super productive.
- [CORRECTION, thx epriest] “People do not get called out for introducing bugs. They only get called out if they ask for changes to go out with the release but aren’t around to support them in case something goes wrong (and haven’t found someone to cover for you).”
- [CORRECTION, thx epriest] “Getting blamed will NOT get you fired. We are extremely forgiving in this respect, and most of the senior engineers have pushed at least one horrible thing, myself included. As far as I know, no one has ever been fired for making mistakes of this nature.”
- [CORRECTION, thx fryfrog] “I also don’t know of anyone who has been fired for making mistakes like are mentioned in the article. I know of people who have inadvertently taken down the site. They work hard to fix what ever caused the problem and everyone learns from it. The public shaming is far more effective than fear of being fired, in my opinion.”
It’ll be super interesting to see how Facebook’s development culture evolves over time — and especially to see if the culture can continue scaling as the company grows into the thousands-of-employees.
What do you think? Would “developer-driven culture” work at your company?
I’m a co-founder of a startup and I’ve noticed that our advisors are beginning to form “teams” around two opposing schools of thought for gaining early user adoption. These two philosophies are mostly mutually exclusive so I think entrepreneurs need to make a choice about which camp they fall into… Which are you?
A lot of folks have asked for more details on the way we measured and optimized viral app growth in the Stanford class I co-taught recently. So here’s a bit more info on methodology for measuring virality and what it means for an app to “go viral.”
K-factor and R-zero
Terms like “K-factor” (contagion) and “R-zero” (reproduction rate) are often used to describe the growth rate of viral apps. These terms come from the fields of medicine and biology — they’re originally intended to describe the spread of of viral diseases, but they’re nice analogies for how web/SN apps grow. Some would even describe widgets and apps as “diseases” that have “corrupted” popular social networks like MySpace and Facebook! ;-) Of course, having worked at Slide and authored some FB apps of my own, that’s clearly not my belief… So, read on if you’re interested in viral apps!
I had the pleasure of hearing Avinash Kaushik, Google’s analytics evangelist, speak when he came to our CS377W class at Stanford this quarter (the “Stanford Facebook class“). He’s an amazing speaker, really breathing life and purpose into the too-often dry topic of web analytics.
He’s promoting a new way of looking at web analytics, what he calls “Web Analytics 2.0”. Avinash’es central message is that analytics cannot stand alone as a decision driver in organizations; rather analytics need to be considered in the context of additional data (from customers, competitors, and other internal sources) in order to drive rational decisions.
Avinash has a brilliant decision framework, consisting of the five decision inputs that should be considered in order to gain insight into customer behavior and drive optimal decisions. He calls this “The Five Pillars” and here’s the cliff’s notes summary:
It can be a lot of stuff to consider and anyone responsible for launch of a product (e.g., product manager, product marketing manager, or a business unit lead) needs to think about “product design” from multiple levels.
Personally, as a product manager, I like to think about “product design” as the sum of five design perspectives: