Why "Never Not My Job" is the Cloud's Enduring Truth
Colin Percival's recent reflection, "20 Years on AWS and Never Not My Job," has certainly sparked conversation, particularly on Hacker News. The mainstream narrative often frames his blog post as a technical and human chronicle, proof of long-term cloud engagement. But for those of us who've actually built and maintained large-scale distributed systems, Percival's title isn't just a catchy phrase; it's the stark, unvarnished truth about operating in the cloud, even after two decades. This enduring reality, the 'AWS Never Not My Job' principle, challenges the fantasy that the cloud abstracts away all your problems.
I've seen countless teams chase the promise of "managed services" only to find themselves drowning in a different kind of operational complexity. Percival's experience cuts through that hype, showing us what the 'Never Not My Job' reality truly entails.
The Architecture: Simplicity as a Security Primitive
Percival's Tarsnap service, launched in 2006, is a masterclass in minimalist, security-first architecture. He built it on Amazon S3 and a single EC2 instance. That's it. This isn't some sprawling microservice mesh with dozens of managed services. It's a lean, purpose-built system designed for users "truly paranoid about security."
The core idea here is that security isn't an add-on; it's baked into the design. Tarsnap combines S3's reliability with client-side encryption and deduplication: data is encrypted on the user's machine before it is uploaded, so even Percival, as the provider, can't read it. This level of data integrity and confidentiality is a direct result of a deliberate architectural choice to control the entire data path, rather than relying on opaque vendor implementations. (I've had to untangle systems where "security" was an afterthought, bolted on with a WAF and a prayer, and it never ends well.)
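To make "deduplication without giving the provider your data" concrete, here is a minimal sketch under stated assumptions. It is not Tarsnap's code or protocol. It assumes chunks are identified by an HMAC-SHA256 under a client-held key, so the provider sees only opaque IDs and can't check whether you stored some known file by hashing it. `DedupStore`, `keyed_chunk_id`, and `backup` are hypothetical names. The encryption step is passed in as a function because Python's standard library has no authenticated cipher; a real client would plug in something like AES-GCM from a vetted library, keyed separately from the HMAC.

```python
import hashlib
import hmac
from typing import Callable


class DedupStore:
    """Hypothetical provider-side store: it only ever sees opaque chunk IDs
    and whatever bytes the client chooses to send."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def has(self, chunk_id: str) -> bool:
        return chunk_id in self.chunks

    def put(self, chunk_id: str, blob: bytes) -> None:
        self.chunks.setdefault(chunk_id, blob)


def keyed_chunk_id(key: bytes, chunk: bytes) -> str:
    # HMAC-SHA256 under a client-held key: unlike a bare SHA-256, the
    # provider can't hash a known file and test whether you stored it.
    return hmac.new(key, chunk, hashlib.sha256).hexdigest()


def backup(
    key: bytes,
    data: bytes,
    store: DedupStore,
    encrypt: Callable[[bytes], bytes],
    size: int = 4,
) -> tuple[list[str], int]:
    """Split data into fixed-size chunks and upload only unseen ones.

    Returns the manifest (ordered chunk IDs) and the number of chunks
    actually uploaded. In real use `encrypt` must be an authenticated
    cipher under a client-held key (assumption: supplied by the caller).
    """
    manifest: list[str] = []
    uploaded = 0
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        cid = keyed_chunk_id(key, chunk)
        if not store.has(cid):
            store.put(cid, encrypt(chunk))
            uploaded += 1
        manifest.append(cid)
    return manifest, uploaded
```

With `size=4`, backing up `b"abcdabcdabcd"` yields a three-entry manifest but uploads only one chunk. The tiny fixed chunk size is purely for illustration; production deduplicating tools typically use content-defined (variable-length) chunking so that inserting a byte doesn't shift every later chunk boundary.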
His decision in 2008 to bring FreeBSD to EC2 on his own initiative, and then to maintain it for 16 years, shows the same depth of engagement. That work meant continually adapting to new virtualized hardware and API changes, proving that even foundational operating system support in the cloud isn't a "set it and forget it" proposition; it's truly 'Never Not My Job'.
The Bottleneck: Cognitive Load, Not Just Throughput
You might think a single EC2 instance would be the bottleneck for a service like Tarsnap. But the real bottleneck Percival describes isn't about raw throughput or latency; it's about the *cognitive load* and *continuous adaptation* required to operate in a constantly shifting cloud environment.
AWS, which started with two services in 2006, now offers over 240. Each new service, API change, and underlying hardware revision is something you need to understand, evaluate, and potentially adapt to. For Percival, maintaining FreeBSD on EC2 meant a continuous cycle of exactly that adaptation, and that never-ending churn is precisely why the 'Never Not My Job' mantra resonates so deeply with cloud practitioners. This isn't a technical bottleneck in the traditional sense; it's an *operational bottleneck* that demands constant vigilance and deep platform knowledge.
The cloud doesn't eliminate operational work; it transforms it. Instead of racking servers, you're now debugging IAM policies, understanding eventual consistency models, and tracking API deprecations. This continuous learning curve, this 'AWS Never Not My Job' reality, is the true scaling challenge for many organizations. It's why even well-resourced teams struggle to keep up.
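Much of "debugging IAM policies" comes down to internalizing one rule from AWS's policy-evaluation documentation: an explicit Deny overrides any Allow, and a request with no matching Allow is implicitly denied. The sketch below models only that core rule for identity-based statements. `Statement` and `evaluate` are hypothetical names, and the model deliberately ignores Conditions, NotAction/NotResource, resource-based policies, SCPs, permission boundaries, and session policies. It treats action names as case-insensitive and resource ARNs as case-sensitive, and it leans on Python's `fnmatch`, which (unlike IAM) also interprets `[...]` character classes.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Statement:
    """Hypothetical simplified policy statement (no Condition, no NotAction)."""
    effect: str           # "Allow" or "Deny"
    actions: list[str]    # e.g. ["s3:GetObject", "s3:*"]
    resources: list[str]  # e.g. ["arn:aws:s3:::my-bucket/*"]


def _matches(patterns: list[str], value: str, case_sensitive: bool) -> bool:
    # '*' matches any run of characters (including '/'), '?' one character.
    if case_sensitive:
        return any(fnmatchcase(value, p) for p in patterns)
    return any(fnmatchcase(value.lower(), p.lower()) for p in patterns)


def evaluate(statements: list[Statement], action: str, resource: str) -> str:
    """Explicit Deny beats any Allow; no matching Allow means implicit deny."""
    allowed = False
    for s in statements:
        if _matches(s.actions, action, case_sensitive=False) and _matches(
            s.resources, resource, case_sensitive=True
        ):
            if s.effect == "Deny":
                return "ExplicitDeny"  # short-circuit: no Allow can override this
            if s.effect == "Allow":
                allowed = True
    return "Allow" if allowed else "ImplicitDeny"
```

The classic surprise this captures: a broad `s3:*` Allow on a bucket does not help if a separate `s3:DeleteObject` Deny matches, and a request against any resource no statement mentions fails with an implicit deny rather than an error that names the missing grant.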
The Trade-offs: Control for Operational Burden
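Percival's choices reflect a clear set of trade-offs. He trades off the *convenience* of fully managed services for *absolute control* over his security guarantees. When you're building a service for "truly paranoid" users, you can't outsource trust. This means understanding S3's consistency guarantees and designing your application logic around them. Since December 2020, S3 has provided strong read-after-write consistency for object PUTs, DELETEs, and LIST operations; before that, overwrites and deletes were only eventually consistent, which is the model Tarsnap lived with for most of its history. Even today, strong consistency doesn't coordinate concurrent writers: two simultaneous PUTs to the same key simply resolve to whichever lands last.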
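One concrete way consistency semantics leak into application design: per AWS's documentation, S3 does not lock objects for concurrent writers, and simultaneous PUTs to the same key are last-writer-wins. A naive read-modify-write can therefore silently drop an update, however consistent reads are. The sketch below is an in-memory simulation, not boto3: `InMemoryObjectStore`, `PreconditionFailed`, `append_record`, and the `if_match` parameter are hypothetical, with `if_match` modeled loosely on HTTP If-Match preconditions. It shows optimistic concurrency control, where a writer only commits if the object is unchanged since it read it, and otherwise re-reads and retries.

```python
import hashlib
from typing import Optional


class PreconditionFailed(Exception):
    """Raised when a conditional write finds the object has changed."""


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: unconditional PUTs are
    last-writer-wins; `if_match` models an If-Match style precondition."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    @staticmethod
    def _etag(data: bytes) -> str:
        # Content-derived tag, so an A -> B -> A change would go unnoticed.
        return hashlib.sha256(data).hexdigest()

    def get(self, key: str) -> tuple[bytes, str]:
        data = self._objects[key]
        return data, self._etag(data)

    def put(self, key: str, data: bytes, if_match: Optional[str] = None) -> None:
        if if_match is not None:
            current = self._objects.get(key)
            if current is None or self._etag(current) != if_match:
                raise PreconditionFailed(key)
        self._objects[key] = data


def append_record(store: InMemoryObjectStore, key: str, record: bytes,
                  max_retries: int = 5) -> None:
    """Optimistic read-modify-write: re-read and retry if another writer won."""
    for _ in range(max_retries):
        data, tag = store.get(key)
        try:
            store.put(key, data + record, if_match=tag)
            return
        except PreconditionFailed:
            continue  # lost the race; the next iteration re-reads
    raise RuntimeError(f"gave up on {key!r} after {max_retries} conflicts")
```

If two clients both read an empty object and then write `A` and `B` unconditionally, only `B` survives. With the precondition, the second writer gets `PreconditionFailed`, re-reads, and both records are kept. Many systems sidestep the problem entirely by writing immutable, never-overwritten keys and keeping mutation confined to small, carefully coordinated pointers.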
His approach also highlights the fundamental tension between *vendor lock-in* and *operational simplicity*. By deeply integrating with AWS primitives, he gains stability and access to solid infrastructure. But it also means he's on the hook for understanding and adapting to AWS's evolution. The 'AWS Never Not My Job' aspect is the direct consequence of choosing this path: you gain architectural integrity, but you accept the ongoing operational burden.
This isn't just about the CAP theorem, though that's always lurking in the background when you're dealing with distributed storage like S3. It's about the broader architectural trade-off: how much of your operational surface are you willing to own to achieve your non-functional requirements, especially security and data integrity? Percival chose to own a lot of it, embodying the 'Never Not My Job' approach, and it paid off.
The Pattern: Embrace the Enduring Demands
What Percival's 20 years on AWS really show us is an enduring architectural pattern, one that often gets lost in the rush for the latest cloud buzzword.
- Deep Primitives Knowledge is Essential: Don't treat AWS as a black box. Understand S3's consistency model, EC2's virtualization layers, and the implications of each service. This deep knowledge lets you build truly resilient and secure systems.
- Security by Design, Not by Feature: Tarsnap's success stems from integrating security from the ground up. This means understanding cryptographic primitives, key management, and data lifecycle, rather than relying on a checkbox security feature.
- Idempotency is Your Lifeline: Percival doesn't detail this, but any solid backup service like Tarsnap relies heavily on idempotent operations. If a network call fails or a process restarts, you need to be able to retry an operation without corrupting data or creating duplicates. This is non-negotiable for data integrity.
- Cloud Providers Optimize for Themselves: Percival's experience with FreeBSD on EC2 makes this clear. AWS didn't plan for FreeBSD support; he made it happen. This exemplifies the 'Never Not My Job' principle: your architecture needs to account for the provider's priorities, not assume perfect alignment.
- Operational Excellence is a Continuous State: The 'AWS Never Not My Job' principle isn't a complaint; it's a description of reality. Building and maintaining reliable distributed systems, even in the cloud, demands constant engagement, learning, and adaptation. There's no magic button for "zero ops," only the enduring truth of 'Never Not My Job'.
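The idempotency point is easiest to see in the worst case: the server applied your request, but the response was lost, so the client can't tell whether to retry. The source doesn't say how Tarsnap handles this, so what follows is a generic idempotency-key pattern, not Tarsnap's protocol. `UploadServer`, `LossyLink`, and `upload_with_retry` are hypothetical names. The client generates one key per logical operation and reuses it on every retry, and the server replays the stored result instead of repeating the side effect.

```python
import uuid
from typing import Optional


class UploadServer:
    """Hypothetical server that remembers results by idempotency key."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}  # idempotency key -> result
        self.archives: list[bytes] = []    # the side effect we must not repeat

    def upload(self, idem_key: str, payload: bytes) -> str:
        if idem_key in self.results:
            return self.results[idem_key]  # retry: replay the stored result
        self.archives.append(payload)
        result = f"archive-{len(self.archives)}"
        self.results[idem_key] = result
        return result


class LossyLink:
    """Simulates the nasty case: the server applied the request,
    but the response was lost, so the client can't tell."""

    def __init__(self, server: UploadServer, drop_responses: int) -> None:
        self.server = server
        self.drops_left = drop_responses

    def call(self, idem_key: str, payload: bytes) -> str:
        result = self.server.upload(idem_key, payload)
        if self.drops_left > 0:
            self.drops_left -= 1
            raise ConnectionError("response lost in transit")
        return result


def upload_with_retry(link: LossyLink, payload: bytes, attempts: int = 3) -> str:
    idem_key = str(uuid.uuid4())  # generated once, reused on every retry
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return link.call(idem_key, payload)
        except ConnectionError as exc:
            last_error = exc  # safe to retry: same key, no duplicate archive
    raise ConnectionError(f"upload failed after {attempts} attempts") from last_error
```

With two lost responses, `upload_with_retry` succeeds on the third attempt and the server holds exactly one archive; without the key, the same retries would have stored three. A real server must make the check-and-record step atomic (for example, a conditional insert in its database) so two concurrent retries can't both slip past the lookup, and it needs a policy for expiring old keys.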
The lesson here is clear: the cloud offers incredible power, but it doesn't absolve you of responsibility. It shifts it. If you want to build systems that last, that are secure, and that truly deliver on their promises, you have to embrace the fact that the 'AWS Never Not My Job' principle will always apply.