Great Design Thinking Beats Clever Workarounds Every Time
git commit -m "always keep learning"
Confessions from 6 years in startup engineering
There's a moment every engineer experiences, usually late at night, staring at logs while production half-burns, when you realize: you didn't fix the problem. You just outsmarted it, temporarily. And for a while, that feels like a win.
The Addiction to "Just Make It Work"
Startups train you to move fast. Ship now, figure it out later. You learn to survive on quick patches, defensive conditionals, and "edge-case handling" that quietly becomes the main logic.
And the dangerous part? You get really good at it.
You become the person who can jump into chaos and come out with something that works. People rely on you. Deadlines get met. Fires get put out. But underneath all that momentum, something else is happening: you're building a system that only works because you understand its hacks.
I've Built Systems I Was Afraid to Touch
At one point, I worked on a financial flow where transactions would occasionally duplicate. Not always. Just enough to hurt.
We did what most teams do under pressure. Disabled buttons on the frontend. Added backend checks within a time window. Introduced locks to "control" concurrency. It worked—until it didn't.
Then came the patches. Extend the lock duration. Add another check. Handle a "rare" retry case. Log more, just in case. Each change made sense in isolation. Together, they created something fragile. Eventually, touching that part of the system felt risky, not because it was complex in theory, but because it was unpredictable in practice. You couldn't reason about it anymore. You could only test it and hope.
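To make the fragility concrete, here's a minimal sketch of that kind of layered defense: a time-window duplicate check of the sort described above. All names (`looks_duplicate`, the five-second window) are hypothetical, and the real system had several more layers on top of this.

```python
import time

# Hypothetical sketch of one defensive layer: guess whether a request
# is a retry based purely on timing. Requests outside the window, or
# from a second server with its own _recent dict, slip straight through.
_recent: dict[str, float] = {}   # user id -> timestamp of last request
WINDOW_SECONDS = 5.0             # tune it up, and legitimate requests get blocked

def looks_duplicate(user_id: str) -> bool:
    """Return True if this user sent a request within the window."""
    now = time.monotonic()
    last = _recent.get(user_id)
    _recent[user_id] = now
    return last is not None and (now - last) < WINDOW_SECONDS
```

Each knob here is a future patch waiting to happen: the window, the shared state, the definition of "same request." None of it knows anything for certain; it only guesses.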
That's when it hits you: this isn't engineering anymore. It's damage control.
The Moment Everything Clicks
The turning point wasn't some big rewrite. It was a simple question:
Why are we trying so hard to detect duplicates? Why not design the system so duplicates don't matter?
That shift led us to idempotency—not as a patch, but as a contract. Every request carried a unique identity. The system didn't guess anymore. It knew. Retries became safe. Concurrency stopped being scary. And the edge cases didn't get reduced or managed. They disappeared entirely.
A messy, defensive system became calm. Not because we added more guards, but because we removed the need for them.
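The contract is simple enough to sketch in a few lines. This is an illustrative toy, not our production code: the store is an in-memory dict standing in for a database table with a unique key, and the names (`charge`, `idempotency_key`) are assumptions.

```python
import uuid

# Each logical request carries one client-generated key. The system
# records the outcome under that key, so a retry replays the recorded
# result instead of performing the operation again.
_results: dict[str, dict] = {}

def charge(idempotency_key: str, amount: int) -> dict:
    """Process a charge at most once per idempotency key."""
    if idempotency_key in _results:
        # Retry or concurrent duplicate: return the recorded outcome.
        return _results[idempotency_key]
    result = {"status": "charged", "amount": amount}
    _results[idempotency_key] = result
    return result

key = str(uuid.uuid4())   # the client mints one key per logical request
first = charge(key, 500)
retry = charge(key, 500)  # a network retry reuses the same key
assert first is retry     # same outcome, no second charge
```

Notice what's absent: no time windows, no locks, no guessing. Duplicates aren't detected; they're defined away by the key.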
When Cleanup Is the Architecture
I saw the same pattern play out differently in another system, and this time the failure was harder to see.
We had data inconsistencies between two services, order states that didn't match, and records that drifted out of sync. The quick fix was a reconciliation job: a background process that ran every few minutes, compared the two sources of truth, and quietly corrected the mismatches.
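The pattern is worth spelling out, because it looks so reasonable. A hedged sketch of the idea, with hypothetical names and order-state dicts standing in for the two services:

```python
# The reconciliation pattern: periodically diff two sources of truth
# and quietly patch the one that drifted. Service A is treated as
# authoritative; service B gets overwritten to match.
def reconcile(orders_a: dict[str, str], orders_b: dict[str, str]) -> list[str]:
    """Copy order states from A over B; return the IDs that were 'fixed'."""
    fixed = []
    for order_id, state in orders_a.items():
        if orders_b.get(order_id) != state:
            orders_b[order_id] = state  # silent correction, no alert, no record
            fixed.append(order_id)
    return fixed
```

The failure mode is built in: if the diff logic ever skips a record, as ours did after a schema change, nothing signals it. An empty `fixed` list looks identical to a healthy system.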
It worked beautifully. So beautifully, in fact, that we stopped thinking about it. The job ran. The data looked clean. Dashboards stayed green.
Then one Friday, the job failed silently. By Monday, customer support was flooded with reports of phantom orders, charges without wallet transactions, and withdrawals without charges. It took us two days to untangle the damage, and another week to understand why it had happened: a schema migration in one service had changed the shape of the data just enough that the reconciliation logic started skipping records instead of flagging them.
(Much later, we went further still: we built a verification layer on top of Web3, using smart contracts so integrators could independently validate transactions against an immutable ledger. But that came after the harder lesson.)
That's when the real cost became clear. We hadn't built a system that was correct. We'd built a system that was corrected, constantly, automatically, and invisibly. The moment the correction stopped, the whole thing unraveled.
So we redesigned. State transitions became explicit, with each service publishing events that the other could verify against. Critical operations became atomic: either the whole thing succeeded, or nothing did. Constraints moved closer to the data, enforced at the database level rather than by application logic running after the fact.
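"Constraints closer to the data" can be sketched concretely. This toy uses SQLite and an assumed schema (`wallet_tx`, `charge_id`); the real system used a different database, but the principle is the same: the database itself, not application code, forbids two wallet transactions for one charge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE wallet_tx (
        charge_id TEXT NOT NULL UNIQUE,  -- one wallet tx per charge, enforced here
        amount    INTEGER NOT NULL
    )
""")

def record_tx(charge_id: str, amount: int) -> bool:
    """Insert atomically; a duplicate violates the constraint instead of drifting."""
    try:
        with conn:  # transaction scope: the insert commits fully or not at all
            conn.execute("INSERT INTO wallet_tx VALUES (?, ?)", (charge_id, amount))
        return True
    except sqlite3.IntegrityError:
        return False  # rejected loudly at write time, not reconciled later

assert record_tx("ch_1", 500) is True
assert record_tx("ch_1", 500) is False  # no silent second transaction
```

A constraint like this can't fail silently on a Friday. Either the bad write is rejected the moment it happens, or it never happens at all.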
The reconciliation job went from "critical infrastructure" to "unnecessary artifact." We kept it running for a few weeks out of paranoia, watched it find nothing, and turned it off.
The Quiet Cost of Cleverness
Here's the part most people don't talk about: clever systems don't fail loudly. They fail subtly. A duplicate transaction here. A missing record there. A reconciliation job that "usually" fixes things.
You don't notice the cost immediately. It leaks over time. Engineers slow down because they don't trust the system. New features take longer because of unknown side effects. Bugs become harder to reproduce, harder to explain. Eventually, your velocity drops, but no one can point to a single cause.
Workarounds feel productive because they give immediate results. Design feels slow because it forces you to think. But workarounds scale complexity. Design scales clarity. One makes you faster today. The other makes your entire team faster tomorrow.
No, This Is Not Overengineering
I know what you're thinking, because I've heard it in every design review where someone proposes doing things properly.
"Aren't we overcomplicating this?"
"Patrick likes overengineering things!!"
"Let's just ship it and refactor later."
And sometimes, those people are right. Overengineering is real. Building an event-sourced, CQRS-based microservice architecture to serve a landing page with a contact form is overengineering. Spending two weeks designing an abstraction layer for a feature that might get killed next sprint, that's overengineering.
But that's not what I'm talking about.
There's a difference between building for problems you imagine and designing for problems that are already hurting you. Idempotency keys on a financial endpoint that's already producing duplicates? That's not premature, that's overdue. Atomic state transitions in a system that already requires a cleanup job to stay correct? That's not gold-plating. That's plumbing.
The way I think about it is this: overengineering is solving problems you don't have. Good design makes the problems you do have structurally impossible. One is speculation. The other is engineering.
If your system already has duct tape on it, proposing a better joint isn't overengineering. It's just… engineering. The prefix "over" implies you've gone past what's needed. But when you're patching the same class of bug for the third time, you haven't gone past anything. You haven't even arrived yet.
The people who cry "overengineering" loudest are often the ones who've never had to maintain the system at 2 am. They see the upfront cost of doing it right. They don't see the compounding cost of not doing it, because that cost gets absorbed silently by the team, sprint after sprint, until everyone just accepts that "this part of the codebase is like that."
It doesn't have to be like that.
The Question That Changes Everything
Earlier in my career, I asked: "How do I fix this?"
Now I ask: "Why does this problem exist, and how do I make it impossible?"
That question is uncomfortable. It forces you to challenge assumptions, revisit decisions, and sometimes admit that the system is flawed in ways you'd rather not touch. But it's also where real engineering begins.
You can get far with cleverness. You can ship features, meet deadlines, and even impress people. But eventually, the system pushes back. It becomes slower to change, harder to debug, and more expensive to maintain. And one day, you realize you're no longer building on top of it, you're working around it.
The best systems aren't the ones that survive clever fixes. They're the ones that never needed them in the first place.


