Why AI Coding is the biggest use case
Or: the human bottleneck problem
Most of Anthropic’s revenue acceleration comes from coding, and OpenAI now focuses more on coding as a consequence.
The assumption was that end consumers, via ChatGPT, would be the main users of AI. But token-hungry developers are the people to sell to right now.
And rightly so. Agentic engineering is great.
Why coding is the biggest AI use case
Currently, coding is the kind of knowledge work that is changing the most. Almost everyone agrees on that. And the tools keep improving.
With the newest models (gpt 5.5, opus 4.8) you can really throw complicated problems at the model and it will work long and hard.
I have worked as an engineer, as a product manager and as a founder. In all those roles I wanted to get features done and refactors done. This “getting things done in the code” is speeding up by 10-100x. Orders of magnitude, without hyperbole.
There are a few reasons why we can throw so much machine intelligence at a coding problem and get useful results.
Firstly, the agent can verify its work. Does the code compile, does the website look right, do the tests pass? It can close the loop, so to speak.
Secondly, there are a ton of conventions around how to structure a coding project:
where certain files go, how to lay out a new feature, where to reuse existing functionality.
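To make “closing the loop” concrete, here is a minimal sketch of what such a check could look like. Everything in it is made up for illustration: the `verify` and `first_passing` helpers, the hard-coded `ATTEMPTS` (standing in for what a model might produce over three tries) and the tiny test table. A real harness does far more, and it would not `exec` untrusted code without a sandbox.

```python
from typing import Optional


def verify(source: str, tests: list, func_name: str) -> list:
    """Return failure messages; an empty list means the candidate passed."""
    try:
        code = compile(source, "<candidate>", "exec")  # does the code compile?
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]

    namespace: dict = {}
    exec(code, namespace)  # sketch only: never run untrusted code unsandboxed
    func = namespace.get(func_name)
    if not callable(func):
        return [f"{func_name} is not defined"]

    failures = []
    for args, expected in tests:  # do the tests pass?
        try:
            actual = func(*args)
        except Exception as err:
            failures.append(f"{func_name}{args} raised {type(err).__name__}")
            continue
        if actual != expected:
            failures.append(f"{func_name}{args} returned {actual!r}, expected {expected!r}")
    return failures


def first_passing(attempts: list, tests: list, func_name: str) -> Optional[int]:
    """Index of the first attempt with no failures, or None if all fail."""
    for index, source in enumerate(attempts):
        if not verify(source, tests, func_name):
            return index
    return None


# Hypothetical attempts: a syntax error, a wrong answer, a correct answer.
ATTEMPTS = [
    "def add(a, b) return a + b",
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b",
]
TESTS = [((1, 2), 3), ((0, 0), 0)]

if __name__ == "__main__":
    print(first_passing(ATTEMPTS, TESTS, "add"))  # prints 2
```

The point is not the code itself but that the failure messages are machine-readable feedback. The agent gets them without a human in between.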
Human bottleneck
In most other tasks (marketing, business planning, email writing, research, …) there is no way for the agent to test its work.
So usually we as humans have to do that. We get a report, a plan, a draft email and we review it.
Modern harnesses (like Codex, Claude Code, pi) let the model run in many steps to make a plan, verify and incrementally improve the result.
But with most tasks there is no such thing as the verify step. And so the model just says: Ok, good enough, here is your business plan.
The human verifier becomes the bottleneck. Meaning the AI could do a lot more useful iterations to improve the output before giving it to you, but it doesn’t have feedback, so it can’t.
What to do about it
Why would we want to fix the human bottleneck problem?
Well, you don’t have to if you don’t want to.
I’m interested in fixing it because, now that we have these seemingly intelligent statistical machines (LLMs) at our disposal, the question is how we put that intelligence to use.
Like the boss of a company with employees sitting around idle, you think: let’s put these talented people to work.
Now, while LLMs are not people, they are talented. And I want to put them to work.
Verifiability is the main problem to solve in my eyes.
Am I on the right track? What should I improve? Those are the questions we have to let the agent answer for itself instead of waiting for us.
The more we can do that, the more useful work we can get out of these models.
There is also the concept of consolidation and compression, which I think is the second biggest problem to solve, but I’ll keep that for the next post.
Verifiability
The best kind of verifiability is mathematical proof. I can logically prove to you that A + B = C.
And while it might not be obvious, there are a lot of knowledge work tasks that allow us to verify the result.
Example: Let’s say you publish 50 social media posts and you review the statistics on how much engagement these posts generated. That gives you a loop, where you can learn from what has worked and improve your social media posting strategy. If an AI agent gets access to those stats, it could “self improve” over time.
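As a rough sketch of that loop: assume each post is tagged with a style and we can read back one engagement number per post. The styles, the numbers and the two helper functions below are invented for illustration. A real setup would pull the stats from the platform and would need many more posts before the averages mean anything.

```python
from collections import defaultdict
from statistics import mean


def average_engagement(posts: list) -> dict:
    """Average engagement per style, given (style, engagement) pairs."""
    by_style = defaultdict(list)
    for style, engagement in posts:
        by_style[style].append(engagement)
    return {style: mean(values) for style, values in by_style.items()}


def best_style(posts: list) -> str:
    """The style the agent should lean towards in the next batch of posts."""
    averages = average_engagement(posts)
    return max(averages, key=averages.get)


# Hypothetical stats for five published posts.
POSTS = [
    ("question", 120),
    ("question", 80),
    ("thread", 300),
    ("thread", 260),
    ("link", 40),
]

if __name__ == "__main__":
    print(best_style(POSTS))  # prints thread
```

Always picking the current winner is the simplest possible strategy. An agent that wants to keep learning would also have to keep trying the other styles now and then.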
Now, most things are not math. They are more of a judgement call. Even in coding.
Does this landing page look good?
Does this code block belong in this module?
Do we need an extra button for when the page fails to load?
None of these things are true or false; they are judgement calls. Yet AI coding agents can do a decent job at making these calls.
So, how do we build judgement systems around specific topics?
I believe we will have niche judgement systems that are trained by humans.
Big corporations are already training systems to review, for example, what a good business case looks like. The system will give feedback to the agent, and the agent will improve based on that expert judgement model.
Let’s look at an example:
Your company is called McWinsley. You create PowerPoint Slides for a living. You have a very particular sense of what a good deck is.
It depends on many factors like look, feel, what kind of client we are working with, what the interactions with the client were before.
And now we fire up Claude or Codex and start building a slide deck. We do some research on the company, we start drafting. We might even feed in a bit of context on the client.
Now the AI agent creates a first draft. Usually it would just spit it out and let you, the human, review it.
Instead, it can send that draft to the McWinsley internal slide-judge-system, and that system will review the slides thoroughly, with different knowledge than your local AI agent (as in different training data).
The feedback comes back and the agent iterates.
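Here is a toy sketch of that draft, judge, revise loop. The `Review` type, the three rules in `toy_judge` and the `toy_revise` function are all invented stand-ins. In the scenario above the judge would be a trained model behind some internal service and the reviser would be the agent itself, not a handful of if-statements.

```python
from dataclasses import dataclass


@dataclass
class Review:
    score: float    # 0.0 (bad) to 1.0 (nothing left to fix)
    feedback: list  # one short instruction per problem found


def toy_judge(deck: list) -> Review:
    """Stand-in for the internal slide judge; the rules are made up."""
    feedback = []
    if not deck or deck[0] != "Executive summary":
        feedback.append("start with an executive summary")
    if any("TBD" in title for title in deck):
        feedback.append("remove placeholder slides")
    if not deck or deck[-1] != "Next steps":
        feedback.append("end with next steps")
    return Review(score=1.0 - len(feedback) / 3, feedback=feedback)


def toy_revise(deck: list, feedback: list) -> list:
    """Stand-in for the agent; fixes one piece of feedback per round."""
    issue = feedback[0]
    if issue.startswith("start with"):
        return ["Executive summary"] + deck
    if issue.startswith("remove placeholder"):
        return [title for title in deck if "TBD" not in title]
    return deck + ["Next steps"]


def iterate_with_judge(draft, judge, revise, threshold=0.9, max_rounds=5):
    """Revise until the judge is satisfied; return (deck, revisions used)."""
    deck = draft
    for round_number in range(max_rounds):
        review = judge(deck)
        if review.score >= threshold:
            return deck, round_number
        deck = revise(deck, review.feedback)
    return deck, max_rounds  # out of budget: hand over the latest draft


DRAFT = ["Market overview", "Pricing TBD", "Team"]

if __name__ == "__main__":
    final_deck, rounds = iterate_with_judge(DRAFT, toy_judge, toy_revise)
    print(rounds, final_deck)
    # prints 3 ['Executive summary', 'Market overview', 'Team', 'Next steps']
```

The human only sees `final_deck`, after three rounds of feedback they never had to give. The `max_rounds` budget is there so a judge that is never satisfied cannot keep the agent spinning forever.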
Why we will have AI-Judges and what about me
So, intellectually this might be interesting, but what about me?
How is this relevant to our work today?
Well, the core premise is that we can only unleash the intelligence of AI on knowledge work when we are able to resolve the human bottleneck problem, at least partially.
Currently, it is humans who have to judge the quality of the “generated work”.
Building expert AI-Judges allows us humans to step in later, basically after the “generated work” was already grilled by one or many experts and iterated upon.
The human in that picture becomes the CEO who only gets to see the very best version of the slide deck and not the first 12 drafts.
In this age of infinite generation, our human mental bandwidth is both the bottleneck and the sacred thing to protect.
If we don’t all want to be overwhelmed, exhausted and in the end inefficient, we need to work on taking ourselves out of the loop for longer.
Stay sane out there
niko




