QA in the AI era
Tags: ai, opinion

As AI tools become part of our daily dev workflows, there’s a growing concern: will AI replace us? I don’t think so.
If you read my previous post, you will know that I am cautiously optimistic about this new era of AI. I do believe that the way we as developers write apps will change forever. Does that mean we won’t write ANY code ourselves? Definitely not. Does it mean our careers and the profession in general are at risk? No chance.
In fact, skilled developers will become even more sought after as the knowledge gap grows. When using generative AI, non-technical users will not be able to fine-tune, defend or scale their applications to meet all the requirements of modern web apps. That’s where we come in.
The importance of QA
Right now, if you open two different tabs with the same AI model and give it the same instruction, it will generate two wildly different results. Even if you give it an existing piece of code and ask it to change something, it might restructure or modify parts that are completely irrelevant to your request.
This immediately raises a few red flags. If you are a non-technical user, blindly deploying those changes gives you zero certainty that your application code still adheres to all the business requirements you previously gave the model. Sure, you could ask AI to generate tests (if you even know what they are or what they should test), but this is generally when you will receive the most pointless, flaky or slow-running tests. Why? Because AI simply doesn’t know any better.
The problem with AI-generated tests
The main problem with AI-generated tests is that AI learns by observing what it has seen somewhere on the web and, using those inputs, tries to find patterns that work most of the time. It cannot ever truly “understand” your program logic or runtime behavior, despite what the AI companies are trying to sell you.
It will mimic basic test structures (e.g. assert something == something_else) without knowing if the output is meaningful in your specific context. It is also limited to its training data, and if that data contained shallow or boilerplate tests, it will learn to reproduce those.
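As a minimal sketch (the function and values here are invented for illustration), this is the tautological shape such shallow tests often take: the expectation re-derives the result with the same logic as the code under test, so it can never fail and verifies nothing about the actual business requirement.

```ruby
# Hypothetical example of a shallow, AI-style test.
# total_price stands in for real application code.
def total_price(items)
  items.sum { |item| item[:price] * item[:qty] }
end

items = [{ price: 5, qty: 2 }, { price: 3, qty: 1 }]

# Tautology: the "expected" value is computed with the exact same
# logic as the implementation, so the assertion always passes,
# even if the pricing logic itself is wrong.
expected = items.sum { |item| item[:price] * item[:qty] }
raise "pricing test failed" unless total_price(items) == expected
```

A human reviewer spots this immediately; a non-technical user deploying on green checkmarks never will.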
AI also suffers from hallucinations, often inventing support libraries or helper methods that don’t exist. Unfortunately, there’s no real way around this.
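As a hypothetical sketch (the helper name below is invented, standing in for any hallucinated API), such code reads plausibly but blows up the moment it runs, because Ruby raises a NameError for methods that were never defined:

```ruby
# build_stubbed_report is a made-up helper of the kind an LLM
# might confidently invent; it is not defined anywhere.
def run_hallucinated_test
  report = build_stubbed_report(name: "Daily report") # does not exist
  report.name
rescue NameError
  "test errored: undefined helper"
end

puts run_hallucinated_test
```

At best this surfaces as a red build; at worst the broken test gets skipped or deleted and the requirement it was meant to guard goes unchecked.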
Specs can provide context
If you look at testing frameworks like RSpec, you will see that they are extremely readable, and as such, LLMs can understand your requirements far better just by reading your specs. TDD, anyone?
Take a look at this example:
RSpec.describe "Report management", type: :request do
  it "creates a Report and redirects to the Report's page" do
    get "/reports/new"
    expect(response).to render_template(:new)

    post "/reports", params: { report: { name: 'Daily report' } }
    expect(response).to redirect_to(assigns(:report))

    follow_redirect!

    expect(response).to render_template(:show)
    expect(response.body).to include("Report was successfully created.")
  end
end
This is a highly readable piece of code, and it will outlast whatever AI-generated code we use today. If your AI agents decide to rewrite everything, you have a very clear set of rules that must pass in order for your app to hit production.
This kind of consistency will be essential for any serious company navigating the AI era.
Co-existing with AI
The truth is that AI is here to stay. It will become more and more integrated into our workflows.
We need to adapt and integrate it in a way that makes sense, one that lets us extract the good parts while filtering out the bad use-cases.
We must also be the gatekeepers of the AI code that hits production and consciously work on ensuring that it adheres to a certain level of quality that users of a paid product deserve.
This is why I am of the opinion that we will write more test specs and less app code in the future. It is similar to previous technological revolutions, where humans were replaced by machines but remained in charge of QA.
There has always been a balance between work and quality assurance. You should only give AI one of the two, while remaining in control of the other.
If AI is writing your code (including code completions), you need to be the one writing your tests. If you don’t want to write your tests, you shouldn’t be letting AI also write your application code. Simple as that.
Don’t offload both to the machines, unless you really don’t care about your product (or your users).
You can automate code, but you can’t automate care. Care is what turns code into products that users trust.