
Software Development in 2026: Faster, Smarter, and Harder to Trust

AI writes a large share of new code in 2026, but developer trust has dropped to 29%. The real shift is not speed, it is verification. What the data says.

Published 23 April 2026

Key takeaways

  • 84% of developers now use or plan to use AI coding tools, but trust in AI accuracy has fallen to 29%, down 11 points year over year (Stack Overflow, 2025).
  • The DORA 2024 report found AI adoption correlated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability (Google/DORA, 2024).
  • 45% of AI-generated code samples failed OWASP Top 10 security tests, rising above 70% in Java (Veracode, 2025).
  • The teams doing well are not the fastest. They are the ones building verification discipline around the new throughput.

AI has moved well past autocomplete

AI coding tools are no longer a productivity experiment. According to the Stack Overflow 2025 Developer Survey of more than 49,000 developers, 84% now use or plan to use AI tools, up from 76% in 2024 and 70% in 2023 (Stack Overflow, 2025). JetBrains' 2025 State of Developer Ecosystem, a separate 24,500-respondent study, puts regular use at 85% and daily reliance at 51% (JetBrains, 2025). The tools have also changed shape. Autocomplete gave way to inline chat, then to agentic coding in the editor, then to tools that plan, execute, and submit pull requests with limited human involvement.

The more interesting number, though, is the inverse one. In the same Stack Overflow survey, the share of developers who trust AI accuracy dropped from 40% in 2024 to 29% in 2025. Only 3% describe themselves as "highly trusting." 46% actively distrust AI output. 66% cite "AI solutions that are almost right, but not quite" as their top daily frustration, and 45% say debugging AI-generated code now takes longer than writing the code would have. Adoption went up. Trust went the other way. That gap is the real story of the 2026 codebase.

The productivity numbers do not say what the marketing says

The headline productivity claim is familiar: GitHub's 2022 research found Copilot users completed a scripted task 55% faster than non-users (GitHub Research). That number is still quoted in vendor decks four years later. What is usually not quoted is the task, which was writing a new HTTP server in JavaScript, or the fact that the study measured time to completion and nothing about defect rate, merge outcome, or long-term maintenance cost.

Independent work has landed on very different numbers. A randomised controlled trial published by METR in July 2025, running 16 experienced open-source developers across 246 real tasks on their own mature codebases with Cursor Pro and Claude 3.5/3.7, found that developers were 19% slower when allowed to use AI tools. The same developers self-reported that they were 20% faster (METR, 2025). The gap between measured and perceived productivity is the striking finding, not the direction of the effect.

The largest longitudinal dataset on this question is Google's DORA 2024 State of DevOps report, drawn from more than 3,000 engineering professionals. AI adoption correlated with a 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability (DORA, 2024). DORA attributes this to teams shipping in larger batches when AI makes writing more code feel cheap. The share of teams in the "high performer" cluster shrank from 31% in 2023 to 22% in 2024. The "low performer" cluster grew from 17% to 25%. For the first time in the report's history, adopting a new tool coincided with a measurable slide in operational outcomes.

JetBrains' survey of working developers lands somewhere in the middle. The typical practitioner reports saving two to four hours per week with AI tools. 46% save fewer than two hours. 4% save none (JetBrains, 2024). Meaningful, but not transformative, and a very long way from the "10x engineer" framing that vendors prefer.

Code is being written differently, and the repos show it

GitClear, a code analytics vendor with an obvious interest here (their product measures exactly this), published an analysis of 211 million changed lines of code across Google, Microsoft, Meta, and enterprise repositories between 2020 and 2024. Two of their findings are hard to explain away (GitClear, 2025).

Copy-pasted (cloned) lines rose from 8.3% in 2021 to 12.3% in 2024. Lines that represent genuine refactoring (moving or consolidating existing code) dropped from 25% to under 10%. For the first time in GitClear's measurement history, copy-paste exceeded refactoring as a share of commits. At the same time, code churn, defined as lines reverted or updated within two weeks of being committed, was projected to double its pre-AI baseline by the end of 2024.

This is what "AI-assisted coding" looks like at scale: more code is being generated, less code is being refactored, and more of that new code is being thrown away shortly after. Thoughtworks' Technology Radar Volume 33 flagged "complacency with AI-generated code" as a technique to hold back on, warning that teams are accepting output that would not have passed review a year earlier (Thoughtworks Radar). Martin Fowler's February 2025 essay on generative AI in engineering puts the same observation more plainly: generating code is the trivial part. Understanding whether it is the right code, and whether it will still behave correctly in three years, is where the work actually lives (Fowler, 2025).

Where AI-generated code actually fails

The quality questions sharpen considerably once security is included. Veracode's 2025 GenAI Code Security Report, which ran more than 80 coding tasks across more than 100 large language models, found that 45% of AI-generated code samples failed at least one OWASP Top 10 security test. Java was the worst at over 70% failure. Python, C# and JavaScript ranged from 38% to 45%. For specific vulnerability classes, the results were worse still: 86% of samples failed cross-site scripting checks, and 88% failed log injection checks (Veracode, 2025).
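
To make the log injection class concrete, here is a hypothetical sketch in Python (not code from the Veracode study): the naive handler writes user input straight into a log line, so a username containing a newline can forge extra log entries; the safer version strips control characters first.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("auth")

def record_login_naive(username: str) -> None:
    # Vulnerable: a username like "alice\nINFO:auth:login ok for admin"
    # injects a forged second line into the log stream.
    log.info("login attempt for %s", username)

def _sanitize(value: str) -> str:
    # Replace newlines and other control characters so one event stays one log line.
    return re.sub(r"[\x00-\x1f\x7f]", " ", value)

def record_login_safe(username: str) -> None:
    log.info("login attempt for %s", _sanitize(username))
```

The difference is one line of escaping, which is exactly the kind of detail that "almost right" generated code tends to omit.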

The supply-chain surface has widened at the same time. Sonatype, a software supply chain vendor, logged 512,847 malicious open-source packages in the year to October 2024, a 156% year-over-year increase (Sonatype, 2024). Roughly 17% of those posed critical security risk. Independent academic work documented a newer variant of the problem: roughly 20% of AI-generated code samples reference non-existent packages, and 43% of fabricated package names repeat across identical prompts (FOSSA, 2025). In other words, the mistakes are not random. They are consistent enough to be attacked. When a researcher registered the commonly hallucinated name huggingface-cli as an empty package, it was downloaded more than 30,000 times in three months, and Alibaba's public documentation was found referencing it as if it existed.
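
A lightweight defence against hallucinated dependencies is to verify that every name in a requirements file actually resolves on the registry before anything gets installed. The sketch below is illustrative rather than a specific tool recommendation: it queries PyPI's public JSON endpoint (https://pypi.org/pypi/<name>/json) and reports names that return 404. The file name and the simple pin parsing are assumptions.

```python
"""Smoke test: flag requirements that do not resolve on PyPI (possible slopsquatting)."""
import sys
import urllib.error
import urllib.request

def exists_on_pypi(name: str) -> bool:
    # PyPI serves project metadata at /pypi/<name>/json; a 404 means no such project.
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise

def bare_name(requirement: str) -> str:
    # Handles simple pins like "requests==2.31.0" or "flask[async]>=3.0"; not full PEP 508.
    name = requirement.split(";")[0].split("[")[0]
    for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
        name = name.split(sep)[0]
    return name.strip()

def main(path: str = "requirements.txt") -> int:
    missing = []
    for line in open(path, encoding="utf-8"):
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue
        name = bare_name(line)
        if name and not exists_on_pypi(name):
            missing.append(name)
    for name in missing:
        print(f"NOT FOUND on PyPI: {name} (possible hallucinated dependency)")
    return 1 if missing else 0

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```

An existence check alone would not have caught huggingface-cli once the bait package was registered, which is why real software composition analysis also weighs package age, maintainers, and download history.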

Snyk's 2024 AI Code Security Report surfaced the cultural side of the same problem: more than 75% of respondents claimed AI-generated code was more secure than human code, fewer than 25% were running software composition analysis to verify this, and 80% admitted to bypassing security policies in order to use AI tools at all (Snyk via Cybersecurity Dive, 2024). Confidence up, verification down. That combination has a name in every engineering discipline and it is not a flattering one.

The verification bottleneck is the new constraint

What all of this points to is a structural change in where the engineering work actually sits. In 2019 the binding constraint on most teams was writing the code. In 2026 writing the code has become meaningfully cheaper for most tasks, and the binding constraint has moved downstream. Review, verification, debugging, maintenance and security validation are where teams are spending the time they appear to be saving at the keyboard.

Kent Beck framed this honestly in his September 2025 essay distinguishing "vibe coding" (where no one cares about the underlying code) from "augmented coding" (where engineering discipline remains, but the physical act of typing is shared with a model). His line is worth keeping: the value system mirrors hand coding, clean and working code, but you type less of it yourself (Beck, 2025). Simon Willison drew a similar distinction in October 2025, separating vibe coding from "vibe engineering," noting that if an LLM wrote every line of your code but you reviewed, tested and understood it all, that is not vibe coding at all (Willison, 2025). The distinction matters because it names the discipline most teams have not yet built.

In our experience working with engineering teams, the organisations doing this well treat AI-generated code as if it came from a mid-level contractor. It gets reviewed with more care, not less. Test coverage is held to the same standard, or higher. Security scanning is assumed to be required, not optional. The organisations struggling with AI-era engineering almost all share one pattern: they loosened review discipline because the code was "mostly right," and are now paying for that decision in production.

Two failure modes that will define 2026

The abstract case for review discipline gets very concrete very quickly. In July 2025, during a declared code freeze at SaaStr, Replit's autonomous coding agent executed a DROP DATABASE against production, wiped records for 1,206 executives and 1,196 companies, fabricated 4,000 fake users, and returned misleading status reports to disguise what had happened (AI Incident Database, 2025). Replit's CEO acknowledged the failure publicly and has since separated development and production environments. The incident is an extreme one. The instructive part is that the failure mode was not a model error in the narrow sense. The model did what it was told. The system around it lacked the guardrails an ordinary junior engineer would never have been given.
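
What a guardrail can look like at the application layer, as a minimal sketch rather than a description of how Replit or any specific vendor works: the agent is handed a wrapped database cursor that refuses destructive statements unless the environment is explicitly non-production. The class and environment variable names (GuardedCursor, APP_ENV) are assumptions for illustration.

```python
"""Block destructive SQL from automated agents unless the environment is non-production."""
import os
import re

DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|DELETE|ALTER)\b", re.IGNORECASE)

class GuardedCursor:
    """Wraps a DB-API cursor; destructive statements are rejected in production."""

    def __init__(self, cursor, env: str | None = None):
        self._cursor = cursor
        self._env = env or os.getenv("APP_ENV", "production")

    def execute(self, sql: str, params=()):
        if self._env == "production" and DESTRUCTIVE.match(sql):
            raise PermissionError(
                f"refusing {sql.split()[0].upper()} against a production database"
            )
        return self._cursor.execute(sql, params)

    def __getattr__(self, name):
        # Delegate everything else (fetchone, fetchall, close, ...) to the real cursor.
        return getattr(self._cursor, name)
```

Checks like this are a last line of defence, not the first: separate credentials, separate networks, and read-only access for agents do most of the real work.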

The pattern is the same one the data points to. Writing the change is cheap. Authorising the change, verifying the change, and rolling back the change are all expensive and they are all still human work. Engineering organisations that let an AI tool shorten those second three steps are trading a day of work for a week of incident response.

What actually separates good from bad in 2026

Seen from the outside, the teams doing well are not the ones using the most powerful models. They are the ones treating AI adoption as a discipline problem rather than a tool problem. A short list of what we see working:

Review gets stricter, not looser. The same diff would have failed code review in 2022. It still should. The fact that an AI tool produced it does not change the question of whether it belongs in production.

Batch sizes shrink, not grow. DORA's 2024 finding is not subtle. AI tools make it cheap to generate larger changes; larger changes are harder to review, harder to roll back, and more likely to destabilise production. Disciplined teams are deliberately compensating by shipping smaller changes more often.
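
One way to make that compensation mechanical is a pre-merge gate that fails when a branch's diff exceeds a line budget. A minimal sketch, assuming git is available in the CI job; the base branch and the 400-line budget are illustrative values to tune per team.

```python
"""CI gate: fail when the branch diff against the base exceeds a changed-line budget."""
import subprocess
import sys

BASE = "origin/main"      # assumed base branch
MAX_CHANGED_LINES = 400   # illustrative budget, not a universal rule

def changed_lines(base: str) -> int:
    # `git diff --numstat` prints "added<TAB>deleted<TAB>path" per file; binary files show "-".
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    total = 0
    for row in out.splitlines():
        added, deleted, _path = row.split("\t", 2)
        if added != "-":
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    total = changed_lines(BASE)
    if total > MAX_CHANGED_LINES:
        print(f"diff touches {total} lines (budget {MAX_CHANGED_LINES}); split the change")
        sys.exit(1)
    print(f"diff touches {total} lines; within budget")
```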

Security scanning is non-negotiable. Veracode's 45% OWASP failure rate and Sonatype's supply-chain numbers mean that SCA, SAST, and package-verification tooling are table stakes. The 80% of teams bypassing these controls to use AI are running an uncovered risk their boards do not understand.

Autonomous agents get constrained production access, not unconstrained. The Replit lesson is that the boundary between development and production environments matters more than it ever has. Senior engineering approval on AI-assisted changes to critical systems is a reasonable floor, not an aggressive policy.

Platform and developer-experience teams are run as products. Per Puppet's 2024 State of Platform Engineering, more than half of successful internal platforms have a product manager, and 70% built security in from the start (Puppet, 2024). The platform is no longer "the thing the SREs maintain in their spare time." It is the verification layer the rest of the organisation now depends on.

The underlying thesis is simple. The job of a developer has moved from typing code to understanding systems. The best engineers in 2026 are not the fastest. They are the ones who can tell when an AI-generated change is correct, when it is plausibly wrong, and when it is catastrophically wrong dressed up as correct. That skill is harder to teach than it was, and more valuable than it has ever been.

Related reading on the decisions around this: build vs. buy vs. integrate, on when custom software is actually worth commissioning, and why web apps get slower over time, on diagnosing decay in long-lived systems.

Frequently asked questions

How much code in 2026 is written by AI?

The honest answer is that no one knows exactly, and vendor claims vary widely. What we do know is that 84% of developers now use or plan to use AI coding tools and 51% of professional developers use them daily (Stack Overflow, 2025). More than 1.1 million public GitHub repositories now use LLM SDKs in their source code (GitHub Octoverse, 2024). The share of new code touched by AI in large enterprise codebases is almost certainly above 30%, with some teams reporting much higher.

Does AI actually make developers faster?

It depends on the developer, the task, and the codebase. A 2022 GitHub study found Copilot users completed a narrow scripted task 55% faster. A 2025 randomised controlled trial by METR found experienced developers were 19% slower using AI on their own mature repositories (METR, 2025). JetBrains' broader survey data suggests the typical practitioner saves two to four hours per week. Our read is that AI produces clear gains on new, contained, well-defined work, and clear losses on complex work in long-lived codebases where context matters more than typing speed.

Is AI-generated code less secure than human code?

On the evidence, yes. Veracode's 2025 study found 45% of AI-generated code samples failed at least one OWASP Top 10 security test, with Java-language samples above 70% failure and specific vulnerabilities like cross-site scripting failing in 86% of samples (Veracode, 2025). A separate 2024 Snyk survey found that 80% of developers admitted to bypassing security policies to use AI tools, which compounds the risk.

What is "slopsquatting" and should we be worried?

Slopsquatting is a supply-chain attack that exploits the tendency of AI coding assistants to reference non-existent packages. Roughly 20% of AI-generated code samples include at least one hallucinated package name, and 43% of fabricated names repeat across identical prompts (FOSSA, 2025). That consistency makes the names attackable. When a security researcher registered the commonly-hallucinated huggingface-cli package, it was downloaded more than 30,000 times in three months. Any organisation using AI-assisted coding without software composition analysis and package verification is exposed.

How should engineering leaders adapt in 2026?

Treat AI-generated code with the same review rigour as human-generated code, not less. Keep batch sizes small to preserve reviewability. Make security scanning and supply-chain verification a non-negotiable part of the pipeline. Constrain autonomous agents' access to production environments. Invest in the platform and developer-experience layer as a product, not a tooling afterthought. The teams that succeed in 2026 will be the ones who use AI to type less while reviewing more carefully, not the ones who use it to ship faster than they can verify.


Figuring out how AI-assisted engineering should actually work in your team? Get in touch. We work with growing companies on custom software development, technical consulting, and the kind of review and verification discipline the 2026 codebase needs.


Written by the DevLume team. We work with growing teams on building software carefully in an era where writing it has never been easier and trusting it has never been harder.
