TL;DR
AI systems are rapidly automating the core engineering tasks of AI development and are approaching saturation on key benchmarks. AI research itself still depends partly on human creativity, so how much of that residual work can be automated remains uncertain. The shift could transform AI R&D within the next 32 months.
Recent empirical data indicates that AI systems can now automate most of the core engineering tasks involved in AI research and are nearing saturation on multiple benchmarks. Research activities that require creativity and hypothesis generation still rely partly on human input, leaving a residual gap. This points to a potential transformation of AI R&D workflows over the next 32 months.
According to Thorsten Meyer’s analysis of Jack Clark’s recent work, six key benchmarks measuring AI’s capabilities in core science and engineering tasks show rapid progress. Notably, CORE-Bench, which tests research reproduction, has reached a 95.5% success rate, with one author declaring it ‘solved.’ This indicates that AI can now reliably reproduce research papers, handling dependencies, code execution, and output analysis at a level comparable to a competent postdoc.
Similarly, on MLE-Bench, which assesses performance on Kaggle competitions, AI has reached 64.4%, roughly matching mid-tier human Kaggle practitioners. The leaderboard for this benchmark has been paused while a fairer measurement process is developed, a sign that AI capabilities have outstripped the original evaluation methods. Progress in kernel design, such as automated GPU kernel generation, further illustrates the transition of AI from experimental to production-ready engineering tools.
Clark’s analysis suggests that these trajectories are converging, with multiple independent benchmarks nearing saturation within a similar timeframe. The implication is that engineering tasks in AI development are increasingly automated, reducing the marginal cost and time traditionally spent on engineering work. However, the research process—such as hypothesis formulation, creative problem-solving, and novel theory development—remains less automated, with some aspects still dependent on human input. The ongoing question is how much of research can be automated before it becomes indistinguishable from engineering.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
As an affiliate, we earn on qualifying purchases.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
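The optimistic "shots on goal" reading can be made concrete with a toy calculation. The sketch below assumes each exploration attempt is an independent trial with the same small success probability `p`. Both that independence assumption and the values of `p` and `n` are illustrative placeholders, not figures from Clark's data.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability of at least one success in n independent trials,
    each succeeding with probability p (hypothetical model)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def expected_discoveries(p: float, n: int) -> float:
    """Expected number of successes under the same model."""
    return p * n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-attempt chance of a creative discovery
    for n in (10, 100, 1_000, 10_000):
        print(
            f"n={n:>6}  P(at least one)={prob_at_least_one(p, n):.3f}  "
            f"expected={expected_discoveries(p, n):.2f}"
        )
```

Under the pessimistic reading, the relevant `p` is effectively zero for the kind of insight in question, so raising `n` changes nothing. The model illustrates the disagreement; it cannot settle which reading is right.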
Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response that the Clark series synthesis argues is structurally inadequate.
Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Table comparing the two readings; recoverable row labels: productivity multiplier (years), recursive loop operational.]
Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
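One way to read the asymmetry argument above is as a minimax-regret decision. The sketch below is a hedged illustration: the two actions, the two states, and every payoff number are invented placeholders chosen only to encode the ordering the paragraph describes (building is still useful if the moat holds, and necessary if it closes). None of the values come from the source.

```python
# Hypothetical payoffs (arbitrary units): rows are actions, columns are states.
Payoffs = dict[str, dict[str, float]]

PAYOFFS: Payoffs = {
    "build_now": {"moat_holds": 1.0, "moat_closes": 5.0},
    "wait": {"moat_holds": 2.0, "moat_closes": -10.0},
}


def max_regret(payoffs: Payoffs) -> dict[str, float]:
    """For each action, the largest shortfall versus the best action per state."""
    states = list(next(iter(payoffs.values())).keys())
    best = {s: max(row[s] for row in payoffs.values()) for s in states}
    return {
        action: max(best[s] - row[s] for s in states)
        for action, row in payoffs.items()
    }


def minimax_regret_choice(payoffs: Payoffs) -> str:
    """Pick the action whose worst-case regret is smallest."""
    regrets = max_regret(payoffs)
    return min(regrets, key=regrets.get)


if __name__ == "__main__":
    print(max_regret(PAYOFFS))
    print(minimax_regret_choice(PAYOFFS))
```

With these placeholder numbers, waiting saves a little if the moat holds but loses far more if it closes, so building now has the smaller worst-case regret. Different payoffs could reverse the result, which is why the ordering matters more than the values.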
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI R&D and Industry Innovation
The rapid automation of core engineering tasks in AI research suggests a fundamental shift in how AI development is conducted. As engineering bottlenecks diminish, the pace of AI innovation could accelerate significantly, enabling faster deployment of new models and applications. This may also reshape the role of human researchers, shifting their focus toward creative and strategic aspects rather than routine engineering. However, the residual human-driven research components could still act as a bottleneck or source of innovation, depending on how automation evolves.
For industry, this trend could lead to cost reductions, faster iteration cycles, and more accessible AI development pipelines. Conversely, it raises questions about the future of research roles and the need for new skills adapted to an increasingly automated R&D environment. The broader impact will depend on whether AI can eventually automate the remaining creative and hypothesis-driven aspects of research.
Progress in AI Engineering Capabilities and Benchmarks
Between September 2024 and December 2025, multiple benchmarks tracked AI progress in core research and engineering tasks. CORE-Bench, which measures research reproduction, improved from 21.5% to 95.5%, and one of its authors declared it ‘solved.’ MLE-Bench, which assesses Kaggle competition performance, advanced from 16.9% to 64.4% over the same period, prompting the organizers to pause submissions while they refine the measurement method.
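The arithmetic behind these figures is easy to check. The sketch below uses only the scores and dates quoted above (September 2024 to December 2025) and computes the elapsed months, the percentage-point gain, an average monthly gain, and the remaining headroom. The class and field names are illustrative, and the linear monthly average is a simplification rather than a claim about the shape of the curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkSpan:
    name: str
    start_score: float  # percent, at the start of the period
    end_score: float    # percent, at the end of the period
    start: tuple[int, int]  # (year, month)
    end: tuple[int, int]    # (year, month)

    @property
    def months(self) -> int:
        (y0, m0), (y1, m1) = self.start, self.end
        return (y1 - y0) * 12 + (m1 - m0)

    @property
    def gain(self) -> float:
        """Percentage-point improvement over the period."""
        return self.end_score - self.start_score

    @property
    def gain_per_month(self) -> float:
        """Average percentage points per month (linear simplification)."""
        return self.gain / self.months

    @property
    def headroom(self) -> float:
        """Percentage points left before a 100% score."""
        return 100.0 - self.end_score


CORE = BenchmarkSpan("CORE-Bench", 21.5, 95.5, (2024, 9), (2025, 12))
MLE = BenchmarkSpan("MLE-Bench", 16.9, 64.4, (2024, 9), (2025, 12))

if __name__ == "__main__":
    for b in (CORE, MLE):
        print(
            f"{b.name}: +{b.gain:.1f} points over {b.months} months "
            f"({b.gain_per_month:.2f}/month), headroom {b.headroom:.1f}"
        )
```

A linear average hides how scores behave near a ceiling, where gains typically flatten, so the per-month figure describes the past interval and is not a forecast.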
These improvements are supported by advances in kernel design, including automated GPU code generation and optimized deep learning kernels, demonstrating that AI systems are transitioning from experimental tools to production-grade engineering solutions. The convergence of these trajectories indicates that AI’s capability to handle engineering tasks is approaching saturation, while research remains less fully automated.
“Clark’s conclusion is correct and possibly understated for engineering. The residual research question is real but may be less binding than the framing suggests.”
— Thorsten Meyer
Unresolved Aspects of AI Research Automation
It is still unclear to what extent AI can automate the creative and hypothesis-driven aspects of research, such as novel theory generation and experimental design. While engineering tasks are nearing full automation, the residual research component may still require human insight, and the timeline for full automation remains uncertain. Additionally, the impact of these developments on research roles and industry workflows is still being evaluated.
Next Milestones in AI R&D Automation
Over the next 32 months, focus will likely be on refining measurement benchmarks to better capture AI’s capabilities, especially in research creativity and hypothesis generation. Expect continued advances in automated kernel design, model training, and deployment tools. Industry adoption of these automated engineering solutions may accelerate, while research automation remains an open challenge. Monitoring how these capabilities influence research productivity and innovation cycles will be critical.
Key Questions
How close is AI to fully automating research activities?
Currently, AI has automated core engineering tasks such as reproducing research, optimizing kernels, and performing well in competitions. However, the creative, hypothesis-driven aspects of research are still partly human-dependent. Full automation of research remains an open question.
What are the implications for human researchers?
As engineering tasks become automated, researchers may shift focus toward strategic, creative, and theoretical work. This could lead to a change in skill requirements and research workflows, emphasizing innovation over routine engineering.
Will this automation reduce research costs?
Yes, automating engineering tasks can lower costs and speed up development cycles. However, maintaining oversight and guiding AI-driven research will still require human expertise, especially in creative domains.
Could AI eventually automate all aspects of research?
This remains uncertain. While engineering automation is advancing rapidly, the automation of creative research processes is less certain and may require breakthroughs in AI understanding and reasoning.
Source: ThorstenMeyerAI.com