Software 3.0 and the Decade of AI Agents: Lessons from Andrej Karpathy

Published on 6/20/2025

  • software 3.0
  • AI agents
  • karpathy
  • LLMs
  • machine learning

In 2013, Andrej Karpathy took a flawless ride in a self-driving Waymo. Yet in 2025, we’re still working on autonomy and driving agents. It’s one thing to demo intelligence, another to deliver reliability.

Here’s what I learned from Karpathy’s recent talk at YC, and why it matters—especially for students and developers building in this evolving space.

Software 1.0 — Code by Hand

Traditional hand-written code that explicitly instructs computers what to do.

Software 2.0 — Code by Gradient

Neural network models trained on data; code learned rather than explicitly written.

Software 3.0 — Code by Conversation

Programming large language models (LLMs) using English prompts, bridging natural language and machine instructions.

Karpathy draws an interesting parallel to the 1960s time-sharing era:

  • LLM compute remains extremely costly, forcing models to be centralized in the cloud.
  • We’re still waiting for the equivalent of a “personal computing revolution” for AI. Today feels similar to the early mainframe era, where computation is remote and centralized.

Lessons from Tesla Autopilot

Karpathy highlighted Tesla Autopilot as a cautionary tale for AI implementation:

  • As neural networks grew more capable, traditional C++ code usage significantly decreased.
  • Achieving production readiness with AI takes considerable time and iteration, far longer than initial demos suggest.

“2025 isn’t the year of AI agents—it’s the decade.” — Andrej Karpathy

How LLMs differ from humans

Despite their capabilities, LLMs have distinct cognitive deficits:

  • They possess vast encyclopedic knowledge but significant cognitive limitations.
  • They still hallucinate occasionally, despite improvements.
  • Their “jagged intelligence” makes them brilliant in specific areas but prone to mistakes no human would make (e.g., misunderstanding numerical comparisons like 9.9 < 9.11).
  • “Anterograde amnesia”: inability to retain information beyond their immediate context.
  • Susceptibility to prompt injection attacks due to inherent gullibility.

Effective developers recognize these limitations and strategically engineer around them, rather than expecting flawless, human-like behavior.

Partial-autonomy apps are the way forward

Software should keep a human-in-the-loop, gradually delegating repetitive tasks to AI. Successful examples include:

  • Cursor for coding, offering quick validation through code diffs.
  • Perplexity for search, providing easy verification and transparency.

Today’s software remains mostly human-focused, indicating substantial opportunities to design applications specifically with LLMs in mind.

Developers cheat-sheet: Start with partial autonomy, design tight verification loops, and version-control your prompts.

Karpathy emphasizes patience, recalling his flawless 2013 Waymo self-driving demo. Even in 2025, fully autonomous vehicles still aren’t ubiquitous and often require human oversight. Reliable autonomy grows incrementally—small steps at a time.

Your turn:

What’s one AI demo or idea you’re excited to see become reliably production-ready?

If you enjoyed this article and are exploring the gaps between AI demos and practical products, follow for more insights!