Microsoft Launches Three MAI Models, Amazon Bets $35B on AGI: The April 2 AI Briefing
Microsoft dropped three MAI models in 24 hours — transcription, voice, and image — each competing directly with OpenAI’s own offerings. Simultaneously, Amazon’s $35B OpenAI commitment put the definition of AGI at the centre of the largest financial contract in AI history. And the AGI definition debate is no longer philosophical: it’s contractual. Here’s what this actually means.
In This Article
01MAI-Transcribe-1: 2.5x Faster, $0.36/Hour
02MAI-Voice-1: 60-Second Audio in 1 Second, $22/Million Characters
03MAI-Image-2: Microsoft Enters the Image Generation Race
04Amazon’s $35B AGI Milestone Commitment to OpenAI
05The AGI Definition War: Huang vs DeepMind vs Hendrycks/Bengio
06Microsoft MAI Model Suite — April 2, 2026
07Frequently Asked Questions
Table of Contents
- MAI-Transcribe-1: 2.5x Faster, $0.36/Hour
- MAI-Voice-1: 60-Second Audio in 1 Second, $22/Million Characters
- MAI-Image-2: Microsoft Enters the Image Generation Race
- Amazon’s $35B AGI Milestone Commitment to OpenAI
- The AGI Definition War: Huang vs DeepMind vs Hendrycks/Bengio
- Microsoft MAI Model Suite — April 2, 2026
- Frequently Asked Questions
MAI-Transcribe-1: 2.5x Faster, $0.36/Hour

Microsoft’s AI division launched MAI-Transcribe-1, a speech-to-text model priced at $0.36 per hour of audio and running at 2.5x the speed of the current Whisper-based transcription tier on Azure.
The pricing positions MAI-Transcribe-1 directly against OpenAI’s Whisper API, which sits at approximately $0.36 per hour — identical pricing, but Microsoft is claiming a significant speed advantage. For enterprises processing large volumes of audio — call centres, media companies, legal transcription services — a 2.5x speed uplift at the same cost changes the unit economics materially. Microsoft Azure blog on MAI-Transcribe-1 launch.
Microsoft’s MAI models go head-to-head with OpenAI’s offerings — the same company Microsoft has invested $13B into and distributes through Azure OpenAI Service. This is not an accident. Microsoft is hedging: build your own AI stack so that if the OpenAI relationship changes, you still have a world-class product. Here’s what this actually means: Microsoft is not just an OpenAI distributor. It’s becoming an AI competitor.
MAI-Voice-1: 60-Second Audio in 1 Second, $22/Million Characters
MAI-Voice-1 is Microsoft’s new text-to-speech model, capable of synthesising 60 seconds of audio in approximately 1 second, at a price of $22 per million characters.
That latency figure — near-real-time synthesis for a minute of audio — is a step change from prior generations. Real-time voice agents, interactive audio applications, and content dubbing pipelines all benefit from sub-second synthesis. The $22/million character pricing is competitive with ElevenLabs and OpenAI TTS. Microsoft’s distribution advantage through Azure gives it immediate access to enterprise customers already running workloads on its cloud. TechCrunch on MAI-Voice-1.
MAI-Image-2: Microsoft Enters the Image Generation Race

Microsoft also launched MAI-Image-2, its second-generation AI image generation model available via Azure AI Services. Full pricing and capability benchmarks were not disclosed at launch, but Microsoft confirmed MAI-Image-2 targets enterprise use cases — product imagery, document illustration, internal creative workflows — rather than consumer generation. It competes with DALL-E 3, Stable Diffusion, and Midjourney at the enterprise API tier. Azure MAI-Image-2 product page.
Amazon’s $35B AGI Milestone Commitment to OpenAI

Amazon’s investment in OpenAI — reported as $35 billion contingent on AGI milestones — is the most consequential financial-contractual AGI definition in history. It is not a grant or a donation. It is a capital commitment that will only deploy if OpenAI meets technical thresholds the two companies have privately agreed constitute “AGI.”
The contractual implications are profound. If OpenAI achieves what Amazon considers AGI, $35B unlocks. If there’s a dispute about whether those milestones were met, it goes to arbitration — or litigation. The private definitions in that contract will likely become the most litigated technical document in history. WSJ’s reporting on the Amazon-OpenAI AGI clause.
For years, AGI was a philosophical debate. Now it’s a contractual trigger attached to $35B. The private definition in the Amazon-OpenAI contract will become the most commercially consequential benchmark in AI. Whoever controls the definition controls the capital deployment — and potentially the narrative of who “won.”
The AGI Definition War: Huang vs DeepMind vs Hendrycks/Bengio
Three major frameworks are competing to define AGI in 2026:
Jensen Huang (Capitalistic): AGI is when AI can do the work of any human in a company — effectively an economic productivity threshold. Measurable, but risks conflating task performance with general intelligence.
DeepMind (Academic): A tiered framework ranging from “Emerging AGI” to “Superhuman AGI,” based on capability breadth and depth across cognitive tasks. More rigorous but harder to operationalise contractually.
Hendrycks/Bengio (Safety-Aligned): AGI requires both broad general capability and alignment with human values — a two-dimensional threshold that explicitly rules out a system that is capable but unsafe. This framework is increasingly influential in policy circles. DeepMind’s AGI levels framework paper.
Jensen’s capitalistic AGI definition would likely already be satisfied by current models for many enterprise tasks. DeepMind’s tiered model pushes the goalpost further. Hendrycks/Bengio’s safety requirement may never be formally met if alignment remains unsolved. Each definition advantages different actors — and that’s not a coincidence.
Microsoft MAI Model Suite — April 2, 2026
| Model | Function | Pricing | Key Spec |
|---|---|---|---|
| MAI-Transcribe-1 | Speech-to-text | $0.36/hour | 2.5x faster than Whisper |
| MAI-Voice-1 | Text-to-speech | $22/M chars | 60s audio in 1s |
| MAI-Image-2 | Image generation | TBA (enterprise) | Competes with DALL-E 3 |
Frequently Asked Questions
Yes and no. Microsoft distributes OpenAI models through Azure and has invested $13B+ in OpenAI. But the MAI model suite directly competes with OpenAI’s Whisper, TTS, and DALL-E offerings. Microsoft is building a hedge — its own AI stack in case the relationship changes or costs need to be controlled.
Same price ($0.36/hour) but Microsoft claims 2.5x faster processing speed. For high-volume enterprise audio workloads — call centres, media, legal — a 2.5x throughput improvement significantly changes processing timelines and infrastructure costs.
Amazon’s $35B commitment to OpenAI is reportedly contingent on OpenAI reaching AGI milestones defined in a private contract. If those milestones are met, $35B deploys. If disputed, the clause would go to arbitration or litigation. The precise definition of AGI in that contract is not publicly known.
There is no consensus. Jensen Huang’s economic productivity definition is the most concrete and measurable. DeepMind’s tiered capability framework is the most academically rigorous. The Hendrycks/Bengio safety-aligned definition is gaining traction in policy circles as it integrates alignment as a precondition for true AGI.
MAI-Image-2 is available through Azure AI Services, targeting enterprise API customers. Consumer access through Microsoft Copilot or Bing Image Creator has not been announced. Enterprise pricing was not disclosed at launch.
Every week, the biggest AI stories distilled into clear analysis you can act on.