Table of Contents

On April 24, 2026 — almost exactly one year after DeepSeek-R1 shook Silicon Valley — DeepSeek released its V4 series. Bigger model, longer context, more aggressive pricing, and one technically significant detail that most coverage buried: V4 is DeepSeek’s first model optimized for Huawei Ascend chips.

TL;DR

DeepSeek V4 ships in two variants: V4 Flash (lightweight, high-speed) and V4 Pro (flagship). Pro is currently the largest open-weight mixture-of-experts model available, with 1.6T total parameters (49B active) and 1M token context. Pricing: V4 Flash at $0.14/M input tokens, $0.28/M output — undercutting GPT-5.4 Nano, Gemini 3.1 Flash, and Claude Haiku 4.5. Performance: V4-Pro-Max claims to outperform GPT-5.2 and Gemini 3.0 Pro on reasoning benchmarks.

What It Is

DeepSeek V4 is a Mixture of Experts (MoE) architecture. The core idea: despite a massive total parameter count, only a subset of “expert” sub-networks activates during each inference pass. This keeps compute costs significantly lower than a comparable dense model.

VariantTotal ParamsActive ParamsContext WindowPricing (in/out)
V4 FlashUndisclosedUndisclosed1M tokens$0.14 / $0.28 per M tokens
V4 Pro1.6T49B1M tokensUndisclosed
V4-Pro-Max1.6T49B1M tokensUndisclosed

A 1M token context window means you can fit entire large codebases or lengthy documents into a single prompt — particularly useful for cross-file code understanding tasks.

Why It Matters

Open-source pricing pressure continues

V4 Flash undercuts GPT-5.4 Nano, Gemini 3.1 Flash, GPT-5.4 Mini, and Claude Haiku 4.5 on price. This is DeepSeek’s consistent strategy: use aggressive pricing to pressure OpenAI and Google, while the open-source version lets the community self-host, further compressing competitors’ commercial headroom.

Performance claims

DeepSeek claims V4-Pro-Max outperforms GPT-5.2 and Gemini 3.0 Pro on reasoning benchmarks and leads in coding evaluations. Note: these numbers are currently primarily from DeepSeek’s own evaluations. Independent third-party benchmarks are in progress. Historical context: DeepSeek-R1’s claimed performance was largely validated by the community, so V4’s numbers deserve serious attention — while awaiting independent confirmation.

Huawei Ascend optimization: the geopolitical technical marker

This is the most strategically significant detail in the release, and the most overlooked.

US export controls on Nvidia GPUs have forced Chinese AI companies to seek alternative hardware. V4 is DeepSeek’s first model formally optimized for Huawei Ascend chips. If this optimization’s real-world performance matches the claimed figures, it represents a meaningful milestone: China’s top AI companies can train and deploy frontier models without depending on Nvidia A100/H100/H200.

How It Works

V4’s MoE architecture builds on V3 with several key improvements:

Extended context handling: V4 uses a new design for efficiently processing long sequences, addressing the memory explosion problem that affects Transformer models with very long contexts. Technical specifics haven’t been fully disclosed.

Enhanced reasoning: V4 was trained with reinforced chain-of-thought supervision, producing significant improvements on math and complex reasoning tasks.

Agentic tasks: V4 has specific training for tool use and multi-step task planning, making it well-suited as a backbone for AI agent systems.

Comparison

DimensionDeepSeek V4-ProGPT-5.2Gemini 3.0 Pro
Open sourceYesNoNo
Total params1.6T (MoE)UndisclosedUndisclosed
Context window1M tokensUndisclosedUndisclosed
Self-hostableYesNoNo
Ascend supportYesNoNo

The largest differentiator remains the open source + low pricing combination: enterprises can download V4 and self-host without any API dependency, ensuring data never leaves their infrastructure. For data-sensitive applications (finance, healthcare, legal), this attribute is difficult to quantify in benchmark terms.

Bottom Line

DeepSeek V4 continues to challenge the assumption that open-source models must lag behind closed-source frontiers. Regardless of final benchmark results, its release creates immediate pricing pressure across the AI market and technically validates China’s ability to advance frontier AI without Nvidia’s latest GPUs.

For developers and technical decision-makers, the relevant question now is: in your specific use case, does V4 Flash’s pricing and performance combination already replace the closed-source API you’re currently using?

References

Tags

Related Articles