A family of SOTA speech models (0.6B and 1.7B) supporting 10 languages. Features prompt-based Voice Design, 3-second zero-shot voice cloning, and ultra-low-latency streaming.
Qwen-Image-2512 is the new open-source SOTA for text-to-image generation. It delivers markedly improved photorealism, finer natural detail, and superior text rendering.
Qwen-Image-Layered decomposes images into transparent RGBA layers, unlocking inherent editability. You can move, resize, or delete objects without artifacts. Supports recursive decomposition and variable layer counts.
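To make the layered output concrete, here is a minimal sketch of editing and recompositing decomposed RGBA layers with Pillow; the layer filenames are hypothetical placeholders for whatever the model's decomposition produces.

```python
from PIL import Image

# Hypothetical layer files from a Qwen-Image-Layered decomposition,
# ordered back-to-front (background first). Filenames are placeholders.
layer_paths = ["layer0_background.png", "layer1_subject.png", "layer2_text.png"]
layers = [Image.open(p).convert("RGBA") for p in layer_paths]

# Example edit: move the subject layer 40 px to the right by pasting it
# onto a transparent canvas at an offset. No inpainting is needed, since
# the background behind it is a separate, complete layer.
shifted = Image.new("RGBA", layers[1].size, (0, 0, 0, 0))
shifted.paste(layers[1], (40, 0), layers[1])
layers[1] = shifted

# Recomposite back-to-front with standard alpha compositing.
canvas = layers[0]
for layer in layers[1:]:
    canvas = Image.alpha_composite(canvas, layer)
canvas.save("recomposed.png")
```

Deleting an object is the same operation with its layer simply left out of the compositing loop.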
Qwen3-Coder is a new 480B-parameter MoE open model (35B active) by the Qwen team, built for agentic coding. It achieves SOTA results on benchmarks like SWE-bench, natively supports 256K context (extendable to 1M tokens), and comes with an open-source CLI tool, Qwen Code.
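For reference, a minimal sketch of calling a hosted Qwen3-Coder model through an OpenAI-compatible endpoint; the base URL and model name follow DashScope's compatible mode and should be verified against the current docs.

```python
import os
from openai import OpenAI

# OpenAI-compatible client. The base_url and model name are assumptions
# based on DashScope's compatible mode and may differ by region.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3-coder-plus",  # assumed identifier for the hosted variant
    messages=[
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)
print(response.choices[0].message.content)
```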
Qwen3-235B-A22B-Thinking-2507 is a powerful open-source MoE model (22B active) built for deep reasoning. It achieves SOTA results on agentic tasks, supports a 256K context, and is available on Hugging Face and via API.
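Because the model emits its chain of thought before the final answer, outputs are typically split on the `</think>` delimiter. A hedged sketch with transformers follows; in practice a model this size is served through an inference engine or the hosted API, but the parsing pattern is the same.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-235B-A22B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Keep only the newly generated tokens.
output_ids = model.generate(inputs, max_new_tokens=4096)[0][inputs.shape[-1]:]
text = tokenizer.decode(output_ids, skip_special_tokens=True)

# The chain of thought ends at "</think>"; everything after it is the
# user-facing answer.
thinking, _, answer = text.partition("</think>")
print(answer.strip())
```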
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.
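A sketch of sending mixed audio-and-text input through an OpenAI-compatible endpoint; the endpoint URL and model identifier here are assumptions to check against the Qwen docs.

```python
import base64
from openai import OpenAI

# Endpoint and model name are assumptions based on DashScope's
# OpenAI-compatible mode; verify against the current Qwen docs.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

with open("clip.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

# Standard OpenAI content-parts format for mixed audio + text input.
stream = client.chat.completions.create(
    model="qwen3-omni-flash",  # assumed identifier for the hosted model
    stream=True,               # omni endpoints typically stream responses
    messages=[{
        "role": "user",
        "content": [
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
            {"type": "text", "text": "Summarize what is said in this clip."},
        ],
    }],
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```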
Qwen-Image is a new 20B open-source image foundation model by the Qwen team. It excels at complex text rendering (especially Chinese) and precise image editing, while also delivering strong general image generation. Available now in Qwen Chat.
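A minimal generation sketch with diffusers, using the released Hugging Face checkpoint; the prompt is illustrative and exercises the model's Chinese text-rendering strength.

```python
import torch
from diffusers import DiffusionPipeline

# Load the released checkpoint from Hugging Face. The 20B model needs a
# large GPU or CPU/sequential offloading.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

# A prompt that targets legible rendered text, including Chinese characters.
prompt = 'A neon sign on a rainy street that reads "通义千问", photorealistic'
image = pipe(prompt, num_inference_steps=50).images[0]
image.save("qwen_image_sample.png")
```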
Qwen-Image-Edit is the editing version of the 20B Qwen-Image model. It offers precise, model-native editing, including bilingual text modification and both high-level semantic and low-level appearance changes.
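A minimal editing sketch with the diffusers pipeline released alongside the model; the input image and edit instruction are illustrative.

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

# Editing pipeline for the 20B Qwen-Image-Edit checkpoint, loaded from
# the Hugging Face release.
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

source = Image.open("storefront.png").convert("RGB")

# A text-modification edit: the model rewrites the sign while keeping
# the rest of the scene intact.
edited = pipe(
    image=source,
    prompt='Change the sign text to "OPEN 24 HOURS"',
    num_inference_steps=50,
).images[0]
edited.save("storefront_edited.png")
```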
Qwen3-VL is the new flagship vision-language model from the Qwen team, excelling at visual agent tasks, long-video understanding, and spatial reasoning with a native 256K context window.
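A sketch of image question answering with transformers; the repo ID is an assumption, so pick whichever Qwen3-VL size and checkpoint fits your hardware from the Hugging Face collection.

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

# Repo ID is an assumption; substitute the Qwen3-VL checkpoint you use.
model_id = "Qwen/Qwen3-VL-8B-Instruct"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Chat-template message with an image part and a text part.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/diagram.png"},
        {"type": "text", "text": "What does this diagram show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(processor.batch_decode(
    output[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0])
```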