Blog Articles | GLM-Image - AI Image Generation Guides

Mar 7, 2026 12 min read

Qwen3.5-9B: Alibaba's 9B Parameter Model Outperforms 120B Models

Alibaba open-sourced Qwen3.5-9B, achieving 81.7 on GPQA Diamond, surpassing OpenAI's GPT-OSS-120B with 13.5x fewer parameters.

Language Model Small Model Open-Source

Read Full Article

Mar 6, 2026 15 min read

FireRed-OCR 2B: State-of-the-Art Document Parsing, Outperforming 397B Models

Xiaohongshu open-sourced FireRed-OCR, a 2B parameter model achieving 92.94% on OmniDocBench v1.5, surpassing Qwen3.5-397B (90.80%) and Gemini-3.0 Pro (90.33%). Apache 2.0 license for commercial use.

OCR Model Document Parsing Open-Source

Read Full Article

Feb 23, 2026 18 min read

ACE-Step 1.5: The New Open-Source Multimodal Model Breakthrough

Complete guide to ACE-Step 1.5, the open-source multimodal model with 32B parameters, Qwen2.5-32B backbone, and ViT-H/14 vision encoder. Learn about performance benchmarks, hardware requirements, and practical applications.

Multimodal LLM Vision-Language Open-Source

Read Full Article

Feb 22, 2026 25 min read

KANI-TTS-2: The Next Generation Open-Source Text-to-Speech Model

Complete guide to KANI-TTS-2, the open-source TTS model with support for 12 languages, 60+ voices, voice cloning, and ultra-low latency. Learn about installation, hardware requirements, and practical applications.

Text-to-Speech Voice Cloning Open-Source

Read Full Article

Feb 21, 2026 20 min read

MOSS-TTS: The Next Generation Open-Source Text-to-Speech Model

Complete guide to MOSS-TTS, the open-source TTS model with multilingual support, voice cloning, and ultra-low latency. Learn about installation, hardware requirements, and practical applications.

Text-to-Speech Voice Cloning Open-Source

Read Full Article

Feb 20, 2026 20 min read

FireRed-Image-Edit-1.0: High-Fidelity Image Editing Model

Complete guide to FireRed-Image-Edit-1.0, the specialized image editing model by FireRedTeam. Learn about high-fidelity editing, restoration, enhancement, and practical implementation.

Image Editing High-Fidelity AI Model

Read Full Article

Feb 19, 2026 12 min read

GLM-5: Zhipu AI's Latest Open-Source Language Model Series

Complete guide to GLM-5, the open-source language model series with 9B parameters and 128K context support. Learn about variants, performance benchmarks, and deployment options.

Language Model Open-Source 128K Context

Read Full Article

Feb 19, 2026 35 min read

Qwen3.5-397B-A17B: The Most Powerful Open-Weight Language Model

Complete guide to Qwen3.5-397B-A17B, the flagship language model with 397B total parameters and 17B active per forward pass. Learn about MoE architecture, state-of-the-art reasoning, and coding capabilities.

Language Model MoE Architecture Open-Weight

Read Full Article

Jan 30, 2026 20 min read

Qwen3-ASR-1.7B: Revolutionary Multilingual Speech Recognition Model

Complete guide to Alibaba's Qwen3-ASR-1.7B automatic speech recognition model. Learn about its support for 52 languages, state-of-the-art accuracy, hardware requirements, and real-world applications.

Speech Recognition ASR Model Multilingual

Read Full Article

Jan 29, 2026 18 min read

Kimi K2.5: Moonshot AI's Latest Flagship Multimodal Large Language Model

Comprehensive guide to Kimi K2.5, featuring 1.04 trillion parameters, native multimodal capabilities, and Agent Swarm mode for parallel task execution.

Multimodal LLM MoE Architecture Agent Swarm

Read Full Article

Jan 29, 2026 22 min read

Step3-VL-10B: How a 10B Vision-Language Model Rivals Models 10-20x Larger

Comprehensive guide to Step3-VL-10B, featuring PE-lang encoder, exceptional STEM reasoning, and efficient parameter usage.

Vision-Language STEM Reasoning PE-lang Encoder

Read Full Article

Jan 28, 2026 25 min read

Qwen3-TTS: A Complete Guide to the Open-Source Text-to-Speech Model

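An in-depth look at Alibaba's Qwen3-TTS open-source text-to-speech model, covering multilingual support, voice cloning, hardware requirements, and real-world use cases.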

Text-to-Speech Voice Cloning Qwen AI

Read Full Article

Jan 28, 2026 15 min read

How to Use AI Image Upscaler to Enhance Image Quality: Complete Guide 2026

Learn how to use AI image upscaler technology to enhance image quality with super-resolution deep learning. Complete guide covering technical principles, best practices, and practical tips.

Image Enhancement Super-Resolution AI Tools

Read Full Article

Jan 28, 2026 18 min read

How to Use AI Face Swap Technology for Perfect Facial Replacement: Complete Guide 2026

Learn how to use AI face swap technology for facial replacement. Complete guide covering deep learning principles, best practices, and practical tips for creating natural-looking results.

Face Swap Deep Learning Creative Tools

Read Full Article

Jan 28, 2026 16 min read

How to Use AI Image Expander to Extend Image Boundaries: Complete Guide 2026

Learn how to use AI image expander (uncrop) technology to extend image boundaries intelligently. Complete guide covering inpainting, aspect ratios, best practices, and practical tips.

Image Expansion Uncrop Content Generation

Read Full Article

Jan 28, 2026 12 min read

Z-Image: The New Benchmark for Open-Source Image Generation

Z-Image is a 6-billion-parameter open-source model, ranked #1 among open-source models, with a single-stream diffusion Transformer architecture and exceptional text rendering capabilities.

Image Generation Open-Source Benchmark

Read Full Article

Jan 23, 2026 25 min read

Qwen3-TTS: The Open-Source Text-to-Speech Revolution in 2026

Discover Qwen3-TTS, an open-source text-to-speech model trained on 5M+ hours of speech data across 10 languages with 49 voice timbres and 3-second voice cloning capabilities.

Text-to-Speech Voice Cloning Qwen AI

Read Full Article

Jan 23, 2026 20 min read

Microsoft VibeVoice-ASR: Revolutionary Speech Recognition Model

Discover Microsoft's VibeVoice-ASR, a state-of-the-art speech recognition model that handles 60-minute audio with integrated speaker diarization and timestamping in a single pass.

Speech Recognition ASR Microsoft

Read Full Article

Jan 20, 2026 18 min read

AgentCPM-Explore: First Open-Source 4B Agent Model

Discover AgentCPM-Explore, the first open-source 4B-parameter agent model to rank on 8 benchmarks. Learn about its deep exploration capabilities and on-device deployment advantages.

AI Agent 4B Model On-Device AI

Read Full Article

Jan 15, 2026 15 min read

FLUX 2 Klein: The Fastest AI Image Generation Model

Discover FLUX 2 Klein's 9B and 4B parameter models with sub-second inference times and 13GB VRAM requirements. Professional-grade AI image generation on consumer hardware.

FLUX 2 Klein AI Models Performance

Read Full Article

Jan 14, 2026 12 min read

Mastering Text Rendering with GLM-Image: A Complete Guide

Learn how GLM-Image achieves exceptional text rendering accuracy with the Glyph-byT5 encoder. Discover best practices for creating images with precise text integration in multiple languages, especially Chinese characters.

Text Rendering Tutorial Glyph-byT5

Read Full Article

Jan 14, 2026 15 min read

Knowledge-Intensive Image Generation with GLM-Image

Discover how GLM-Image excels at complex instruction following and factual accuracy. Perfect for creating educational content, technical diagrams, and images requiring intricate information representation.

Knowledge-Intensive Use Cases Educational

Read Full Article

Jan 14, 2026 14 min read

Advanced Image Editing Techniques with GLM-Image

Explore GLM-Image's block-causal attention mechanism for precise image editing. Learn techniques for style transfer, identity preservation, and multi-subject consistency in your creative projects.

Image Editing Style Transfer Advanced

Read Full Article
