Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models
2025

1 City University of Hong Kong
2 Huawei Research
3 The University of Hong Kong

Abstract

Diffusion Language Models (DLMs) show strong capabilities in any-order generation, but current decoding strategies rely heavily on single-step confidence or entropy. This often results in locally optimal yet globally inconsistent sampling trajectories. We introduce Coherent Contextual Decoding (CCD), a framework that leverages predictive consistency across diffusion steps by approximating a target distribution through context marginalization. We further propose CCD-DS, an adaptive decoding strategy that dynamically adjusts the unmasking budget based on token-level context sensitivity. Across Dream and LLaDA models, CCD achieves up to 3.48× speedup and 3.91% accuracy improvement, simultaneously enhancing efficiency and generation quality.

Motivation

Current DLM decoding methods select tokens based on single-step confidence, but this approach suffers from:

  • 🚫 Local optimality: large models are typically over-confident, so single-step confidence favors locally optimal but globally inconsistent choices
  • 🚫 No theoretical interpretation: heuristic confidence rules offer no theoretically controllable decoding process
  • 🚫 Inefficient uniform decoding budgets: easy predictions could justify a larger unmasking budget, yet the budget stays fixed regardless of difficulty

CCD uses historical context predictions to build a more stable and globally coherent decoding trajectory.

Method

🔥 Context-Marginalized Target Approximation

CCD forms a more stable predictive distribution by averaging predictions across multiple diffusion steps.
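The averaging step can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the function name, the list-of-logits interface, and the uniform average over steps are all assumptions.

```python
import numpy as np

def context_marginalized_dist(step_logits):
    """Approximate a target distribution for one masked position by
    averaging its predictive distributions across diffusion steps.

    step_logits: list of 1-D logit arrays, one per diffusion step in
    which the position was still masked (illustrative interface).
    """
    probs = []
    for logits in step_logits:
        z = logits - logits.max()          # shift for numerical stability
        p = np.exp(z) / np.exp(z).sum()    # softmax over the vocabulary
        probs.append(p)
    # Uniform average over step contexts = simple context marginalization
    return np.mean(probs, axis=0)
```

Because each step sees a different partially unmasked context, the averaged distribution is less sensitive to any single step's over-confidence than a one-step softmax.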

🔥 Connection to Conditional Mutual Information

We show that CCD optimizes an entropy objective involving the conditional mutual information I(x; c | s), grounding the framework theoretically.
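One way to see why such a term appears is the standard chain-rule identity for conditional entropy (a textbook identity, not the paper's full derivation): conditioning the target token x on the context c can only reduce uncertainty given the current state s, and the reduction is exactly the conditional mutual information.

```latex
% Chain-rule identity (standard information theory, not paper-specific):
H(x \mid s) \;=\; H(x \mid c, s) \;+\; I(x;\, c \mid s)
```

Under this reading, marginalizing predictions over the contexts observed along the trajectory connects the decoding objective to I(x; c | s); the precise objective is given in the paper.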

🔥 CCD-DS: Adaptive Decoding Budget

CCD-DS raises the per-step unmasking budget for easy contexts (low context sensitivity) and lowers it for difficult ones.
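A budget rule of this shape can be sketched as below. The function name, the threshold `tau`, and the doubling/halving rule are illustrative assumptions, not the paper's exact CCD-DS schedule; only the direction (easy context → larger budget) follows the text.

```python
import numpy as np

def adaptive_budget(sensitivities, base_budget, tau=0.1):
    """Choose how many tokens to unmask this step.

    sensitivities: per-token context-sensitivity scores (e.g., disagreement
    between step-wise predictions); lower means the context is easier.
    tau and the scaling rule are illustrative assumptions.
    """
    mean_s = float(np.mean(sensitivities))
    if mean_s < tau:            # easy context: spend a larger budget
        budget = base_budget * 2
    elif mean_s > 2 * tau:      # hard context: be conservative
        budget = max(1, base_budget // 2)
    else:
        budget = base_budget
    # Never unmask more positions than remain masked
    return min(budget, len(sensitivities))
```

Compared with a fixed budget, this lets the sampler take large steps through easy regions and slow down where predictions disagree across steps.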

Experiments

  • 🚀 Up to 3.48× speedup
  • 🚀 Up to +3.91% accuracy improvement
  • 🚀 Better performance on GSM8K, HumanEval, Trip, etc.

BibTeX

@article{chen2025ccd,
  title={Beyond Confidence: Adaptive and Coherent Decoding for Diffusion Language Models},
  author={Chen, Kecheng and Liu, Ziru and Tao, Xijia and Liu, Hui and Fu, Xinyu and Zhang, Suiyun and Tu, Dandan and Kong, Lingpeng and Liu, Rui and Li, Haoliang},
  journal={arXiv preprint arXiv:2025.xxxxx},
  year={2025}
}