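Update 2021.06.12: This post is no longer being updated. Most of its content has been consolidated into "A Survey of Recent Dialogue State Tracking: Progress and Challenges" (《近年对话状态追踪综述：进展与挑战》).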

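Survey of Task-Oriented Dialogue System Papers (1): the first post in this series. I did not plan the structure carefully at the outset, so it covers only some basics so far and the logic is a bit disorganized.
Survey of Task-Oriented Dialogue System Papers (2): most of the content of part (1) has been consolidated into this post.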

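Introduction

The challenges facing task-oriented dialogue systems are no longer what they used to be. In the two years after Xu and Hu (2018) proposed using a Ptr to ease the extraction of low-frequency slot values, many breakthrough papers appeared...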

DST

contextual slot representations

SAS: slot attention

Model	Encoder	Method	History + aux input	Extra	Slot relation	Level	One-step
SOM-DST	BERT+GRU	gen	prev turn(256) ds	-	-	turn	N
CSFN-DST	BERT(CSFN)	gen	no schema graph	-	schema graph	turn	N
Quan (2020)	BiGRU	gen				dialog	N
SAS	BiGRU	gen	recent \(t\) turns	slot type	slot sim mat		N
CHAN-DST	BERT	disc	all(64 per turn)	val acc	-	dialog	N/A
DST-SC	BiGRU	gen	recent \(m\) words	-	-		N
TEN
PIN
Graph-DST	BERT	gen	pre turn ds+graph	-	ds graph	turn	N
STN4DST	BERT	STN	no ds + appd value		-	turn	STN
SAVN	BERT	span+VN	recent \(t\) turns special words	special words full ontology	-	turn	N
S2S-DU	BERT	gen	no schema(ISV)		-	turn	ptr
STAR	BERT	disc	full(512) non-none ds	-	self-attn		N/A
Zhang (2021)	BERT	hybrid

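This post is structured differently from the previous one. The section "Independent Models vs. Joint Models" has been removed. The section "Difficulties / Future Work" has also been removed, because most of those difficulties have since been alleviated to some degree; it is replaced by "How Each Difficulty Is Handled". A new section, "Background Recap", reviews how DST models worked before 2020. The remaining sections are unchanged.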

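Background Recap

This section reviews how DST was done before 2020. I have not yet read the papers on how DST models were built in the very early days, so I will not analyze or evaluate those approaches. As far as I know, though, earlier systems needed a hand-built synonym list, which was used to normalize synonyms in user utterances; for example, "center" and "centre" mean the same thing. Building such a list is clearly time-consuming and labor-intensive.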

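Synonym Lists and Error Propagation

Before describing how the difficulties named in the heading were addressed, one more point needs introducing. A task-oriented dialogue system (TODS) generally consists of six major modules, and each module takes the previous module's output as its input. This causes a serious problem: cascading error propagation. Because neither neural networks nor any other technique reaches 100% accuracy, the errors of each module are compounded at every subsequent stage.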
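To make the compounding concrete, here is a back-of-the-envelope sketch. It rests on two simplifying assumptions of mine, not taken from any cited paper: the modules fail independently, and a turn is handled correctly only if every module is correct. The 0.95 per-module accuracy is a made-up number.

```python
def pipeline_accuracy(module_accuracies):
    """Probability that every module in the chain is correct,
    assuming the modules fail independently."""
    result = 1.0
    for accuracy in module_accuracies:
        result *= accuracy
    return result


# Hypothetical: six modules, each 95% accurate on its own.
six_modules = [0.95] * 6
print(f"{pipeline_accuracy(six_modules):.3f}")  # → 0.735
```

Even with fairly strong individual modules, roughly a quarter of turns would go wrong somewhere along the chain under these assumptions.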

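To address this, Henderson, Thomson, and Young (2014) dropped the SLU module and fed the ASR output directly into DST; note that they did not use word embeddings. They also carried over the delexicalised features from another paper of theirs published the same year. Put simply, slots and values are replaced with generic tags, e.g. "i want chinese food" -> "i want <value> <slot>". This improves the model's generalization, but it still requires a hand-built synonym list. (Note: joint model.)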
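A minimal sketch of the delexicalisation idea, assuming a tiny hand-built lexicon. The names `delexicalise`, `SLOT_NAMES` and `SLOT_VALUES` are mine, and the sketch only handles single-token slots and values, which is far simpler than the original features.

```python
def delexicalise(utterance, slot_names, slot_values):
    """Replace known slot names and slot values with generic tags, token by token."""
    tokens = []
    for token in utterance.lower().split():
        if token in slot_values:
            tokens.append("<value>")
        elif token in slot_names:
            tokens.append("<slot>")
        else:
            tokens.append(token)
    return " ".join(tokens)


# Hypothetical hand-built lexicon. Note that both spellings of "centre"
# have to be listed by hand, which is exactly the synonym-list burden.
SLOT_NAMES = {"food", "area"}
SLOT_VALUES = {"chinese", "italian", "centre", "center"}

print(delexicalise("i want chinese food", SLOT_NAMES, SLOT_VALUES))
# → i want <value> <slot>
```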

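Word embeddings burst onto the scene in the second half of 2013, and DST soon began to exploit them. Zilka and Jurcicek (2015) additionally used the confidence scores output by ASR: each confidence score is multiplied with the corresponding word embedding to obtain a new vector, which becomes the input to an LSTM. (The exact operation could probably be improved; the paper only says that the score and the embedding are combined.) At every time step the LSTM output is passed to N linear classification layers, each classifying over all candidate values of one slot, where N is the number of slots. Author's note: frankly, this way of classifying is rather odd, since a classification is performed at every time step. Either way, thanks to word embeddings this approach no longer needs a synonym list. (Note: synonym list no longer needed.)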
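A small sketch of how I read the input construction: scale each word vector by the ASR confidence of that word. The multiplication is my reading of "combined", and the function name and toy embeddings are made up for illustration; the LSTM and the N per-slot classifiers are left out.

```python
def confidence_weighted_inputs(words, confidences, embeddings):
    """Scale each word vector by the ASR confidence of that word."""
    return [
        [confidence * x for x in embeddings[word]]
        for word, confidence in zip(words, confidences)
    ]


# Toy 3-dimensional embeddings (made up for illustration).
EMBEDDINGS = {
    "cheap": [0.5, -1.0, 2.0],
    "food": [1.0, 0.0, -0.5],
}

# "cheap" was recognised with low confidence, so its vector is shrunk.
inputs = confidence_weighted_inputs(["cheap", "food"], [0.5, 1.0], EMBEDDINGS)
print(inputs)  # → [[0.25, -0.5, 1.0], [1.0, 0.0, -0.5]]
```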

ontology-based approach

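After that, Mrkšić et al. (2017) proposed the neural belief tracker (NBT). Like Zilka and Jurcicek (2015), they relied on word embeddings: the model reasons over pre-trained word vectors and comes in two variants, NBT-DNN and NBT-CNN, which differ in how the utterance representation is learned. Most importantly, they introduced discriminative DST: the representation of each slot-value pair is matched against the user utterance to decide whether the user mentioned that pair. If so, the dialogue state is updated; otherwise it is left unchanged. This carries an obvious and serious weakness: some slot-value pairs cannot be enumerated, so how should they be matched? Most later models therefore focus on handling unknown slot values. (Note: discriminative DST.)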
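The discriminative loop can be sketched as follows. This is only an illustration of the control flow: `mentioned` is a toy word-overlap check standing in for NBT's learned scorer, and the ontology is hypothetical.

```python
def mentioned(utterance_tokens, value):
    """Toy stand-in for the neural scorer: is every word of the value in the utterance?"""
    return all(word in utterance_tokens for word in value.split())


def update_state(state, utterance, ontology):
    """Check every (slot, value) pair in the ontology against the utterance."""
    tokens = set(utterance.lower().split())
    new_state = dict(state)
    for slot, values in ontology.items():
        for value in values:
            if mentioned(tokens, value):
                new_state[slot] = value
    return new_state


# Hypothetical ontology: only enumerated values can ever be predicted.
ONTOLOGY = {"food": ["chinese", "italian"], "area": ["north", "south"]}

state = update_state({}, "italian food in the north please", ONTOLOGY)
print(state)  # → {'food': 'italian', 'area': 'north'}

# A time such as 18:47 is not in the ontology, so the state cannot capture it.
print(update_state(state, "book it for 18:47", ONTOLOGY))
```

The second call shows the weakness described above: a value that was never enumerated has nothing to be matched against.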

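Rastogi, Hakkani-Tür, and Heck (2017) proposed a multi-domain DST model, but it is an independent model rather than a joint one, because it relies on an SLU module to perform delexicalisation. They bounded the candidate set of slot values: every value gets a score, and ranking by score restricts the candidate set to a fixed size, with 7 as the maximum in their experiments. Candidates can be generated in several ways. The paper uses an external knowledge source rather than an ontology, because an ontology is usually hard to build and access whereas a knowledge source is simple. (The paper may describe other ways to build the set, but it has been a while and I do not remember clearly.) (Notes: bounded candidate set; multi-domain DST.)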
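The bounded candidate set amounts to a top-k selection by score. A minimal sketch, assuming the scores are already available; the function name, the candidate names, and the scores are all made up.

```python
import heapq


def bounded_candidates(scored_candidates, max_size=7):
    """Keep the max_size highest-scoring candidate values for one slot."""
    top = heapq.nlargest(max_size, scored_candidates.items(), key=lambda item: item[1])
    return [value for value, _ in top]


# Hypothetical candidate values with made-up scores 0.0, 0.1, ..., 0.9.
scores = {f"restaurant_{i}": i / 10 for i in range(10)}
kept = bounded_candidates(scores)
print(len(kept), kept[0], kept[-1])  # → 7 restaurant_9 restaurant_3
```

Because the set size is fixed, the tracker's output layer does not grow with the number of values in the knowledge source.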

Zhong, Xiong, and Socher (2018) used a global-locally self-attentive mechanism to improve the tracking of rare slot-value pairs.

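Multi-Domain DST and the Slot Gate

This section does not describe the models in detail; it mainly introduces multi-domain DST and the slot gate.

After Budzianowski et al. (2018) released MultiWOZ, a multi-domain task-oriented dialogue dataset, a large number of multi-domain DST models appeared. One point deserves emphasis: in the multi-domain era, a (slot, value) pair is no longer expressive enough to represent the dialogue state. Most multi-domain DST work uses (domain, slot, value) triples instead, and a slot is usually written as domain-slot, which can be read as a "domain-slot pair". Multi-domain DST did in fact exist before this: earlier work merged several single-domain datasets and ran experiments on the combined data.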
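A small sketch of the triple representation, assuming the state is stored as a dictionary keyed by domain-slot strings. The helper name and the example state are mine.

```python
def to_triples(state):
    """Flatten {"domain-slot": value} into sorted (domain, slot, value) triples."""
    triples = []
    for domain_slot, value in state.items():
        domain, slot = domain_slot.split("-", 1)
        triples.append((domain, slot, value))
    return sorted(triples)


# Hypothetical dialogue state spanning two domains. The slot "area" occurs in
# both, which a plain (slot, value) pair could not tell apart.
state = {"hotel-area": "north", "restaurant-area": "centre", "hotel-pricerange": "cheap"}
for triple in to_triples(state):
    print(triple)
```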

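Wu et al. (2019) proposed the TRADE model, which uses a PtrNet together with zero-shot training so that the model can handle slots in unseen domains. They divide slots into three classes, \(\{none, dontcare, ptr\}\). none means the slot under consideration has no value, so its value is simply none. dontcare means the user does not care what the value of this slot is, so its value is dontcare. ptr means the model must additionally run a ptr network to extract the value directly from the user utterance. Concretely, the model matches each domain-slot pair against the user utterance to obtain the slot's class, and then derives the value according to that class.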
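The gate-then-value flow described above can be sketched like this. It follows the description in this post, not the released TRADE code; the gate labels are given by hand and the value decoder is a stand-in lambda.

```python
def resolve_value(gate_label, generate_value):
    """Map a three-way slot gate decision to a slot value."""
    if gate_label == "none":
        return "none"
    if gate_label == "dontcare":
        return "dontcare"
    if gate_label == "ptr":
        # Only in this case is the value decoder consulted.
        return generate_value()
    raise ValueError(f"unknown gate label: {gate_label}")


# Hypothetical gate outputs for three domain-slot pairs in one turn.
gates = {"hotel-area": "ptr", "hotel-parking": "dontcare", "train-day": "none"}
# Stand-in for the pointer-based decoder; a real one reads the dialogue history.
decoders = {"hotel-area": lambda: "north"}

state = {ds: resolve_value(gate, decoders.get(ds, lambda: "")) for ds, gate in gates.items()}
print(state)
# → {'hotel-area': 'north', 'hotel-parking': 'dontcare', 'train-day': 'none'}
```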

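Zhang et al. (2020) divide slots into {none, dontcare, picklist-based, span-based}, where the first two and the last correspond one-to-one to the classes in TRADE. Their argument is that some slots do not need a ptr to extract the value from the user utterance; for example, the slot "hotel-pricerange" has only three values, {cheap, moderate, expensive}. Heck et al. (2020) divide slots into {none, dontcare, span-based, refer, inform}, with {none, dontcare, true, false} used for Boolean slots.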
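A rough sketch of the picklist/span split in Zhang et al. (2020) as I understand it: slots with a small closed value set are classified, everything else is copied out of the utterance. The classifier and the span indices here are toy stand-ins, and the helper names are mine.

```python
PICKLISTS = {"hotel-pricerange": ["cheap", "moderate", "expensive"]}  # values from the text above


def classify_from_picklist(tokens, candidates):
    """Toy classifier: first candidate that appears in the utterance, else 'none'."""
    for candidate in candidates:
        if candidate in tokens:
            return candidate
    return "none"


def extract_span(tokens, start, end):
    """Copy tokens[start..end] (inclusive) out of the utterance."""
    return " ".join(tokens[start:end + 1])


def predict_value(slot, tokens, span=None):
    """Picklist slots are classified; all other slots are treated as span-based."""
    if slot in PICKLISTS:
        return classify_from_picklist(tokens, PICKLISTS[slot])
    if span is None:
        return "none"
    return extract_span(tokens, *span)


tokens = "a cheap hotel near cambridge arts theatre".split()
print(predict_value("hotel-pricerange", tokens))              # → cheap
print(predict_value("attraction-name", tokens, span=(4, 6)))  # → cambridge arts theatre
```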

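Generative DST

Between 2015 and 2017 the Pointer Network (PtrNet) was proposed, and many researchers subsequently refined it. Xu and Hu (2018) were the first to introduce an index-based Ptr into DST, and many later models adopted the idea. (Note: span-based.)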
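The index-based idea boils down to predicting a start and an end position in the utterance and copying the tokens in between. A minimal sketch, assuming independent start and end scores, which is a simplification of mine (a pointer network decodes the indices sequentially); the scores are made up.

```python
def best_span(start_scores, end_scores):
    """Return the (start, end) index pair with the highest combined score, start <= end."""
    best, best_score = None, float("-inf")
    for i, start_score in enumerate(start_scores):
        for j in range(i, len(end_scores)):
            score = start_score + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


tokens = "book a table at nandos city centre".split()
# Made-up scores; a trained pointer network would produce these.
start_scores = [0.0, 0.1, 0.0, 0.2, 2.5, 0.3, 0.1]
end_scores = [0.0, 0.0, 0.1, 0.0, 0.4, 0.2, 2.0]

start, end = best_span(start_scores, end_scores)
print(" ".join(tokens[start:end + 1]))  # → nandos city centre
```

Because the output is a pair of positions rather than a class over an ontology, a value never seen in training can still be extracted as long as it appears in the utterance.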

The TRADE model (Wu et al. 2019) is a generative DST model.

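How Each Difficulty Is Handled

Unknown slots: task-oriented dialogue schemas, zero-shot learning
Unknown and non-enumerable slot values: span-based
Unknown intents: task-oriented dialogue schemas
Changing system actions: not yet known
Data scarcity: MultiWOZ, TaskMaster, CrossWOZ, SGD
Computational complexity: not yet known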

Results Comparison

References

Budzianowski, Pawel, Tsung-Hsien Wen, Bo-Hsiang Tseng, Inigo Casanueva, Stefan Ultes, Osman Ramadan, and Milica Gašić. 2018. “MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling.” arXiv Preprint arXiv:1810.00278.

Heck, Michael, Carel van Niekerk, Nurul Lubis, Christian Geishauser, Hsien-Chin Lin, Marco Moresi, and Milica Gasic. 2020. “TripPy: A Triple Copy Strategy for Value Independent Neural Dialog State Tracking.” In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue, 35–44. 1st virtual meeting: Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.sigdial-1.4.

Henderson, Matthew, Blaise Thomson, and Steve Young. 2014. “Word-Based Dialog State Tracking with Recurrent Neural Networks.” In Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), 292–99.

Hu, Jiaying, Yan Yang, Chencai Chen, Liang He, and Zhou Yu. 2020. “SAS: Dialogue State Tracking via Slot Attention and Slot Information Sharing.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6366–75. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.567.

Kim, Sungdong, Sohee Yang, Gyuwan Kim, and Sang-Woo Lee. 2020. “Efficient Dialogue State Tracking by Selectively Overwriting Memory.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 567–82. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.53.

Mrkšić, Nikola, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young. 2017. “Neural Belief Tracker: Data-Driven Dialogue State Tracking.” In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1777–88.

Ouyang, Yawen, Moxin Chen, Xinyu Dai, Yinggong Zhao, Shujian Huang, and Jiajun Chen. 2020. “Dialogue State Tracking with Explicit Slot Connection Modeling.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 34–40. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.5.

Quan, Jun, and Deyi Xiong. 2020. “Modeling Long Context for Task-Oriented Dialogue State Generation.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7119–24. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.637.

Rastogi, Abhinav, Dilek Hakkani-Tür, and Larry Heck. 2017. “Scalable Multi-Domain Dialogue State Tracking.” In 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 561–68. IEEE. https://doi.org/10.1109/ASRU.2017.8268986.

Shan, Yong, Zekang Li, Jinchao Zhang, Fandong Meng, Yang Feng, Cheng Niu, and Jie Zhou. 2020. “A Contextual Hierarchical Attention Network with Adaptive Objective for Dialogue State Tracking.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 6322–33. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.563.

Wang, Yexiang, Yi Guo, and Siqi Zhu. 2020. “Slot Attention with Value Normalization for Multi-Domain Dialogue State Tracking.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 3019–28. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.243.

Wu, Chien-Sheng, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, and Pascale Fung. 2019. “Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 808–19. Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1078.

Xu, Puyang, and Qi Hu. 2018. “An End-to-End Approach for Handling Unknown Slot Values in Dialogue State Tracking.” In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1448–57. Melbourne, Australia: Association for Computational Linguistics. https://doi.org/10.18653/v1/P18-1134.

Ye, Fanghua, Jarana Manotumruksa, Qiang Zhang, Shenghui Li, and Emine Yilmaz. 2021. “Slot Self-Attentive Dialogue State Tracking.” In Proceedings of the Web Conference 2021, 1598–1608. WWW ’21. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3442381.3449939.

Zeng, Yan, and Jian-Yun Nie. 2020. “Multi-Domain Dialogue State Tracking Based on State Graph.” arXiv Preprint arXiv:2010.11137.

Zhang, Jianguo, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wang, Philip Yu, Richard Socher, and Caiming Xiong. 2020. “Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking.” In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, 154–67. Barcelona, Spain (Online): Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.starsem-1.17.

Zhong, Victor, Caiming Xiong, and Richard Socher. 2018. “Global-Locally Self-Attentive Dialogue State Tracker.” arXiv Preprint arXiv:1805.09655.

Zhu, Su, Jieyu Li, Lu Chen, and Kai Yu. 2020. “Efficient Context and Schema Fusion Networks for Multi-Domain Dialogue State Tracking.” In Findings of the Association for Computational Linguistics: EMNLP 2020, 766–81. Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.68.

Zilka, Lukas, and Filip Jurcicek. 2015. “Incremental LSTM-Based Dialog State Tracker.” In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 757–62. https://doi.org/10.1109/ASRU.2015.7404864.

Survey of Task-Oriented Dialogue System Papers (2)

Introduction

DST

Related Work

long context modeling

few-shot learning