Conversational agents have advanced remarkably with the advent of large language models (LLMs). Despite their impressive capabilities, these models frequently suffer from hallucinations, generating information that is incorrect or not grounded in reality. Moreover, users often over-rely on LLM-based AI agents, accepting the AI's suggestions even when they are erroneous. In task-oriented conversations, such overreliance can lead to incorrect or incomplete task execution, undermining the system's reliability. This work explores accountability modeling to prevent overreliance on task-oriented conversational AI.
Task-oriented dialogue systems (TODS) are designed to assist users in completing a task or goal through conversation. Dialogue state tracking (DST) is a crucial component of TODS, responsible for understanding user intentions and keeping track of the dialogue state. Task-oriented dialogues are sensitive to DST errors (false positives and false negatives), as a single error can significantly change the course of the conversation. For example, in Figure 1, attraction-area is a false negative prediction. As a result, the user may over-rely on the AI's suggestion and end up booking a park that is not in or near the centre of town. Such problems deteriorate the user experience in real-world conversations.
In this work, we tackle overreliance through accountability modeling of DST, which can detect dialogue state errors in advance and rectify them. For example, in Figure 1, although attraction-area is missing from the prediction, the accountability model detects the omission and self-corrects the predicted dialogue state. Instead of self-correcting the errors, the model can also introduce friction turns (e.g., confirmations about model uncertainty and suspected errors), which help rectify the errors in subsequent turns and prevent overreliance.
The main idea of our approach is to add an accountability head to the backbone LLM: a binary classifier over the slot ontology, applied to the representation of the final token of the dialogue context, that predicts which slots appear in the dialogue state. The resulting model is jointly trained on the standard language modeling loss and this auxiliary slot classification loss. The accountability head estimates a probability for every slot, which can be used to detect false positive and false negative slots in the predicted dialogue state. Furthermore, the auxiliary signal from the accountability head also helps the model learn dialogue state generation.
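Below is a minimal sketch of how such a head could be attached to a HuggingFace-style causal LM. The class, argument names, slot count, and loss weight are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AccountabilityModel(nn.Module):
    """Causal LM with an auxiliary slot-classification ("accountability") head.

    Sketch only: `backbone` is any HuggingFace-style causal LM, `num_slots` is
    the size of the slot ontology (e.g., the MultiWOZ slots), and `alpha`
    weights the auxiliary loss. These names are assumptions for illustration.
    """

    def __init__(self, backbone, num_slots, alpha=1.0):
        super().__init__()
        self.backbone = backbone
        self.slot_head = nn.Linear(backbone.config.hidden_size, num_slots)
        self.alpha = alpha

    def forward(self, input_ids, attention_mask, labels, slot_labels, last_ctx_idx):
        # Standard language-modeling loss over the generated dialogue state.
        out = self.backbone(input_ids=input_ids,
                            attention_mask=attention_mask,
                            labels=labels,
                            output_hidden_states=True)
        # Hidden state of the final token of the dialogue context.
        h_last = out.hidden_states[-1][torch.arange(input_ids.size(0)), last_ctx_idx]
        slot_logits = self.slot_head(h_last)                     # (batch, num_slots)
        cls_loss = F.binary_cross_entropy_with_logits(slot_logits, slot_labels.float())
        # Joint objective: LM loss + auxiliary slot-classification loss.
        loss = out.loss + self.alpha * cls_loss
        return loss, torch.sigmoid(slot_logits)
```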
The slot probabilities output by the accountability head can be used to self-correct (SC) the generated dialogue state. We propose a two-step dialogue state correction algorithm: the first step filters likely false positives, while the second step adds likely false negatives.
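A minimal sketch of this two-step correction is shown below, assuming access to the per-slot probabilities from the accountability head and a helper that generates a value for any newly added slot; the thresholds and helper names are assumptions.

```python
def self_correct(pred_state, slot_probs, regenerate_value,
                 fp_threshold=0.5, fn_threshold=0.5):
    """Two-step correction of a predicted dialogue state.

    pred_state: dict mapping slot name -> predicted value
    slot_probs: dict mapping slot name -> probability from the accountability head
    regenerate_value: callable that produces a value for a newly added slot
    The thresholds and the helper are illustrative assumptions.
    """
    corrected = dict(pred_state)

    # Step 1: drop likely false positives (predicted slots the head finds unlikely).
    for slot in list(corrected):
        if slot_probs.get(slot, 0.0) < fp_threshold:
            del corrected[slot]

    # Step 2: add likely false negatives (probable slots missing from the prediction).
    for slot, p in slot_probs.items():
        if p >= fn_threshold and slot not in corrected:
            corrected[slot] = regenerate_value(slot)

    return corrected
```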
We perform our experiments with three backbone LLMs (Llama, Mistral, Gemma) on two established task-oriented datasets (MultiWOZ and Snips). Our empirical findings (shown in Figure 3) demonstrate that this approach not only enables reliable estimation of AI agent errors but also guides the LLM decoder to generate more accurate actions. We observe around 3% absolute improvement in joint goal accuracy (JGA) on MultiWOZ by incorporating accountability heads in modern LLMs. We also show that this method enables the agent to self-correct its actions, which further increases JGA from 67.13 to 70.51, achieving state-of-the-art DST performance.
Figure 5 shows illustrative examples of model predictions from the MultiWOZ and Snips datasets. In the first example, the model detects a false positive slot (restaurant-pricerange) and filters it out to rectify the prediction. The second example contains a false negative slot (attraction-type), which is corrected successfully. In the third example, the model detects both false positive and false negative slots and successfully rectifies them. In the fourth example, the original prediction is correct; however, self-correction adds an extra slot (train-departure), which makes the prediction wrong. The fifth shows an instance where the algorithm only partially corrects an error.
We study the application of our accountability modeling to prevent user overreliance in real-world task-oriented conversations. The approach introduces positive friction, such as user confirmations, that can eventually lead to successful task completion. In this experiment, rather than automatically correcting the dialogue state, the system inserts friction turns that request clarification on uncertain slots directly from the user. We conduct this experiment with a user simulator (GPT-4o), which is asked to confirm the detected false positives and false negatives. This method achieves DST performance comparable to the self-correction method, as shown in Figure 6.
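A sketch of how suspected errors could be turned into confirmation (friction) turns instead of silent corrections; the question templates, thresholds, and the ask_user/regenerate_value helpers are assumptions, not the paper's exact setup.

```python
def friction_turn(pred_state, slot_probs, ask_user, regenerate_value,
                  fp_threshold=0.5, fn_threshold=0.5):
    """Confirm suspected errors with the user instead of silently correcting them.

    ask_user: callable that poses a yes/no question to the (simulated) user,
    e.g., a wrapper around a GPT-4o user simulator. All names and thresholds
    here are illustrative assumptions.
    """
    corrected = dict(pred_state)

    # Suspected false positives: predicted slots the head finds unlikely.
    for slot in list(corrected):
        if slot_probs.get(slot, 0.0) < fp_threshold:
            if not ask_user(f"Just to confirm, did you ask for {slot} = '{corrected[slot]}'?"):
                del corrected[slot]

    # Suspected false negatives: probable slots missing from the prediction.
    for slot, p in slot_probs.items():
        if p >= fn_threshold and slot not in corrected:
            if ask_user(f"Should I also take {slot} into account?"):
                corrected[slot] = regenerate_value(slot)

    return corrected
```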
@misc{dey2025knowmistakespreventingoverreliance,
  title={Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling},
  author={Suvodip Dey and Yi-Jyun Sun and Gokhan Tur and Dilek Hakkani-Tur},
  year={2025},
  eprint={2501.10316},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2501.10316},
}
This work was supported in part by Other Transaction award HR0011249XXX from the U.S. Defense Advanced Research Projects Agency (DARPA) Friction for Accountability in Conversational Transactions (FACT) program and has benefited from the Microsoft Accelerate Foundation Models Research (AFMR) grant program, through which leading foundation models hosted by Microsoft Azure and access to Azure credits were provided to conduct the research.