Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling

Conversational AI Lab
ConvAI Lab, University of Illinois at Urbana-Champaign

Overreliance on Task-Oriented Conversational AI

Conversational agents have advanced remarkably with the advent of large language models (LLMs). Despite their impressive capabilities, these models frequently suffer from hallucinations, generating information that is incorrect or not grounded in reality. Moreover, users often over-rely on LLM-based AI agents, accepting the AI's suggestions even when they are erroneous. In task-oriented conversations, such overreliance can lead to incorrect or incomplete task execution, undermining the system's reliability. This work explores accountability modeling as a way to prevent overreliance on task-oriented conversational AI.

How Can Accountability Modeling Prevent Overreliance?

Task-oriented dialogue systems (TODS) are designed to help users complete a task or goal through conversation. Dialogue state tracking (DST) is a crucial component of TODS, responsible for understanding user intentions and keeping track of the dialogue state. Task-oriented dialogues are sensitive to DST errors (false positives and false negatives), as a single error can significantly change the course of the conversation. For example, in Figure 1, attraction-area is a false negative prediction. As a result, the user may overrely on the AI's suggestion and end up booking a park that is not in the centre of town. Such problems deteriorate the user experience in real-world conversations.


Figure 1: Overview of Accountability Modeling.
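To make the error types concrete, here is a minimal Python sketch of how a predicted dialogue state can be compared against the gold state to surface false positives and false negatives. The slot values are illustrative; only attraction-area, the false negative from Figure 1, is taken from the example above.

    # Minimal sketch: comparing a predicted dialogue state against the gold state.
    # Slot values are illustrative; "attraction-area" is the false negative from Figure 1.

    def compare_states(predicted: dict, gold: dict):
        """Return (false_positive, false_negative) slots between two dialogue states."""
        false_positives = {s: v for s, v in predicted.items() if gold.get(s) != v}
        false_negatives = {s: v for s, v in gold.items() if s not in predicted}
        return false_positives, false_negatives

    gold = {"attraction-type": "park", "attraction-area": "centre"}
    pred = {"attraction-type": "park"}  # "attraction-area" is missing

    fp, fn = compare_states(pred, gold)
    print("false positives:", fp)  # {}
    print("false negatives:", fn)  # {'attraction-area': 'centre'}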

In this work, we tackle overreliance through accountability modeling of DST, which can detect dialogue state errors in advance and rectify them. For example, in Figure 1, attraction-area is missing from the prediction; the accountability model detects this and self-corrects the predicted dialogue state. Instead of self-correcting the errors, the model can also introduce friction turns (e.g., confirmations about model uncertainty and detected errors), which help rectify the error in subsequent turns and prevent overreliance.

Model Architecture

The main idea of our approach is to add an accountability head to the backbone LLM: a binary classifier that predicts, for each slot, whether it belongs to the dialogue state. The classifier is applied to the representation of the final token of the dialogue context. The resulting model is trained jointly on the standard language modeling loss and the auxiliary slot classification loss. The accountability head estimates probabilities for all slots, which can be used to detect false positive and false negative slots in the predicted dialogue state. Furthermore, the slot classification objective acts as an auxiliary loss that aids the learning of dialogue state generation.


Figure 2: Model architecture of the LLM-based generative accountability modeling for DST.
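The sketch below illustrates this design in PyTorch. It assumes a Hugging Face AutoModelForCausalLM backbone, a fixed slot vocabulary of size num_slots, and an unweighted sum of the two losses; the paper's exact loss weighting and implementation may differ, and the names AccountableDST, slot_head, and context_end_idx are ours for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM

    class AccountableDST(nn.Module):
        """Backbone LLM with an auxiliary accountability (slot classification) head.

        A sketch only: the real model may weight the losses or pool the context
        representation differently.
        """

        def __init__(self, backbone_name: str, num_slots: int):
            super().__init__()
            self.backbone = AutoModelForCausalLM.from_pretrained(backbone_name)
            hidden_size = self.backbone.config.hidden_size
            # Binary classifier over all slots, applied to the final context token.
            self.slot_head = nn.Linear(hidden_size, num_slots)

        def forward(self, input_ids, attention_mask, labels, slot_labels, context_end_idx):
            out = self.backbone(
                input_ids=input_ids,
                attention_mask=attention_mask,
                labels=labels,                 # standard LM loss over the state string
                output_hidden_states=True,
            )
            last_hidden = out.hidden_states[-1]                     # (B, T, H)
            batch_idx = torch.arange(input_ids.size(0), device=input_ids.device)
            context_repr = last_hidden[batch_idx, context_end_idx]  # final context token, (B, H)
            slot_logits = self.slot_head(context_repr)              # (B, num_slots)

            # Auxiliary binary classification loss: which slots are in the gold state?
            cls_loss = F.binary_cross_entropy_with_logits(slot_logits, slot_labels.float())
            loss = out.loss + cls_loss               # joint training objective
            slot_probs = torch.sigmoid(slot_logits)  # used later for error detection
            return loss, slot_probs

At inference time, applying a sigmoid to the slot logits yields per-slot probabilities: slots generated with low probability are candidate false positives, while high-probability slots missing from the generated state are candidate false negatives.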

Performance

We perform our experiments with three backbone LLMs (Llama, Mistral, Gemma) on two established task-oriented datasets (MultiWOZ and Snips). Our empirical findings (Figure 3) demonstrate that this approach not only enables reliable estimation of AI agent errors but also guides the LLM decoder toward generating more accurate actions. Incorporating accountability heads into modern LLMs yields an absolute improvement of about 3% in joint goal accuracy on MultiWOZ. We also show that this method enables the agent to self-correct its actions, further boosting its performance by about 3%.


Figure 3: Comparison of the DST performance on the test subsets of the MultiWOZ and Snips datasets.

Dialogue State Correction using Accountability Modeling

Figure 4 shows illustrative examples of model predictions from the MultiWOZ and Snips datasets. In the first example, the model detects a false positive slot (restaurant-pricerange) and filters it out to rectify the prediction. The second example contains a false negative slot (attraction-type), which is corrected successfully. In the third example, the model detects both false positive and false negative slots and successfully rectifies them. In the fourth example, the model's original prediction is correct; however, self-correction adds an extra slot (train-departure), which makes the prediction wrong. The fifth example shows an instance where the algorithm only partially corrects an error. The last two examples are from Snips, where the algorithm successfully rectifies the predictions.


Figure 4: Illustrative examples of dialogue state corrections.
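A minimal sketch of this correction step is shown below. It assumes access to the accountability head's per-slot probabilities and uses simple thresholds; the threshold values (tau_fp, tau_fn) and the way missing slot values are re-generated are assumptions of this sketch, not the paper's exact procedure.

    def self_correct(predicted_state: dict, slot_probs: dict,
                     tau_fp: float = 0.5, tau_fn: float = 0.5):
        """Sketch of dialogue state self-correction using accountability-head probabilities.

        predicted_state: slot -> value generated by the LLM decoder
        slot_probs:      slot -> probability (from the accountability head) that the
                         slot belongs to the dialogue state
        tau_fp, tau_fn:  illustrative thresholds, not values from the paper
        """
        # Drop likely false positives: generated slots the head assigns low probability.
        corrected = {slot: value for slot, value in predicted_state.items()
                     if slot_probs.get(slot, 0.0) >= tau_fp}

        # Flag likely false negatives: high-probability slots missing from the prediction.
        # Their values must be re-decoded by the model (self-correction) or confirmed
        # with the user via a friction turn.
        missing = [slot for slot, prob in slot_probs.items()
                   if prob >= tau_fn and slot not in predicted_state]
        return corrected, missing

In the first example of Figure 4, this kind of filtering removes restaurant-pricerange; in the second, attraction-type is flagged as missing and re-added. As the fourth example shows, correction can also over-trigger, which is why surfacing the flagged slots to the user as a friction turn is a useful alternative to silent self-correction.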

BibTeX

@misc{dey2025preventingoverreliancetaskorientedconversational,
        title={Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling}, 
        author={Suvodip Dey and Yi-Jyun Sun and Gokhan Tur and Dilek Hakkani-Tur},
        year={2025},
        eprint={2501.10316},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        url={https://arxiv.org/abs/2501.10316},
      }
      

Acknowledgements

This work was supported in part by Other Transaction award HR0011249XXX from the U.S. Defense Advanced Research Projects Agency (DARPA) Friction for Accountability in Conversational Transactions (FACT) program. It also benefited from the Microsoft Accelerate Foundation Models Research (AFMR) grant program, through which access to leading foundation models hosted by Microsoft Azure and to Azure credits was provided to conduct the research.