Publications

(* indicates equal contribution)

  1. MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
    Muntasir Wahed, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, Ismini Lourentzou
    Preprint. 2025.

  9. DocCHA: Towards LLM-Augmented Interactive Online Diagnosis System
    Xinyi Liu, Dachun Sun, Yi R. Fung, Dilek Hakkani-Tür, Tarek Abdelzaher
    Preprint. 2025.

  3. Language Specific Knowledge: Do Models Know Better in X than in English?
    Ishika Agarwal*, Nimet Beyza Bozdag*, Dilek Hakkani-Tür
    Preprint. 2025.

  4. Must Read: A Systematic Survey of Computational Persuasion
    Nimet Beyza Bozdag, Shuhaib Mehri, Xiaocheng Yang, Hyeonjeong Ha, Zirui Cheng, Esin Durmus, Jiaxuan You, Heng Ji, Gokhan Tur, Dilek Hakkani-Tür
    Preprint. 2025.

  5. PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents
    Takyoung Kim*, Janvijay Singh*, Shuhaib Mehri*, Emre Can Acikgoz, Sagnik Mukherjee, Nimet Beyza Bozdag, Sumuk Shashidhar, Gokhan Tur, Dilek Hakkani-Tür
    Preprint. 2025.

  6. Spark: A System for Scientifically Creative Idea Generation
    Aishik Sanyal, Samuel Schapiro, Sumuk Shashidhar, Royce Moon, Lav R. Varshney, Dilek Hakkani-Tür
    Preprint. 2025.

  7. A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
    Emre Can Acikgoz*, Cheng Qian*, Hongru Wang*, Vardhan Dongre, Xiusi Chen, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur
    Preprint. 2025.

  8. Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models
    Nimet Beyza Bozdag, Shuhaib Mehri, Gokhan Tur, Dilek Hakkani-Tür
    Preprint. 2025.

  9. LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language
    Yubin Ge*, Neeraja Kirtane*, Hao Peng, Dilek Hakkani-Tür
    Preprint. 2025.

  10. Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis
    Shuhaib Mehri, Xiusi Chen, Heng Ji, Dilek Hakkani-Tür
    Preprint. 2025.

  11. MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
    Vardhan Dongre*, Chi Gui*, Shubham Garg, Hooshang Nayyeri, Gokhan Tur, Dilek Hakkani-Tür, Vikram S. Adve
    Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track. 2025.

  12. ToolRL: Reward is All Tool Learning Needs
    Cheng Qian, Emre Can Acikgoz, Qi He, Hongru Wang, Xiusi Chen, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji
    Neural Information Processing Systems (NeurIPS). 2025.

  13. Neural Networks for Learnable and Scalable Influence Estimation of Instruction Fine-Tuning Data
    Ishika Agarwal, Dilek Hakkani-Tür
    Neural Information Processing Systems (NeurIPS). 2025.

  14. Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
    Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tür, Hao Peng
    Neural Information Processing Systems (NeurIPS). 2025.

  15. Goal Alignment in LLM-Based User Simulators for Conversational AI
    Shuhaib Mehri, Xiaocheng Yang, Takyoung Kim, Gokhan Tur, Shikib Mehri, Dilek Hakkani-Tür
    Transactions of the Association for Computational Linguistics (TACL). 2025.

  16. Question Generation for Assessing Early Literacy Reading Comprehension
    Xiaocheng Yang, Sumuk Shashidhar, Dilek Hakkani-Tür
    Workshop on Speech and Language Technology in Education (SLaTE). 2025.

  17. YourBench: Easy Custom Evaluation Sets for Everyone
    Sumuk Shashidhar, Clémentine Fourrier, Alina Lozovskia, Thomas Wolf, Gokhan Tur, Dilek Hakkani-Tür
    Conference on Language Modeling (COLM). 2025.

  18. TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons
    Emre Can Acikgoz*, Carl Guo*, Suvodip Dey*, Akul Datta, Takyoung Kim, Gokhan Tur, Dilek Hakkani-Tür
    Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). 2025.

  19. Uncovering Cross-Domain Recommendation Ability of Large Language Models
    Xinyi Liu, Ruijie Wang, Dachun Sun, Dilek Hakkani-Tür, Tarek Abdelzaher
    Companion Proceedings of the ACM on Web Conference (WWW). 2025.

  20. Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
    Jihyoung Jang*, Minwook Bae*, Minji Kim, Dilek Hakkani-Tür, Hyounghun Kim
    The Annual Meeting of the Association for Computational Linguistics (ACL). 2025.

  21. Can a Single Model Master Both Multi-turn Conversations and Tool Use? CoALM: A Unified Conversational Agentic Language Model
    Emre Can Acikgoz, Jeremiah Greer, Akul Datta, Ze Yang, William Zeng, Oussama Elachqar, Emmanouil Koukoumidis, Dilek Hakkani-Tür, Gokhan Tur
    The Annual Meeting of the Association for Computational Linguistics (ACL). 2025.

  22. Know Your Mistakes: Towards Preventing Overreliance on Task-Oriented Conversational AI Through Accountability Modeling
    Suvodip Dey, Yi-Jyun Sun, Gokhan Tur, Dilek Hakkani-Tür
    The Annual Meeting of the Association for Computational Linguistics (ACL). 2025.

  23. SMART: Self-Aware Agent for Tool Overuse Mitigation
    Cheng Qian*, Emre Can Acikgoz*, Hongru Wang, Xiusi Chen, Avirup Sil, Dilek Hakkani-Tür, Gokhan Tur, Heng Ji
    The Annual Meeting of the Association for Computational Linguistics (ACL, Findings). 2025.

  24. Premise-Augmented Reasoning Chains Improve Error Identification in Math Reasoning with LLMs
    Sagnik Mukherjee*, Abhinav Chinta*, Takyoung Kim, Tarun Anoop Sharma, Dilek Hakkani-Tür
    International Conference on Machine Learning (ICML). 2025.

  25. Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems
    Mert İnan, Anthony Sicilia, Suvodip Dey, Vardhan Dongre, Tejas Srinivasan, Jesse Thomason, Gökhan Tür, Dilek Hakkani-Tür, Malihe Alikhani
    Transactions of the Association for Computational Linguistics (TACL). 2025.

  26. ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents
    Vardhan Dongre, Xiaocheng Yang, Emre Can Acikgoz, Suvodip Dey, Gokhan Tur, Dilek Hakkani-Tür
    International Workshop on Spoken Dialogue Systems Technology (IWSDS). 2025.

  27. DELIFT: Data Efficient Language model Instruction Fine Tuning
    Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa, Marina Danilevsky
    International Conference on Learning Representations (ICLR). 2025.

  28. Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
    Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee
    Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL, Findings). 2025.

  29. Infogent: An Agent-based Framework for Web Information Aggregation
    Revanth Gangi Reddy*, Sagnik Mukherjee*, Jeonghwan Kim*, Zhenhailong Wang*, Dilek Hakkani-Tür, Heng Ji
    Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL, Findings). 2025.

  30. From Context to Action: Analysis of the Impact of State Representation and Context on Generalizability of Multi-Turn Web Navigation Agents
    Nalin Tiwary*, Vardhan Dongre*, Sanil Chawala, Ashwin Lamani, Dilek Hakkani-Tür
    Neural Information Processing Systems (NeurIPS) Workshop on Open-World Agents. 2024.

  31. Simulating User Agents for Embodied Conversational AI
    Daniel Philipov, Vardhan Dongre, Gokhan Tur, Dilek Hakkani-Tür
    Neural Information Processing Systems (NeurIPS) Workshop on Open-World Agents. 2024.

  32. Instruct, Not Assist: LLM-based Multi-Turn Planning and Hierarchical Questioning for Socratic Code Debugging
    Priyanka Kargupta*, Ishika Agarwal*, Dilek Hakkani-Tür, Jiawei Han
    Empirical Methods in Natural Language Processing (EMNLP, Findings). 2024.

  33. Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
    Sagnik Mukherjee*, Muhammad Farid Adilazuarda*, Sunayana Sitaram, Kalika Bali, Alham Fikri Aji, Monojit Choudhury
    Empirical Methods in Natural Language Processing (EMNLP). 2024.

  34. Towards Measuring and Modeling “Culture” in LLMs: A Survey
    Muhammad Farid Adilazuarda*, Sagnik Mukherjee*, Pradhyumna Lavania, Siddhant Singh, Alham Fikri Aji, Jacki O’Neill, Ashutosh Modi, Monojit Choudhury
    Empirical Methods in Natural Language Processing (EMNLP). 2024.

  35. Unsupervised Human Preference Learning
    Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Dilek Hakkani-Tür
    Empirical Methods in Natural Language Processing (EMNLP). 2024.

  36. Large Language Models as User Agents for Evaluating Task-Oriented-Dialogue Systems
    Taaha Kazi, Ruiliang Lyu, Sizhe Zhou, Dilek Hakkani-Tür, Gokhan Tur
    IEEE Spoken Language Technology Workshop (IEEE SLT). 2024.

  37. Confidence Estimation for LLM-Based Dialogue State Tracking
    Yi-Jyun Sun, Suvodip Dey, Dilek Hakkani-Tür, Gokhan Tur
    IEEE Spoken Language Technology Workshop (IEEE SLT). 2024.

  38. Dialog Flow Induction for Constrainable LLM-Based Chatbots
    Stuti Agrawal, Nishi Uppuluri, Pranav Pillai, Revanth Gangi Reddy, Zoey Li, Gokhan Tur, Dilek Hakkani-Tür, Heng Ji
    Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL). 2024.