Publications

2026

Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning
Reef Menaged, Gili Lior, Shauli Ravfogel, Roee Aharoni, Gabriel Stanovsky
PDF DATA CODE WEBSITE
Extending Item Response Theory for Efficient and Meaningful Multilingual Evaluation
Gili Lior, Tzviel Frostig, Gabriel Stanovsky, Matan Eyal
PDF WEBSITE

2025

PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation
Eliya Habba*, Noam Dahan*, Gili Lior, Gabriel Stanovsky
EMNLP 2025 System Demonstrations
PDF DATA CODE WEBSITE
ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments
Gili Lior, Eliya Habba, Shahar Levy, Avi Caciularu, Gabriel Stanovsky
Findings of EMNLP 2025
PDF CODE
WildIFEval: Instruction Following in the Wild
Gili Lior, Asaf Yehudai, Ariel Gera, Liat Ein-Dor
GEM workshop @ ACL 2026
PDF DATA CODE
Comparing the Framing Effect in Humans and LLMs on Naturally Occurring Texts
Gili Lior, Liron Naccache, Gabriel Stanovsky
PDF DATA CODE

2024

Computation or Weight Adaptation? Rethinking the Role of Plasticity in Learning
Gili Lior*, Yuval Shalev*, Gabriel Stanovsky, Ariel Goldstein
CogSci 2026
PDF CODE
SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior, Avi Caciularu, Arie Cattan, Shahar Levy, Ori Shapira, Gabriel Stanovsky
PDF CODE WEBSITE
Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction
Gili Lior, Yoav Goldberg, Gabriel Stanovsky
Findings of ACL 2024
PDF CODE

2023

Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution
Gili Lior, Gabriel Stanovsky
CogSci 2023
PDF CODE TALK