Publications

2026

Can LLM Agents Infer World Models? Evidence from Agentic Automata Learning
Reef Menaged, Gili Lior, Shauli Ravfogel, Roee Aharoni, Gabriel Stanovsky
PDF | DATA | CODE | WEBSITE

Extending Item Response Theory for Efficient and Meaningful Multilingual Evaluation
Gili Lior, Tzviel Frostig, Gabriel Stanovsky, Matan Eyal
PDF | WEBSITE

2025

PromptSuite: A Task-Agnostic Framework for Multi-Prompt Generation
Eliya Habba*, Noam Dahan*, Gili Lior, Gabriel Stanovsky
EMNLP 2025 System Demonstrations
PDF | DATA | CODE | WEBSITE

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments
Gili Lior, Eliya Habba, Shahar Levy, Avi Caciularu, Gabriel Stanovsky
Findings of EMNLP 2025
PDF | CODE

WildIFEval: Instruction Following in the Wild
Gili Lior, Asaf Yehudai, Ariel Gera, Liat Ein-Dor
GEM workshop @ ACL 2026
PDF | DATA | CODE

Comparing the Framing Effect in Humans and LLMs on Naturally Occurring Texts
Gili Lior, Liron Naccache, Gabriel Stanovsky
PDF | DATA | CODE

2024

Computation or Weight Adaptation? Rethinking the Role of Plasticity in Learning
Gili Lior*, Yuval Shalev*, Gabriel Stanovsky, Ariel Goldstein
CogSci 2026
PDF | CODE

SEAM: A Stochastic Benchmark for Multi-Document Tasks
Gili Lior, Avi Caciularu, Arie Cattan, Shahar Levy, Ori Shapira, Gabriel Stanovsky
PDF | CODE | WEBSITE

Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction
Gili Lior, Yoav Goldberg, Gabriel Stanovsky
Findings of ACL 2024
PDF | CODE

2023

Comparing Humans and Models on a Similar Scale: Towards Cognitive Gender Bias Evaluation in Coreference Resolution
Gili Lior and Gabriel Stanovsky
CogSci 2023
PDF | CODE | TALK