Researchers Recognized with the ICDT 2026 Test of Time Award

December 9, 2025

The paper "Declarative Probabilistic Programming with Datalog" by Vince Bárány, Balder ten Cate, Benny Kimelfeld, Dan Olteanu, and Zografoula Vagena has been selected as the winner of the ICDT 2026 Test of Time Award.

This award is presented annually to one or two papers from the ICDT proceedings of ten years prior. It recognizes research that has had a "significant impact in terms of research, methodology, conceptual contribution, or transfer to practice."

Originally published at ICDT 2016, this paper is being celebrated for establishing important theoretical groundwork connecting statistical modeling in programming languages (termed “probabilistic programming”) with relational database logic.

A Probabilistic Extension to Datalog

Datalog is a declarative programming language traditionally used as a database query language. Its declarative nature, which among other things allows more concise specifications, has been exploited in applications including networking, data integration, information extraction, and cloud computing.

In many applications, especially in ML and AI, systems need to make decisions under uncertainty through inference. One paradigm designed with this challenge in mind is probabilistic programming, where the programming language allows the user to define random processes, and the system then automatically performs statistical inference over the probability space of executions induced by these definitions.

The motivation for this paper is to build a probabilistic extension of Datalog that is

  • expressive enough for complex statistical modeling,
  • yet strict enough to retain declarative guarantees, including independence of execution order and invariance under rewriting. 

Towards this goal, Bárány, ten Cate, Kimelfeld, Olteanu, and Vagena introduced Generative Datalog, a probabilistic extension of Datalog in which rules can sample from discrete probability distributions. A Generative Datalog program thereby defines a probability distribution over the set of possible worlds (database instances). The program is then augmented with constraints to ensure that, at a high level, recursive Datalog rules interact with stochastic choices in a way that is consistent with our observations.
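
To make the idea of a distribution over possible worlds concrete, here is a minimal Python sketch. It is not the paper's semantics or any real Datalog engine: the rules in the comment, the relation names, and all probabilities are hypothetical and chosen only for illustration. The sketch enumerates every world a tiny hand-coded generative program can produce, then conditions on an observation by discarding inconsistent worlds and renormalizing.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy program, loosely in the style of Generative Datalog
# (relation names and probabilities are illustrative assumptions):
#   Earthquake(Flip[1/10]).
#   Burglary(Flip[1/5]).
#   TrigByQuake(Flip[3/5])     <- Earthquake(1).
#   TrigByBurglary(Flip[9/10]) <- Burglary(1).
#   Alarm()                    <- TrigByQuake(1).
#   Alarm()                    <- TrigByBurglary(1).


def flip(p):
    """Return the outcomes of a Flip[p] coin as (value, probability) pairs."""
    return [(1, p), (0, 1 - p)]


# Placeholder outcome for a rule whose body does not hold: nothing is sampled.
NOT_FIRED = [(None, Fraction(1))]


def possible_worlds():
    """Map each possible world (a frozenset of facts) to its probability."""
    worlds = {}
    base = product(flip(Fraction(1, 10)), flip(Fraction(1, 5)))
    for (quake, p_q), (burglary, p_b) in base:
        # A trigger coin is only flipped when the body of its rule holds.
        quake_trig = flip(Fraction(3, 5)) if quake else NOT_FIRED
        burglary_trig = flip(Fraction(9, 10)) if burglary else NOT_FIRED
        for (t_q, p_tq), (t_b, p_tb) in product(quake_trig, burglary_trig):
            facts = {("Earthquake", quake), ("Burglary", burglary)}
            if t_q is not None:
                facts.add(("TrigByQuake", t_q))
            if t_b is not None:
                facts.add(("TrigByBurglary", t_b))
            if t_q == 1 or t_b == 1:
                facts.add(("Alarm",))  # deterministic rule on top of the samples
            world = frozenset(facts)
            prob = p_q * p_b * p_tq * p_tb
            worlds[world] = worlds.get(world, Fraction(0)) + prob
    return worlds


def condition(worlds, observation):
    """Keep only worlds satisfying the observation and renormalize."""
    kept = {w: p for w, p in worlds.items() if observation(w)}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


worlds = possible_worlds()
posterior = condition(worlds, lambda w: ("Alarm",) in w)
p_burglary = sum(p for w, p in posterior.items() if ("Burglary", 1) in w)
print(len(worlds), sum(worlds.values()), p_burglary)  # prints: 9 1 151/191
```

Exhaustive enumeration like this only works for very small discrete programs; it is meant to show what "a distribution over database instances" and "conditioning on observations" mean, not how inference would be carried out in practice.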

Influence

The most enduring contribution of the paper is that it provided a rigorous mathematical framework for combining recursive logic from Datalog with probability distributions, thus bridging work in databases and work in PL and AI. A sample of the follow-up works that further deepen this connection includes the following developments:

· [1] establishes that Generative Datalog is well-defined even when variables are drawn from uncountably infinite domains (such as the real numbers), ensuring the language is mathematically sound for real-world statistical modeling.

· [2] introduces an extension of Datalog for probabilistic inference when inputs are correlated.

· [3] extends Generative Datalog to support negation, a difficult problem in both logic and probability.

The work was not purely academic; it was heavily inspired by and implemented within the LogicBlox system. The paper validated the architecture of LogicBlox’s analytical engine, which needed to perform probabilistic forecasting inside a deterministic database. Moreover, this paper laid the theoretical foundations for “in-database machine learning”, which is a core component of RelationalAI.

This work was originally published at ICDT 2016, when many of the paper's authors were affiliated with LogicBlox. Researchers Benny Kimelfeld (Technion - Israel Institute of Technology), Dan Olteanu (University of Zurich), and Zografoula Vagena (Data Intelligence Institute of Paris, Université Paris Cité) are members of our RelationalAI research network. Congratulations to the authors on this well-deserved recognition!