
Please email the program chair, Oege de Moor, to suggest changes.
All talks take place in the Auditorium. Throughout the workshop, coffee will be served in the Auditorium foyer, and demonstrations will run there. To stimulate discussion, the Summer Common Room is available at all times (except Wednesday and Thursday afternoon) as a place to meet, and the registration desk is situated there. Small impromptu break-out meetings can also be held in New Buildings V-2.
| Monday March 15 | |
| 14:00-17:00 | Tour of Oxford: participants must register for this separately. |
| (no organised dinner - participants will receive restaurant recommendations) | |
| Tuesday March 16 | |
| Session 1 (chair: Oege de Moor) | |
| 08:45-09:00 | |
| 09:00-10:00 | Molham Aref (LogicBlox) Datalog for Enterprise Applications: from Industrial Applications to Research (slides) |
| 10:00-10:30 | Nicola Leone The Disjunctive Datalog System DLV: from Research to Industrial Applications (slides) |
| break in the Auditorium foyer | |
| Session 2 (chair: Yannis Smaragdakis) | |
| 11:00-11:30 | Jason Eisner (Johns Hopkins University) Dyna: Extending Logic Programming to Support Modern Statistical AI |
| 11:30-12:00 | Veronica Dahl (Simon Fraser University) Datalog plus Language Intelligence (L.I.): the missing link for true A.I. |
| 12:00-12:30 | Shan Shan Huang (LogicBlox) Generics and Datalog |
| 12:30-13:00 | Francois Bry (University of Munich) Datalog Relaunched: Simulation, Unification and RDFLog (abstract, slides) Datalog is convenient for reasoning on the Web, because it is simple and intuitive and because it has solid foundations. Datalog is however lacking data extraction from semi-structured data (like HTML and XML) and value invention (needed especially for RDF). This presentation first motivates and describes "simulation unification", a form of unification tuned to semi-structured data. Like standard unification, simulation unification determines bindings for variables in both terms to unify. Unlike standard unification, simulation unification does not make the two terms identical but instead search for an embedding of the query into the data. Simulation unification is decidable, sound and complete, and has polynomial data complexity. Without variable and some incompleteness query constructs, it has linear time complexity; with variables it is, like standard unification, NP hard. Simulation unification is closely related to the fragment of XPath with only forward axes, a restriction that does not affect the expressiveness of XPath. RDFLog is Datalog tuned to RDF with arbitrary quantifier alternation: Blank nodes (that is, existentially quantified variables) may occur in the scope of all, some, or none of the universal variables of an RDFLog rule. RDFLog declarative semantics is defined in terms of RDF entailment (due to RDF technicalities making a model theory problematical). RDFLog's operational semantics consists in the following three steps: (1) Skolemization, (2) standard Datalog evaluation, and (3) un-Skolemization. RDFLog's operational semantics is sound and complete. Interestingly, RDFLog (or Datalog) with ∀*∃* prefixes is, up to the projection of auxilliary predicates, equivalent to RDFLog (or Datalog) with full quantifier alternation. A light-weight, ad hoc implementation of RDFLog outperforms both ARQ SPARQL and Sesame SPARQL (with main memory store) in accessing 14.000 triples with both, ∀∀ and ∀∃∀ sirups. |
| Session 3 (chair: Georg Gottlob) | |
| lunch provided in Magdalen College Hall | |
| 14:00-14:30 | Piero Bonatti (University of Naples) Datalog for Security, Privacy and Trust (slides) |
| 14:30-15:30 | Vladimir Lifschitz (University of Texas at Austin) Semantics of Datalog: circumscription, stable models, aggregates (abstract, slides) The meaning of a positive Datalog program can be characterized by circumscribing its intensional predicates. In this talk we will show that by modifying the definition of circumscription we can extend this idea to several constructs widely used in answer set programming: negation as failure, choice, and aggregates. |
| break in the Auditorium foyer | |
| Session 4 (chair: Nicola Leone) | |
| 16:00-16:30 | Stefan Woltran (Technical University Vienna) Different Equivalence Notions for Disjunctive Datalog (abstract, slides) In this talk we survey results about the decidability/undecidability frontier for checking equivalence between programs formulated in extended datalog (i.e., datalog augmented by default negation and/or disjunction). In particular, we shall focus on the notions of strong and uniform equivalence. For decidable problems we give precise complexity bounds and we also consider a restricted setting where the arity of predicates is bound by a fixed constant. |
| 16:30-17:00 | Max Schaefer (Semmle Ltd.) Type Inference for Datalog with Complex Type Hierarchies (abstract, slides) Type inference for Datalog can be understood as the problem of mapping programs to a sublanguage for which containment is decidable. To wit, given a program in Datalog, a schema describing the types of extensional relations, and a user-supplied set of facts about the basic types (stating conditions such as disjointness, implication or equivalence), we aim to infer an over-approximation of the semantics of the program, which should be expressible in a suitable sublanguage of Datalog. We argue that Datalog with monadic extensionals is an appropriate choice for that sublanguage of types, and we present an inference algorithm. The inference algorithm is proved sound, and we also show that it infers the tightest possible over-approximation for a large class of Datalog programs. Furthermore, we present a practical containment check for a large subset of our type language. The crux of that containment check is a novel generalisation of Quine's procedure for computing prime implicants. The type system has been implemented in a state-of-the-art industrial database system, and we report on experiments with this implementation. |
| 17:00-17:30 | Reinhard Pichler (Technical University Vienna) Exploiting bounded treewidth with Datalog (abstract, slides) Many intractable problems have been shown to become tractable if the treewidth of the underlying structure is bounded by a constant. An important tool for deriving such results is Courcelle's Theorem, which states that all properties defined by Monadic-Second Order (MSO) sentences are fixed-parameter tractable with respect to the treewidth. In principle, algorithms can be generated automatically from the MSO definition of a problem by exploiting the correspondence between MSO and finite tree automata (FTA). However, this approach has turned out to be problematical, since even relatively simple MSO formulae may lead to a "state explosion" of the FTA. In this talk, we present monadic datalog (i.e., datalog where all intensional predicate symbols are unary) as an alternative method to tackle this class of fixed-parameter tractable problems. First, if some property of finite structures is expressible in MSO then this property can also be expressed by means of a monadic datalog program. Moreover, the resulting fragment of datalog can be evaluated in linear time (both with respect to the program size and with respect to the data size). We also report on experimental results which underline the potential of this approach. |
| 17:30-18:00 | Michael Benedikt (University of Oxford) Optimizing Datalog 0.1 |
| (no organised dinner - participants will receive restaurant recommendations) | |
| 20:00-late | Discussion: Towards a Datalog standard (Summer Common Room) |
| Wednesday March 17 | |
| Session 5 (chair: Francois Bry) | |
| 09:00-10:00 | Serge Abiteboul (INRIA) Distributed data management on the web (slides) |
| 10:00-10:30 | Marcelo Arenas (Catholic University Chile) Datalog as a query language for data exchange systems (abstract, slides) Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema. An important issue in data exchange is that the existing mapping specification languages usually do not completely determine the relationship between source and target data and, thus, there may be many possible translations for a given source instance. In particular, given that queries in this context are posed over the target schema, there is a general agreement in the literature that given a target query Q and a source instance I, the semantics of Q should be defined in terms of certain answers, that is, as the intersection of the answer to Q over every possible target translation of I. The definition of certain answers is highly non-effective, as it involves computing the intersection of (potentially) infinitely many sets. Thus, it becomes particularly important to understand for which classes of relevant queries the certain answers can be computed efficiently. Fagin, Kolaitis, Miller and Popa have investigated this problem, and have shown that the language of unions of conjunctive queries (UCQ) can be evaluated efficiently in the data exchange context. In fact, it can be shown that Datalog can also be evaluated efficiently in this context. Unfortunately, both UCQ and Datalog keeps us in the realm of the positive, while most database query languages are equipped with negation. In this talk, we introduce an extension of Datalog that is equipped with a restricted form of negation, and show that the certain answers to any of these programs can be computed efficiently. Interestingly, this language can be used to improve our knowledge about the classes of queries with negation that can be evaluated efficiently in the data exchange context. In particular, we show in this talk that the proposed language is expressive enough to represent every union of conjunctive queries with at most one inequality or negated relational atom per disjunct. Furthermore, we show that the previous result holds for a syntactic restriction of the class of unions of conjunctive queries with at most two inequalities per disjunct. It is important to notice that this restriction is given by two conditions that are optimal, in the sense that computing certain answers becomes intractable if any of them is removed. |
| break in the Auditorium foyer | |
| Session 6 (chair: Monica Lam) | |
| 11:00-11:30 | Robert Baumgartner (Lixto) Datalog-related aspects in the web data extraction tool Lixto (abstract, slides) Lixto Visual Developer is an integrated development environment specifically geared towards the visual development of Web data extraction programs, supporting complex navigation and extraction tasks on highly dynamic Web applications. Internally, created extraction rules are reflected in a declarative extraction language called Elog, which relies on a datalog syntax and semantics. It is ideally suited for representing and successively incrementing the knowledge about patterns described by application designers. In this talk we illustrate aspects of the Visual Developer and the Elog language including a small example in a live demonstration. |
| 11:30-12:00 | Axel Polleres (National University of Ireland) Using Datalog for Rule-Based Reasoning over Web Data: Challenges and Next Steps (abstract, slides) With the rise of the Semantic Web and initiatives such as Linked Open Data, a huge amount of structured data, mostly using lightweight ontologies such as FOAF and SIOC is becoming available on the Web. In this talk, we will report on our experiences and findings on performing rule-based fwd-chaining reasoning on this data which is in the orders of billions of RDF statements crawled from the Web. Particularly, we will focus on the fragment of OWL that can be reasonably dealt with in such inference, and discuss which ommissions we make in our deliberately incomplete reasoning approach in order to be able to cope with scale and noise in Web data. Finally, we will propose next steps on practical approaches of dealing with inconsistencies, and incorporating ranking in our inference procedure. |
| 12:00-12:30 | Letizia Tanca (Polytecnico di Milano) Context Modelling and Context-aware Querying: can Datalog be of help? (abstract, slides) Many interpretations of the notion of context have emerged in various fields of research like psychology, philosophy, or computer science. Context-aware systems are pervading everyday life, therefore context modelling is becoming a relevant issue and an expanding research field. Context has often a significant impact on the way humans (or machines) act, and on how they interpret things; furthermore, a change in context causes a transformation in the experience that is going to be lived. The word itself, derived from the Latin con (with or together) and texere (to weave), describes a context not just as a profile, but as an active process dealing with the way humans weave their experience within their whole environment, to give it meaning. Accordingly, while the computer science community has initially perceived the context simply as a matter of user time and location, in the last few years this notion has been considered not simply as a state, but as part of a process in which users are involved; thus, sophisticated and general context models and systems have been proposed, to support context-aware applications which use them to: (a) adapt interfaces; (b) determine the set of application-relevant data and/or services, (c) support autonomous behaviours of the system; (d) make the user interaction implicit, or (e) build smart environments. In Information Management, context-aware systems are mainly devoted to determining which information is relevant with respect to the ambient conditions; however, nowadays the available data sources may have different natures, varying from simple relational data to semantically annotated knowledge, to data coming from sensors. Accordingly, sources need to be interpreted, integrated and filtered (tailored) in order to: 1) provide the user or the application with the appropriately tailored set of data or services, thus eliminating information noise, 2) match devices' physical constraints, in particular in mobile applications, 3) appropriately tailor sensor queries, etc. In this talk I illustrate a foundational framework for the lifecycle of context-aware system, in which the system design and management activities consider context as an orthogonal, first-class citizen. In doing so, I also discuss how a Datalog-based formulation may contribute to the definition of context-aware database views. |
| 12:30-13:00 | Boris Motik (University of Oxford) Using Datalog on the Semantic Web (slides) |
| lunch provided in Magdalen College Hall | |
| Session 7 (chair: Oege de Moor) | |
| 14:00-15:00 | Georg Gottlob (University of Oxford and Lixto) Datalog+/- |
| 15:00-15:30 | Andrzej Szalas (Linkoping University) Datalog4: Living with Inconsistency and Taming Nonmonotonicity (abstract, slides) The problem of inconsistent and unknown information is crucial in many applications. It has been addressed in extensions of query languages in deductive databases, based on non-monotonic logics initially derived from the Closed World Assumption (CWA). In many applications, including Semantic Web technologies and robotics systems, CWA is not necessarily applicable and developments in these fields usually follow the Open World Assumption (OWA). In this talk we present a novel, intuitive approach to the problem. We propose a Datalog-like language with unrestricted negation based on a four-valued logic with 'true', 'false', 'inconsistent' and 'unknown' as truth values. We start with a lightweight and monotonic layer where OWA is accepted, giving rise to a four-valued semantics of queries. To reduce the unknown/inconsistent zones we next introduce simple constructs allowing one to express various mechanisms of nonmonotonic reasoning. In particular, we provide means for application-specific disambiguation of inconsistent information, the use of Local CWA (thus also CWA, if needed), as well as various forms of default reasoning. The resulting query language is still tractable (w.r.t. data complexity). |
| break in the Auditorium | |
| Session 8 (chair: Vladimir Lifschitz) | |
| 16:00-16:30 | Thomas Eiter (Technical University Vienna) Datalog extensions (slides) |
| 16:30-17:00 | Mirek Truszczynski (University of Kentucky) Inductive definitions in knowledge representation and databases (abstract) The logic FO(ID) is a knowledge representation language obtained by extending the first-order logic with a construct for representing inductive definitions. A definition construct in FO((D) is a set of rules with atoms of the defined relation in the head. Its informal semantics is solidly grounded in the convention in mathematics to represent an inductive definition through a set of conditionals. FO(ID)'s definition construct is thus quasi identical to logic programs and DATALOG. In fact, FO(ID) was argued to be a methodologically sound counterpart to knowledge representation systems based on logic programming with the stable-model semantics. The realization of the key role of inductive definitions in knowledge representation is a relatively recent development. On the other hand, the need for means to model inductive definitions has been long recognized by the database community. DATALOG and fixpoint logic are two well-known database query languages whose central characteristics is the ability to express directly concepts whose definitions are inductive. We aim to discuss the theme of inductive definitions in the logic FO(ID), DATALOG and fixpoint logic, consider its role in each formalism and elaborate on the interconnections between them. |
| 17:00-17:30 | Gosta Grahne (Concordia University) The revenge of the conditional tables |
| 17:30-18:00 | John Field (IBM) The Reactor Model: Declarative Distributed Programming (abstract, slides) In this talk, I will describe the Reactor programming model, which is designed to serve serve as a foundation for building distributed applications in a declarative manner. A reactor consists of two principal components: mutable state, in the form of a fixed collection of relations, and code, in the form of a fixed collection of rules in the style of Datalog. A reactor's code is executed in response to an external stimulus, which takes the form of an attempted update to the reactor's state. As in classical process calculi, the reactor model accommodates collections of distributed, concurrently executing processes. However, unlike classical process calculi, our observable behaviors are sequences of states, rather than sequences of messages. Similarly, the interface to a reactor is simply its state, rather than a collection of message channels, ports, or methods. One novel feature of our model is the ability to compose behaviors both synchronously and asynchronously. Also, our use of Datalog-style rules allows aspect-like composition of separately-specified functional concerns in a natural way. This work is joint with M.-C. Marinescu and C. Stefansen. |
| (no organised dinner - participants will receive restaurant recommendations) | |
| 20:00-late | Open programme (Summer Common Room) |
| Thursday March 18 | |
| Session 9 (chair: Serge Abiteboul) | |
| 09:00-10:00 | Monica Lam (Stanford University) Datalog for Decentralized Social Networking (slides) |
| 10:00-10:30 | Danilo Montesi (University of Bologna) Datalog for the Web 2.0: the case of Social Network Data Management (slides) |
| break in the Auditorium foyer | |
| Session 10 (chair: Manuel Hermenegildo) | |
| 11:00-11:30 | Foto Afrati (National Technical University of Athens) Parallel evaluation of Datalog (abstract, slides) Many new applications require computations on data of huge size (e.g. in web operations, social networks, collaborative filtering). To deal with computations of this size, companies use large collections of commodity servers connected in a cluster under share-nothing architecture. Datalog is a natural language to use to express queries that extract useful information from such data, e.g., the "friends-of-friends" relation in social networks. Recently, Map-Reduce (and its equivalent open source software "Hadoop") is a widely used framework for data-intensive computations on clusters of computers. Map-reduce offers a simple programming environment that hides parallelization, fault-tolerance and other details from the user. We will show how to compute efficiently non-recursive Datalog queries in Map-Reduce (EDBT2010 paper by Afrati and Ullman). |
| 11:30-12:00 | Jeff Ullman (Stanford University) Map-reduce implementation of Datalog (abstract, slides) When you try to implement a recursive Datalog program using Map-Reduce, several interesting issues arise. First, is the matter of how one performs a step of seminaive evaluation given a fixed number of compute nodes. Second, there are certain recursions, such as left-linear ones, where a surprisingly efficient implementation exists. Third, for a general recursion, the fundamental assumption that justifies map-reduce is violated. In particular, to deal with the potential for node failure, map-reduce implementations assume that tasks get a single input file and produce a single output file. But mutually recursive tasks cannot obey this assumption. Therefore, we need to rearchitect systems like Hadoop if we are to implement large-scale recursions and yet allow programs to complete in the face of node failures. |
| 12:00-12:30 | Neil Conway (University of California, Berkeley) Cloud Programming: From Doom and Gloom to BOOM and Bloom (slides) |
| 12:30-13:00 | David Maier (Portland State University) Dynamic Datalog (abstract, slides) There has been recent interest in using Datalog to specify queries over rapidly evolving state, such as for declarative networking and data stream processing. In order not do introduce undue latency or scheduling overhead, evaluation approaches have avoided staging or quiescing computation after updates, instead using query plans that are free-running cycles. However, the output guarantees in such systems are fairly weak, usually corresponding to some notion of "eventual consistency", which can make it difficult to determine what derived results hold at any particular point in time. I will report on work on tracking progress of cyclic queries over streaming inputs, using a new operator called "Flying Fixed-Point" (FFP). FFP speculates about result progress, then verifies its speculations. In conjunction with interval-based modeling of validity, it becomes possible to determine exactly which derived results hold at each point in time. In addition, it requires minimal modification of other query operators, even if they support advanced features such as retractions or out-of-order processing. I will also discuss the class of "strongly convergent" queries, which are currently a requirement for FFP. |
| lunch provided in Examination Schools, room 7 | |
| Afternoon session in Examination Schools, rooms 6 and 7 | |
| Session 11 (chair: Molham Aref) | |
| 14:00-14:30 | Neville Roberts (Enterprise CIO, Best Buy) Challenges of Enterprise IT |
| 14:30-15:15 | LogicBlox product presentation |
| break in Examination Schools, room 7 | |
| Session 12 (chair: David Warren) | |
| 15:45-16:30 | Semmle product presentation |
| 16:30-17:15 | Lixto product presentation |
| 17:15-18:00 | Exeura product presentation |
| 18:30-19:00 | reception (New Room at Magdalen College) |
| 19:00-late | Banquet in Magdalen College Hall |
| Friday March 19 | |
| Session 13 (chair: Mirek Truszczynski) | |
| 09:00-10:00 | Moshe Vardi (Rice University) Containment of Datalog Programs |
| 10:00-10:30 | Christophe Joubert (Technical University of Valencia) Using Boolean Equations and Rewriting Systems to solve Datalog-based Program Analyses (abstract, slides) We summarize two novel, fully automated Datalog resolution methods that we apply to object-oriented program analysis. The first method transforms the Datalog program in an implicit Boolean Equation System (BES) that is then solved by using existing general purpose verification toolboxes, such as CADP that provides local BES resolutions with linear-time complexity. The second method transforms the Datalog program into a Rewriting Logic theory that is then run by exploiting the main features of the high-level programming language Maude. Datalog is used as a specification language for expressing both complex interprocedural program analyses involving dynamically created objects and queries. We evaluate our results experimentally by applying them to real-world Datalog-based analyses. |
| break in the Auditorium foyer | |
| Session 14 (chair: Gosta Grahne) | |
| 11:00-11:30 | Ilkka Niemela (Aalto University) Integrating Answer Set Programming and Satisfiability modulo Theories (abstract, slides) In this talk we consider the problem of integrating answer set programming (ASP) and satisfiability modulo theories (SMT). We discuss a characterization of stable models of logic programs based on Clark's completion and simple difference constraints. The characterization leads to a method of translating a ground logic program to a linear size theory in difference logic, i.e. propositional logic extended with difference constraints between two integer variables, such that stable models of the program correspond to satisfying assignments of the resulting theory in difference logic. Many of the state-of-the-art SMT solvers support directly difference logic. This opens up interesting possibilities. On one hand, any solver supporting difference logic can be used immediately without modifications as an ASP solver for computing stable models of a logic program by translating the program to a theory in difference logic. On the other hand, SMT solvers typically support also other extensions of propositional logic such as linear real and integer arithmetic, fixed-size bit vectors, arrays, and uninterpreted functions. This suggests interesting opportunities to extend ASP languages with such constraints and to provide effective solver support for the extensions. Using the translation an extended language including logic program rules and, for example, linear real arithmetic can be translated to an extension of propositional logic supported by current SMT solvers. We discuss the effectiveness of state-of-the-art SMT solvers as ASP solvers and the possibilities of developing extended ASP languages based on SMT solver technology. |
| 11:30-12:00 | Marina de Vos (University of Bath) Applications of Answer Set Programming (abstract, slides) In the twenty years since the creation of the stable model semantics for logic programs and the inclusion of both classical and negation as failure, answer set programming (ASP) has grown from a small fledgeling field within logic programming to a maturing field of its own. In this presentation I will highlight two of the applications that we have been working on over the last five years: Toast and Anton. The first uses ASP to super-optimise machine while the second is an automated music composition system based on ASP. Using these applications, the benefits of using ASP will be discussed together with the various practical and theoretical challenges that need addressing for ASP to be used outside academia. |
| 12:00-12:30 | Stefania Costantini (University of Aquila) Answer set programming for agents and multi-agent systems |
| 12:30-13:00 | Michael Gelfond (Texas Tech University) Answer Set Programming and Probabilistic Reasoning (abstract, slides) I give an introduction to P-log - a language for representing logical and probabilistic information in the form of a probabilistic logic program. The semantics of the language associates the program with a set of possible worlds and a measure on them, which can be used to answer queries about the probability of a formula $F$ with respect to the knowledge represented in the program. The logical foundation of our semantics is Answer Set Prolog. Causal Bayesian networks provide the probabilistic foundation. The language and its use for logical and probabilistic reasoning will be illustrated by several examples. |
| lunch provided in the New Room | |
| Session 15 (chair: Jeff Ullman) | |
| 14:00-14:30 | David S. Warren (SUNY at Stony Brook) Subsumptive tabling for Datalog programs in XSB (abstract) In the usual tabled evaluation strategy, a goal is recognized as repeated if it is a variant of a previous goal, i.e., identical up to variable renaming. An alternative is for a goal to be repeated if it is subsumed by a previous goal, i.e., is an instance of it. In the XSB tabled Prolog system, one can choose which definition is to be used. In this talk, I will briefly describe the implementation of subsumptive tabling, discuss examples where subsumptive tabling is much more efficient than variant tabling, and explore interesting uses of subsumptive tabling in XSB. |
| 14:30-15:00 | Damien Sereni (Semmle Ltd) Industrial-strength optimisation of Datalog |
| 15:00-15:30 | Martin Bravenboer (LogicBlox) The LogicBlox execution engine for Datalog |
| break in the Auditorium foyer | |
| Session 16 (chair: Torbjorn Ekman) | |
| 16:00-16:30 | Manuel Hermenegildo (Technical University Madrid and IMDEA) An Overview of Ciao and its uses of DataLog for Program Analysis and Optimization (abstract, slides) Ciao is a multiparadigm programming system which, in addition to supporting logic programming (and, in particular, ISO-Prolog), provides the programmer with a selection of useful features from different programming paradigms, such as functions, higher order, ASP modules, constraints, concurrency, objects, or tabling. It also includes a rich assertion language used for verification, testing, and documentation. Such features (including, e.g., the Prolog primitives) can be turned on and off for each program module. An important aspect of Ciao is its preprocessor which is capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of program optimizations (including automatic parallelization). We present an overall introduction to Ciao and its preprocessor and illustrate how some datalog flavours are used in different kinds of program analyses and optimizations performed by it. |
| 16:30-17:00 | Yannis Smaragdakis (University of Massachusetts at Amherst) Using Datalog for Fast and Easy Program Analysis |
| 17:00-17:30 | Val Tannen (University of Pennsylvania) Datalog on semiring-annotated relations |