Direct Access for Conjunctive Queries with Negation

Oliver Irwin

01/12/2023 - LINKS Seminar

Direct Access

Context

Join query: \(Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\vec{z_i})\)

where \(\vec{z_i}\) is a tuple over \(X = \{x_1,\dots,x_n\}\)

Example: \(Q(city, country, name, id) = People(id, name, city) \wedge Capitals(city, country)\)

People
id name city
1 Alice Paris
2 Bob Lens
3 Chiara Rome
4 Djibril Berlin
5 Émile Dortmund
6 Francesca Rome
Capitals
city country
Berlin Germany
Paris France
Rome Italy
\(Q(\mathbb{D})\)
city country name id
Paris France Alice 1
Rome Italy Chiara 3
Berlin Germany Djibril 4
Rome Italy Francesca 6

Direct Access

We want to access the \(k\)-th element of \(Q(\mathbb{D})\) for a given order.

Naive idea: make \(Q(\mathbb{D})\) an array, sort it, and then we have direct access?


\(Q(\mathbb{D})\)
city country name id
Berlin Germany Djibril 4
Paris France Alice 1
Rome Italy Chiara 3
Rome Italy Francesca 6

\(Q(\mathbb{D})[4] = (Rome, Italy, Francesca, 6)\)


\(Q(\mathbb{D})\)
city country name id
\(\dots\) \(\dots\) \(\dots\) \(\dots\)
Berlin Germany Djibril 4
\(\dots\) \(\dots\) \(\dots\) \(\dots\)
Paris France Alice 1
\(\dots\) \(\dots\) \(\dots\) \(\dots\)
Rome Italy Chiara 3
Rome Italy Francesca 6
\(\dots\) \(\dots\) \(\dots\) \(\dots\)

\(Q(\mathbb{D})[1432] =\) ??

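Precomputation: very costly (all of \(Q(\mathbb{D})\) must be materialised and sorted, and \(Q(\mathbb{D})\) can be much larger than \(\mathbb{D}\))

Access: nearly free (a single array lookup)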
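Here is a minimal Python sketch of this naive strategy on the People/Capitals example. It assumes the tables above are held as in-memory lists, and the names `materialise_and_sort` and `access` are illustrative only. All of the cost is in the precomputation; the access itself is one array lookup.

```python
# Toy instance of the People/Capitals example (data copied from the tables above).
people = [(1, "Alice", "Paris"), (2, "Bob", "Lens"), (3, "Chiara", "Rome"),
          (4, "Djibril", "Berlin"), (5, "Émile", "Dortmund"), (6, "Francesca", "Rome")]
capitals = [("Berlin", "Germany"), ("Paris", "France"), ("Rome", "Italy")]


def materialise_and_sort(people, capitals):
    """Precomputation: compute all of Q(D) with a nested-loop join, then sort it
    lexicographically by (city, country, name, id)."""
    answers = [(city, country, name, pid)
               for (pid, name, city) in people
               for (cap_city, country) in capitals
               if city == cap_city]
    answers.sort()
    return answers


def access(sorted_answers, k):
    """Access: return the k-th answer (1-indexed), i.e. a single array lookup."""
    return sorted_answers[k - 1]


table = materialise_and_sort(people, capitals)
print(access(table, 4))  # ('Rome', 'Italy', 'Francesca', 6)
```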

We need another way to represent the data

A bit of context

Acyclic Queries

A central class of queries, because of their tractability

\(Q = R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)\)

Acyclicity and elimination order

An \(\alpha\)-leaf in a query \(Q\) is a variable \(x\) whose neighbourhood \(N(x)\) (all variables sharing an atom with \(x\), including \(x\) itself) is contained in a single atom.

Figure (example query with variables numbered 1 to 4): 1 is an \(\alpha\)-leaf; once it is removed, 2 is an \(\alpha\)-leaf, then 3, then 4.

Acyclicity in Queries

A query \(Q\) is \(\alpha\)-acyclic iff one can obtain \(\emptyset\) by successively removing \(\alpha\)-leaves in \(Q\). This induces an order on the variables of \(Q\), called an \(\alpha\)-elimination order.

[Brault-Baron, 2014], also known as “without disruptive trio” [Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

In the previous example, 1, 2, 3, 4 is an \(\alpha\)-elimination order
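Below is a small Python sketch of the \(\alpha\)-leaf test and of \(\alpha\)-elimination. It assumes a query is given only by its hypergraph: a list of atoms, each a tuple of variable names, since relation names and data play no role here. The helper names are hypothetical.

The sketch checks two things:
- \(v, u, t, z, y, x\) (the reverse of the order \((x, y, z, t, u, v)\) used for the join tree) is an \(\alpha\)-elimination order of the query \(Q\) above.
- The triangle \(R(x,y) \wedge S(y,z) \wedge T(x,z)\) has no \(\alpha\)-leaf.

The greedy loop may remove any \(\alpha\)-leaf. This is safe because deleting a variable from an \(\alpha\)-acyclic query keeps it \(\alpha\)-acyclic: a join tree stays a join tree once the variable is deleted from every bag.

```python
def neighbourhood(x, atoms):
    """N(x): every variable that shares an atom with x (x included)."""
    return set().union(*(a for a in atoms if x in a))


def is_alpha_leaf(x, atoms):
    """x is an alpha-leaf if N(x) is contained in a single atom."""
    n = neighbourhood(x, atoms)
    return any(n <= a for a in atoms)


def alpha_elimination_order(atoms):
    """Greedily remove alpha-leaves; return the removal order, or None if stuck."""
    atoms = [frozenset(a) for a in atoms]
    remaining = set().union(*atoms)
    order = []
    while remaining:
        leaf = next((x for x in sorted(remaining) if is_alpha_leaf(x, atoms)), None)
        if leaf is None:
            return None  # no alpha-leaf: the query is not alpha-acyclic
        order.append(leaf)
        remaining.discard(leaf)
        atoms = [a - {leaf} for a in atoms]
    return order


def is_alpha_elimination_order(atoms, order):
    """Check that `order` removes every variable, each being an alpha-leaf when removed."""
    atoms = [frozenset(a) for a in atoms]
    variables = set().union(*atoms)
    for x in order:
        if not is_alpha_leaf(x, atoms):
            return False
        atoms = [a - {x} for a in atoms]
    return set(order) == variables


# Q = R1(x,y,z) ∧ R2(x,z,u) ∧ R3(x,y,t) ∧ R4(y,t) ∧ R5(y,v), as a hypergraph.
Q = [("x", "y", "z"), ("x", "z", "u"), ("x", "y", "t"), ("y", "t"), ("y", "v")]
print(is_alpha_elimination_order(Q, ["v", "u", "t", "z", "y", "x"]))  # True
print(alpha_elimination_order([("x", "y"), ("y", "z"), ("x", "z")]))  # None (triangle)
```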

Direct Access on acyclic queries

Given a join query \(Q(x_1,\dots,x_n)\), if \(x_n, \dots, x_1\) is an \(\alpha\)-elimination order, then we can answer direct access queries for the lexicographic order on \((x_1, \dots, x_n)\) with precomputation time \(\mathcal{O}(|\mathbb{D}|\mathsf{poly}(|Q|))\) and access time \(\mathcal{O}(\mathsf{poly}(|Q|)\mathsf{polylog}(|\mathbb{D}|))\).

[Carmeli, Tziavelis, Gatterbauer, Kimelfeld, Riedewald, 2020]

Previous Approach

Algorithm schema

  1. construct a join tree compatible with an \(\alpha\)-elimination order;
  2. load data into the join tree & annotate the tuples;
  3. top-down induction to answer \(Q(\mathbb{D})[k]\).

Construct a join tree

\(Q = R_1(x,y,z) \wedge R_2(x,z,u) \wedge R_3(x,y,t) \wedge R_4(y,t) \wedge R_5(y,v)\)

The order used here is \((x, y, z, t, u, v)\).

Load data and annotate the tuples

Load the data into the bags

Annotate each tuple with its number of extensions (the answers of the subtree below it that agree with the tuple)

Answer DA tasks

\(Q(\mathbb{D})[3]\) must set \(x\gets 2, y \gets 1, z \gets 0\), then proceed down the tree.
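To make steps 2 and 3 concrete, here is a hedged Python sketch on the People/Capitals example rather than on the query in the figure. It assumes:
- a two-node join tree, with \(\mathsf{Capitals}(city, country)\) at the root and \(\mathsf{People}(id, name, city)\) as its only child;
- the lexicographic order \((city, country, name, id)\).

A general implementation handles arbitrary join trees. This sketch only shows the two ingredients: per-tuple extension counts turned into prefix sums, and a top-down walk driven by binary search.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

people = [(1, "Alice", "Paris"), (2, "Bob", "Lens"), (3, "Chiara", "Rome"),
          (4, "Djibril", "Berlin"), (5, "Émile", "Dortmund"), (6, "Francesca", "Rome")]
capitals = [("Berlin", "Germany"), ("Paris", "France"), ("Rome", "Italy")]

# Step 2a: load the child bag People, bucketed by the join variable `city`;
# each bucket is sorted by the remaining variables of the order, (name, id).
buckets = defaultdict(list)
for pid, name, city in people:
    buckets[city].append((name, pid))
for bucket in buckets.values():
    bucket.sort()

# Step 2b: annotate each root tuple with its number of extensions in the child,
# then take prefix sums following the (city, country) order.
root = sorted(capitals)
weights = [len(buckets.get(city, [])) for city, _ in root]
prefix = list(accumulate(weights))  # prefix[i] = number of answers using root[0..i]


def access(k):
    """Step 3: the k-th answer (1-indexed) for the order (city, country, name, id)."""
    total = prefix[-1] if prefix else 0
    if not 1 <= k <= total:
        raise IndexError(k)
    i = bisect_left(prefix, k)  # first root tuple whose prefix sum reaches k
    before = prefix[i - 1] if i > 0 else 0
    city, country = root[i]
    name, pid = buckets[city][k - before - 1]
    return (city, country, name, pid)


print(access(4))  # ('Rome', 'Italy', 'Francesca', 6)
```

People from Lens and Dortmund are never reached, because no root tuple points to their buckets.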

Recap

We want to access the \(k\)-th solution to a query for a given database

Make a table, sort it, and done?

\(Q(\mathbb{D})\)
city country name id
Berlin Germany Djibril 4
Paris France Alice 1
Rome Italy Chiara 3
Rome Italy Francesca 6

We need another solution 😢

Join Tree Approach


use a join tree to answer DA tasks efficiently


Works for \(\alpha\)-acyclic queries 🥳

Signed Queries

Definition

Signed query: \(Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\vec{z_i}) \wedge \bigwedge_{j=1}^l \lnot S_j(\vec{z_{k+j}})\), where every \(\vec{z_i}\) is a tuple over \(X\)

Two big differences with join queries:

positively encoding \(\lnot S(\vec{z})\) on a domain \(D\) requires \(|D|^{|\vec{z}|} - |S^{\mathbb{D}}|\) tuples

\(Q\) being tractable for every \(\mathbb{D}\) does not imply that every subquery \(Q' \subseteq Q\) is tractable.
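As a rough illustration of the first difference, the hypothetical helper `complement` below materialises \(\lnot S\) over a finite domain. The number of tuples follows \(|D|^{|\vec{z}|} - |S^{\mathbb{D}}|\), which grows quickly with the arity even when \(S\) is tiny.

```python
from itertools import product


def complement(S, domain, arity):
    """Positive encoding of ¬S: every tuple over `domain` of this arity that is not in S."""
    return {t for t in product(domain, repeat=arity) if t not in S}


domain = range(10)
S = {(0, 1), (2, 3), (4, 5)}
neg_S = complement(S, domain, 2)
print(len(neg_S))  # 97, i.e. 10**2 - 3

# The formula alone shows the blow-up: a ternary S with 10 tuples over a domain
# of size 1000 has a complement of 1000**3 - 10 tuples.
print(1000**3 - 10)  # 999999990
```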

(Figure: two hypergraphs, the first one acyclic and the second one not acyclic.)

Example

Let \(Q\) be any query and consider \(Q' = Q(x_1,\dots,x_n) \wedge \lnot R(x_1,\dots,x_n)\), where \(R\) is a fresh relation symbol.

For any database \(\mathbb{D}\) such that \(R^\mathbb{D}= \emptyset\), we have \(Q(\mathbb{D}) = Q'(\mathbb{D})\)

\(\implies\) if \(Q'\) is tractable, \(Q\) is tractable

But \(Q'\) is always \(\alpha\)-acyclic (the atom \(\lnot R(x_1,\dots,x_n)\) contains every variable), whereas \(Q\) is arbitrary

\(\alpha\)-acyclicity does not suffice for signed queries: it is not monotone (a subquery of an \(\alpha\)-acyclic query need not be \(\alpha\)-acyclic)

\(\beta\)-acyclicity

Good candidate for another measure of tractability: every \(Q' \subseteq Q\) is \(\alpha\)-acyclic.

This is known as \(\beta\)-acyclicity

Intuition: a \(\beta\)-elimination order is an order that is an \(\alpha\)-elimination order for every subquery

\(\beta\)-acyclicity

A \(\beta\)-leaf is a variable \(x\) such that the atoms containing \(x\) are totally ordered by inclusion: for any two of them, the variables of one are contained in the variables of the other.

Characterisation: a query \(Q\) is \(\beta\)-acyclic iff one can obtain \(\emptyset\) by successively removing \(\beta\)-leaves in \(Q\). This induces an order on the variables of \(Q\), called a \(\beta\)-elimination order.
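A minimal Python sketch of the \(\beta\)-leaf test and of \(\beta\)-elimination, again working on the hypergraph only: atom signs do not matter for acyclicity, and the helper names are hypothetical. It checks two things:
- the query \(Q\) from the "Acyclic Queries" slide is \(\beta\)-acyclic, with \(v, u, t, z, y, x\) as a \(\beta\)-elimination order;
- the hypergraph of the triangle query extended with \(\lnot U(x,y,z)\), an instance of the earlier example, is \(\alpha\)-acyclic but has no \(\beta\)-leaf.

Greedy removal is safe because deleting a variable from a \(\beta\)-acyclic query keeps it \(\beta\)-acyclic.

```python
from itertools import combinations


def is_beta_leaf(x, atoms):
    """x is a beta-leaf if the atoms containing x are totally ordered by inclusion."""
    containing = [a for a in atoms if x in a]
    return all(a <= b or b <= a for a, b in combinations(containing, 2))


def beta_elimination_order(atoms):
    """Greedily remove beta-leaves; return the removal order, or None if stuck."""
    atoms = [frozenset(a) for a in atoms]
    remaining = set().union(*atoms)
    order = []
    while remaining:
        leaf = next((x for x in sorted(remaining) if is_beta_leaf(x, atoms)), None)
        if leaf is None:
            return None  # no beta-leaf: the query is not beta-acyclic
        order.append(leaf)
        remaining.discard(leaf)
        atoms = [a - {leaf} for a in atoms]
    return order


def is_beta_elimination_order(atoms, order):
    """Check that `order` removes every variable, each being a beta-leaf when removed."""
    atoms = [frozenset(a) for a in atoms]
    variables = set().union(*atoms)
    for x in order:
        if not is_beta_leaf(x, atoms):
            return False
        atoms = [a - {x} for a in atoms]
    return set(order) == variables


Q = [("x", "y", "z"), ("x", "z", "u"), ("x", "y", "t"), ("y", "t"), ("y", "v")]
print(is_beta_elimination_order(Q, ["v", "u", "t", "z", "y", "x"]))  # True
# Hypergraph of R(x,y) ∧ S(y,z) ∧ T(x,z) ∧ ¬U(x,y,z): alpha-acyclic, no beta-leaf.
print(beta_elimination_order([("x", "y"), ("y", "z"), ("x", "z"), ("x", "y", "z")]))  # None
```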

Recap


Join Tree Approach


Works for \(\alpha\)-acyclic queries 🥳


No notion of join tree for \(\beta\)-acyclic signed queries 😢

We propose a new approach!


Recovers former results 🥳


Handles signed queries 😍

A Circuit Approach to Direct Access

Relational Circuits

\(x_1\) \(x_2\) \(x_3\)
0 0 0
0 0 1
0 1 0
0 1 1
1 0 1
1 0 2
1 1 1
1 1 2
1 2 0
1 2 1
2 0 1
2 0 2
2 2 1
2 2 2

Relational Circuits

factorised representation of relations

circuit with 3 kinds of gates:

  • inputs: \(\top\) and \(\bot\)
  • decision gates
  • \(\times\)-gates

the wires leaving a decision gate on a variable \(x\) are labelled by domain values (the value taken by \(x\))
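A hypothetical Python encoding of relational circuits as nested tuples, used only for illustration:
- `("top",)` stands for \(\top\);
- `("dec", x, {d: child})` is a decision gate on \(x\);
- `("prod", children)` is a \(\times\)-gate;
- \(\bot\) is left implicit by omitting the corresponding branch.

The circuit below is one possible factorisation of the 14-tuple relation above. It is our own and not necessarily the one drawn on the slide. The gate for \(x_3 \in \{1, 2\}\) is shared by several parents.

```python
from itertools import product

# Hypothetical nested-tuple encoding of relational circuit gates:
#   ("top",)                     relation containing only the empty tuple (⊤)
#   ("dec", x, {d: child, ...})  disjoint union over d of {x = d} × child
#   ("prod", (c1, c2, ...))      Cartesian product of children over disjoint variables
# ⊥ is left implicit: a decision gate has no branch for values leading to ⊥.
TOP = ("top",)


def dec(x, branches):
    return ("dec", x, branches)


# One possible circuit for the 14-tuple relation over (x1, x2, x3) above.
x3_01 = dec("x3", {0: TOP, 1: TOP})
x3_12 = dec("x3", {1: TOP, 2: TOP})  # shared by several parents
circuit = dec("x1", {
    0: ("prod", (dec("x2", {0: TOP, 1: TOP}), x3_01)),
    1: dec("x2", {0: x3_12, 1: x3_12, 2: x3_01}),
    2: ("prod", (dec("x2", {0: TOP, 2: TOP}), x3_12)),
})


def tuples(gate):
    """Enumerate the relation computed by `gate`, as dicts variable -> value."""
    if gate[0] == "top":
        yield {}
    elif gate[0] == "dec":
        _, x, branches = gate
        for d in sorted(branches):
            for rest in tuples(branches[d]):
                yield {x: d, **rest}
    else:  # "prod": pick one tuple from each child and merge them
        for parts in product(*(list(tuples(c)) for c in gate[1])):
            merged = {}
            for part in parts:
                merged.update(part)
            yield merged


rows = [(t["x1"], t["x2"], t["x3"]) for t in tuples(circuit)]
print(len(rows))  # 14
```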

Ordered Relational Circuits

+ order \(\prec\) on the variables

Ordered Relational Circuits

For \(C\) an ordered relational circuit, we can perform direct access tasks in time \(\mathcal{O}(\mathsf{poly}(|X|)\mathsf{polylog}(|D|))\) after a preprocessing in time \(\mathcal{O}(|C|\cdot\mathsf{poly}(|X|)\mathsf{polylog}(|D|))\)

Preprocessing

Idea: for each decision gate \(v\) on \(x_i\) and for each domain value \(d\),

compute the number of tuples of the relation computed at \(v\) in which \(x_i\) takes a value \(d'\leqslant d\)

Preprocessing

Direct Access

Compute the 7th solution \(\to\) 111

Direct Access

Compute the 13th solution \(\to\) 221
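The preprocessing and access steps can be sketched in Python on the same 14-tuple relation, using a nested-tuple encoding of the circuit that is our own illustrative choice. Two assumptions, stated loudly:
- The circuit is smooth: every path decides every variable.
- At each \(\times\)-gate, the children cover consecutive blocks of variables with respect to \(\prec\), in child order, so the lexicographic rank splits as a mixed-radix number.

Each decision gate stores prefix sums of its branch counts. An access therefore performs one binary search per decision gate on its path, in line with the \(\mathsf{polylog}(|D|)\) access bound. The sketch recovers the 7th and 13th solutions shown on the slides.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical encoding: ("top",), ("dec", x, values, children), ("prod", children).
TOP = ("top",)


def dec(x, branches):
    values = sorted(branches)
    return ("dec", x, values, [branches[d] for d in values])


# The 14-tuple relation over x1 ≺ x2 ≺ x3 from the slides, as a circuit with sharing.
x3_01 = dec("x3", {0: TOP, 1: TOP})
x3_12 = dec("x3", {1: TOP, 2: TOP})
circuit = dec("x1", {
    0: ("prod", (dec("x2", {0: TOP, 1: TOP}), x3_01)),
    1: dec("x2", {0: x3_12, 1: x3_12, 2: x3_01}),
    2: ("prod", (dec("x2", {0: TOP, 2: TOP}), x3_12)),
})


def annotate(gate, info):
    """Preprocessing: info[id(gate)] = (number of tuples, prefix sums over branches)."""
    key = id(gate)
    if key not in info:
        if gate[0] == "top":
            info[key] = (1, None)
        elif gate[0] == "dec":
            prefix = list(accumulate(annotate(c, info) for c in gate[3]))
            info[key] = (prefix[-1], prefix)
        else:
            total = 1
            for c in gate[1]:
                total *= annotate(c, info)
            info[key] = (total, None)
    return info[key][0]


def access(gate, k, info):
    """Return the k-th tuple (0-indexed, lexicographic order) below `gate` as a dict."""
    if gate[0] == "top":
        return {}
    if gate[0] == "dec":
        _, x, values, children = gate
        prefix = info[id(gate)][1]
        i = bisect_right(prefix, k)  # binary search: first branch whose prefix sum exceeds k
        before = prefix[i - 1] if i > 0 else 0
        return {x: values[i], **access(children[i], k - before, info)}
    # ×-gate: split k as a mixed-radix number, the last child varying fastest.
    result = {}
    for c in reversed(gate[1]):
        k, r = divmod(k, info[id(c)][0])
        result.update(access(c, r, info))
    return result


info = {}
total = annotate(circuit, info)


def kth(k):
    """k-th solution, 1-indexed as on the slides, as a tuple (x1, x2, x3)."""
    t = access(circuit, k - 1, info)
    return (t["x1"], t["x2"], t["x3"])


print(total, kth(7), kth(13))  # 14 (1, 1, 1) (2, 2, 1)
```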

From CQ to circuit

\(Q\) a CQ and \(x_1\prec\dots\prec x_n\) an order over the variable set

\(Q(\mathbb{D}) = \biguplus_{d\in D} Q[x_1 = d](\mathbb{D})\)

\[ \text{if} \begin{cases} Q = Q_1 \land Q_2 \\ \mathsf{var}(Q_1) \cap \mathsf{var}(Q_2) = \emptyset \end{cases} \]

then \(Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D})\)

Implementing these two rules recursively, with a cache on subqueries, yields an ordered relational circuit computing \(Q(\mathbb{D})\)
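A minimal Python sketch of this recursive compilation for signed join queries, in the spirit of exhaustive DPLL. It is not the authors' implementation. The encoding is hypothetical:
- an atom is a triple `(sign, variables, relation)`, with sign `"+"` or `"-"`;
- relations are frozensets of tuples;
- the domain \(D\) is given explicitly.

The recursion works as follows. Splitting into variable-disjoint connected components gives a \(\times\)-gate. Otherwise, branching on the first variable of the order gives a decision gate, i.e. the disjoint union over \(d \in D\). The cache lets identical subqueries share one gate. The sketch does not check the \(\beta\)-elimination-order condition, so in general it carries no size guarantee.

```python
from itertools import product

TOP, BOT = ("top",), ("bot",)


def condition(atoms, x, d):
    """Q[x = d]: keep the tuples with x = d in every atom mentioning x, then drop x."""
    out = []
    for sign, vs, rel in atoms:
        if x in vs:
            i = vs.index(x)
            rel = frozenset(t[:i] + t[i + 1:] for t in rel if t[i] == d)
            vs = vs[:i] + vs[i + 1:]
        out.append((sign, vs, rel))
    return out


def simplify(atoms):
    """Evaluate atoms with no variable left; return None if one of them is false."""
    kept = []
    for sign, vs, rel in atoms:
        if vs:
            kept.append((sign, vs, rel))
        elif (len(rel) > 0) != (sign == "+"):  # closed atom evaluates to false
            return None
    return kept


def components(order, atoms):
    """Group the remaining variables into classes connected through shared atoms."""
    comp = {v: {v} for v in order}
    for _, vs, _ in atoms:
        merged = set().union(*(comp[v] for v in vs))
        for v in merged:
            comp[v] = merged
    groups, seen = [], set()
    for v in order:
        if v not in seen:
            groups.append([u for u in order if u in comp[v]])
            seen |= comp[v]
    return groups


def compile_query(order, atoms, domain, cache):
    atoms = simplify(atoms)
    if atoms is None:
        return BOT
    if not order:
        return TOP
    key = (tuple(order), frozenset(atoms))
    if key not in cache:
        groups = components(order, atoms)
        if len(groups) > 1:  # Q = Q1 ∧ Q2 with disjoint variables: ×-gate
            parts = tuple(compile_query(g, [a for a in atoms if a[1][0] in g], domain, cache)
                          for g in groups)
            cache[key] = BOT if BOT in parts else ("prod", parts)
        else:  # branch on the first variable of the order: decision gate
            x, rest = order[0], order[1:]
            branches = {}
            for d in sorted(domain):
                child = compile_query(rest, condition(atoms, x, d), domain, cache)
                if child != BOT:
                    branches[d] = child
            cache[key] = ("dec", x, branches) if branches else BOT
    return cache[key]


def tuples(gate):
    """Enumerate the relation computed by `gate`; ("bot",) yields nothing."""
    if gate[0] == "top":
        yield {}
    elif gate[0] == "dec":
        for d in sorted(gate[2]):
            for rest in tuples(gate[2][d]):
                yield {gate[1]: d, **rest}
    elif gate[0] == "prod":
        for parts in product(*(list(tuples(c)) for c in gate[1])):
            merged = {}
            for part in parts:
                merged.update(part)
            yield merged


# Signed query R(x, y) ∧ ¬S(y, z) over D = {0, 1, 2}, order x ≺ y ≺ z.
D = [0, 1, 2]
R = frozenset({(0, 1), (1, 1), (2, 0)})
S = frozenset({(1, 0), (0, 2)})
atoms = [("+", ("x", "y"), R), ("-", ("y", "z"), S)]
circuit = compile_query(["x", "y", "z"], atoms, D, {})
answers = [(t["x"], t["y"], t["z"]) for t in tuples(circuit)]
print(answers)
```

Here the branches \(x = 0\) and \(x = 1\) lead to the same subquery, so the cache makes them point to the same gate.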

Compiling Signed Queries

Let \(Q\) be a signed join query (SJQ) and \(x_n,\dots,x_1\) a \(\beta\)-elimination order for \(Q\). Exhaustive DPLL on \(Q\) and \(\mathbb{D}\) with the variable order \(x_1,\dots,x_n\) returns an ordered circuit of size \(\mathcal{O}(\mathsf{poly}(|Q|)\mathsf{poly}(|\mathbb{D}|))\).

(Generalisation of [Capelli, 2017])

Corollary:

Direct Access for \(\beta\)-acyclic SJQ with \(\mathcal{O}(\mathsf{poly}(|\mathbb{D}|))\) preprocessing and access time \(\mathcal{O}(\mathsf{polylog}(|\mathbb{D}|)\mathsf{poly}(|Q|))\) for lexicographical orders based on (reversed) \(\beta\)-elimination orders.

Side Note:

Join-Tree based approaches fail for \(\beta\)-acyclic SJQs

Recap

For a query \(Q(x_1,\dots,x_n)\) and an order on the variables of “complexity” \(k\), we can solve DA tasks with a preprocessing in time \(\mathcal{O}(|\mathbb{D}|^k\mathsf{poly}(|Q|))\) and access in time \(\mathcal{O}(\mathsf{poly}(|Q|)\mathsf{polylog}(|\mathbb{D}|))\).

Algorithm schema, join tree approach:

  1. construct a join tree compatible with an \(\alpha\)-elimination order;
  2. load data into the join tree & annotate the tuples;
  3. top-down induction to answer \(Q(\mathbb{D})[k]\).

Algorithm schema, circuit approach:

  1. compile an ordered relational circuit \(C\) computing \(Q(\mathbb{D})\);
  2. annotate the gates with the number of solutions;
  3. top-down induction to answer \(Q(\mathbb{D})[k]\).

Going Further

Generalisation

This technique generalises to:

  1. conjunctive (with \(\exists\) quantifiers) signed queries:
    • project \(\exists\) directly on the circuit
    • as long as the projected variables form a suffix of the variable order
  2. non-acyclic signed conjunctive queries:
    • we can associate a notion of width to elimination orders
    • positive case \(\to\) fractional hypertree width
    • signed case \(\to\) the width of the worst subquery
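The suffix condition for projection can be illustrated with a small Python sketch on a hypothetical nested-tuple circuit encoding. It additionally assumes that no gate computes the empty relation, i.e. no reachable \(\bot\).

Why the suffix matters:
- The projected variables come last in \(\prec\), so a decision gate on a projected variable has only projected variables below it. Its non-empty relation projects to the single empty tuple, i.e. \(\top\).
- Decision gates on kept variables are left untouched, so their unions stay disjoint and counting remains exact.
- Without the suffix condition, a projected decision gate could sit above kept variables. Its branches would then overlap after projection.

```python
# Hypothetical nested-tuple encoding: ("top",), ("dec", x, {d: child}), ("prod", children).
TOP = ("top",)


def dec(x, branches):
    return ("dec", x, branches)


# A small ordered circuit over x ≺ y ≺ z computing {000, 001, 011, 102}.
circuit = dec("x", {
    0: dec("y", {0: dec("z", {0: TOP, 1: TOP}), 1: dec("z", {1: TOP})}),
    1: dec("y", {0: dec("z", {2: TOP})}),
})


def project_suffix(gate, kept, memo=None):
    """Existentially project away the variables not in `kept`.
    Assumes they form a suffix of ≺ and that no gate computes the empty relation."""
    memo = {} if memo is None else memo
    key = id(gate)
    if key not in memo:
        if gate[0] == "top":
            memo[key] = gate
        elif gate[0] == "dec" and gate[1] not in kept:
            # Only projected variables occur below: the non-empty relation projects to ⊤.
            memo[key] = TOP
        elif gate[0] == "dec":
            memo[key] = ("dec", gate[1],
                         {d: project_suffix(c, kept, memo) for d, c in gate[2].items()})
        else:
            memo[key] = ("prod", tuple(project_suffix(c, kept, memo) for c in gate[1]))
    return memo[key]


def count(gate):
    """Number of tuples computed by `gate` (no memoisation: fine for small circuits)."""
    if gate[0] == "top":
        return 1
    if gate[0] == "dec":
        return sum(count(c) for c in gate[2].values())
    result = 1
    for c in gate[1]:
        result *= count(c)
    return result


print(count(circuit))                              # 4
print(count(project_suffix(circuit, {"x", "y"})))  # 3: (0,0), (0,1), (1,0)
print(count(project_suffix(circuit, {"x"})))       # 2
```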

Next steps

Going further with circuits

study the tractability of the circuit approach for DA on CQs with aggregation

\(Q(p, c, g, \mathsf{count()}) = \mathsf{Teams}(p, c) \land \mathsf{Games}(g, c, \cdot) \land \mathsf{Tries}(g, p)\)

How should we integrate the aggregation in the lexicographical order?

How does the aggregation fit into the compiled circuits?

generalise the circuit approach to queries over annotated databases (FAQ and AJAR queries)

see recent work (Zhao, Fan, Ouyang, Koutris, 2023)