Oliver Irwin
14/11/2023 - Speed-Dating Seminar
Join query: \(Q(x_1, \dots, x_n) = \bigwedge_{i=1}^k R_i(\vec{z_i})\)
where each \(\vec{z_i}\) is a tuple of variables from \(X = \{x_1,\dots,x_n\}\)
Example: \(Q(city, country, name, id) = People(id, name, city) \wedge Capitals(city, country)\)
People:
id | name | city |
---|---|---|
1 | Alice | Paris |
2 | Bob | Lens |
3 | Chiara | Rome |
4 | Djibril | Berlin |
5 | Émile | Dortmund |
6 | Francesca | Rome |
Capitals:
city | country |
---|---|
Berlin | Germany |
Paris | France |
Rome | Italy |
\(Q(\mathbb{D})\):
city | country | name | id |
---|---|---|---|
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Berlin | Germany | Djibril | 4 |
Rome | Italy | Francesca | 6 |
We want to access the \(k\)-th element of \(Q(\mathbb{D})\) for a given order.
Naive idea: materialise \(Q(\mathbb{D})\) as an array and sort it. Do we then have direct access?
city | country | name | id |
---|---|---|---|
Berlin | Germany | Djibril | 4 |
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
\(Q(\mathbb{D})[4] = (\text{Rome}, \text{Italy}, \text{Francesca}, 6)\)
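The naive approach can be sketched in a few lines of Python, assuming the People and Capitals tables above are held in memory as lists of tuples. The join strategy (a hash join on `city`) and the function names `materialise` and `direct_access` are illustrative choices of ours, not part of the talk.

```python
from collections import defaultdict

people = [(1, "Alice", "Paris"), (2, "Bob", "Lens"), (3, "Chiara", "Rome"),
          (4, "Djibril", "Berlin"), (5, "Émile", "Dortmund"), (6, "Francesca", "Rome")]
capitals = [("Berlin", "Germany"), ("Paris", "France"), ("Rome", "Italy")]


def materialise(people, capitals):
    """Compute Q(city, country, name, id) with a hash join on city, then sort
    the answers lexicographically (city first, then country, name, id)."""
    countries = defaultdict(list)
    for city, country in capitals:
        countries[city].append(country)
    return sorted(
        (city, country, name, pid)
        for pid, name, city in people
        for country in countries.get(city, [])  # cities without a capital entry drop out
    )


def direct_access(answers, k):
    """k-th answer, 1-indexed as on the slides: a single array lookup."""
    return answers[k - 1]


answers = materialise(people, capitals)
print(direct_access(answers, 4))  # ('Rome', 'Italy', 'Francesca', 6)
```

Access is a single lookup, but `materialise` must produce and sort every answer first, which is exactly the preprocessing cost the next slides object to.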
city | country | name | id |
---|---|---|---|
\(\dots\) | \(\dots\) | \(\dots\) | \(\dots\) |
Berlin | Germany | Djibril | 4 |
\(\dots\) | \(\dots\) | \(\dots\) | \(\dots\) |
Paris | France | Alice | 1 |
\(\dots\) | \(\dots\) | \(\dots\) | \(\dots\) |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
\(\dots\) | \(\dots\) | \(\dots\) | \(\dots\) |
\(Q(\mathbb{D})[1432] =\) ??
Preprocessing: very costly, since all of \(Q(\mathbb{D})\) must be computed and sorted
Access: nearly free (a single array lookup)
We need another way to represent the data
Research therefore focuses on algorithms with reasonable preprocessing and fast access time
Query class | Reference | Preprocessing | Access |
---|---|---|---|
FO formulas | Bagan, Durand, Grandjean, Olive (2008) | linear | constant |
MSO formulas | Bagan (2009) | linear | constant |
Acyclic CQs, for certain lexicographical orders | Carmeli, Zeevi, Berkholz, Kimelfeld, Schweikardt (2020) | linear | polylog |
We propose a new, unifying method for direct access (DA) on acyclic conjunctive queries (ACQs), based on relational circuits
We want to access the \(k\)-th solution to a query for a given database
Make a table, sort it, and done?
city | country | name | id |
---|---|---|---|
Berlin | Germany | Djibril | 4 |
Paris | France | Alice | 1 |
Rome | Italy | Chiara | 3 |
Rome | Italy | Francesca | 6 |
We need another solution 😢
Approaches with reasonable preprocessing and fast access exist
But only for restricted queries and databases
😢
We propose a new approach!
Unifies previous results 🥳
Extends the classes of queries considered
😍
\(x_1\) | \(x_2\) | \(x_3\) |
---|---|---|
0 | 0 | 0 |
0 | 0 | 1 |
0 | 1 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 0 | 2 |
1 | 1 | 1 |
1 | 1 | 2 |
1 | 2 | 0 |
1 | 2 | 1 |
2 | 0 | 1 |
2 | 0 | 2 |
2 | 2 | 1 |
2 | 2 | 2 |
factorised representation of relations
circuit with 3 kinds of gates:
paths from decision gates are labelled by the domain values
ordered relational circuit: a relational circuit equipped with an order \(\prec\) on the variables
For \(C\) an ordered relational circuit, direct access takes time \(\mathcal{O}(\mathsf{poly}(|X|)\,\mathsf{polylog}(|D|))\) after preprocessing in time \(\mathcal{O}(|C|\cdot\mathsf{poly}(|X|)\,\mathsf{polylog}(|D|))\)
Idea: for each gate \(v\) over \(x_i\) and each domain value \(d\),
precompute the size of the relation computed at \(v\) when \(x_i\) is restricted to values \(d'\leqslant d\)
Example on the relation above, in lexicographic order on \(x_1, x_2, x_3\):
the 7th solution is \((1, 1, 1)\)
the 13th solution is \((2, 2, 1)\)
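To make the counting idea concrete, here is a minimal Python sketch on the 14-tuple relation over \(x_1, x_2, x_3\) above. It assumes the simplest possible "circuit": a plain trie, i.e. only decision gates and no product gates, so it is not factorised and is not the talk's construction. Each node stores its sorted values \(d\) and the cumulative counts of tuples with value \(\leqslant d\); access then walks down with one binary search per variable. The names `preprocess` and `access` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def preprocess(rows, n):
    """Build a trie over x_1 < ... < x_n (assumes rows non-empty, n >= 1).
    Each internal node is a triple (sorted values, cumulative counts, children),
    where cumul[i] counts the tuples below this node whose value is <= values[i]."""
    def build(group, depth):
        if depth == n:
            return None, 1  # past the last variable: exactly one (empty) suffix
        by_value = {}
        for row in group:
            by_value.setdefault(row[depth], []).append(row)
        values = sorted(by_value)
        children, sizes = [], []
        for d in values:
            child, size = build(by_value[d], depth + 1)
            children.append(child)
            sizes.append(size)
        cumul = list(accumulate(sizes))
        return (values, cumul, children), cumul[-1]

    root, _ = build(list(rows), 0)
    return root


def access(root, k):
    """Return the k-th tuple (1-indexed) in lexicographic order."""
    if not 1 <= k <= root[1][-1]:
        raise IndexError(k)
    answer, node = [], root
    while node is not None:
        values, cumul, children = node
        i = bisect_left(cumul, k)  # first value whose cumulative count reaches k
        if i > 0:
            k -= cumul[i - 1]      # skip every tuple with a smaller value here
        answer.append(values[i])
        node = children[i]
    return tuple(answer)


R = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 1), (1, 0, 2), (1, 1, 1),
     (1, 1, 2), (1, 2, 0), (1, 2, 1), (2, 0, 1), (2, 0, 2), (2, 2, 1), (2, 2, 2)]
root = preprocess(R, 3)
print(access(root, 7), access(root, 13))  # (1, 1, 1) (2, 2, 1)
```

For the 7th solution, the root counts are 4, 10, 14 for \(x_1 = 0, 1, 2\), so the walk picks \(x_1 = 1\) and looks for the 3rd tuple below it, and so on down to \((1, 1, 1)\).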
\(Q\) a CQ and \(x_1\prec\dots\prec x_n\) an order over the variable set
\(Q(\mathbb{D}) = \biguplus_{d\in D} Q[x_1 = d](\mathbb{D})\)
If \(Q = Q_1 \land Q_2\) with \(\mathsf{var}(Q_1) \cap \mathsf{var}(Q_2) = \emptyset\),
then \(Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D})\)
recursive implementation of these two rules + a cache of already-seen subqueries \(\implies\) an ordered relational circuit computing \(Q(\mathbb{D})\)
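The two rules above can be sketched in Python as a memoised recursion that counts \(|Q(\mathbb{D})|\) rather than building the circuit. The assumptions are ours: atoms are encoded as `(relation, terms)` pairs where a term starting with `"?"` is a variable (a hypothetical encoding), branching ranges over the active domain, and `functools.lru_cache` stands in for the cache, so each distinct cached call plays the role of one gate.

```python
from functools import lru_cache


# Hypothetical encoding: a term starting with "?" is a variable, anything else a value.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def variables(atoms):
    return {t for _, terms in atoms for t in terms if is_var(t)}


def canonical(atoms):
    # Sort atoms so that equal subqueries share one cache entry.
    return tuple(sorted(atoms, key=repr))


def components(atoms):
    """Split atoms into groups connected through shared variables."""
    comps = []  # list of (variable set, atom list)
    for atom in atoms:
        vs = {t for t in atom[1] if is_var(t)}
        merged_vars, merged_atoms, rest = set(vs), [atom], []
        for comp_vars, comp_atoms in comps:
            if comp_vars & vs:
                merged_vars |= comp_vars
                merged_atoms += comp_atoms
            else:
                rest.append((comp_vars, comp_atoms))
        comps = rest + [(merged_vars, merged_atoms)]
    return [canonical(a) for _, a in comps]


def substitute(atoms, x, d):
    return canonical((name, tuple(d if t == x else t for t in terms)) for name, terms in atoms)


def make_counter(db, order):
    """Count |Q(D)| by branching on the smallest variable w.r.t. `order` and
    splitting variable-disjoint subqueries, memoising every subquery."""
    rank = {x: i for i, x in enumerate(order)}
    domain = sorted({v for rel in db.values() for t in rel for v in t}, key=repr)

    @lru_cache(maxsize=None)
    def count(atoms):
        xs = variables(atoms)
        if not xs:
            # Ground query: satisfied iff every atom is a fact of the database.
            return int(all(terms in db[name] for name, terms in atoms))
        comps = components(atoms)
        if len(comps) > 1:
            # Q = Q1 ∧ Q2 with var(Q1) ∩ var(Q2) = ∅  ⇒  |Q(D)| = |Q1(D)| · |Q2(D)|
            total = 1
            for comp in comps:
                total *= count(comp)
                if total == 0:
                    break
            return total
        # Q(D) = ⊎_d Q[x = d](D) for the ≺-smallest variable x
        x = min(xs, key=rank.__getitem__)
        return sum(count(substitute(atoms, x, d)) for d in domain)

    return lambda atoms: count(canonical(atoms))


db = {
    "People": {(1, "Alice", "Paris"), (2, "Bob", "Lens"), (3, "Chiara", "Rome"),
               (4, "Djibril", "Berlin"), (5, "Émile", "Dortmund"), (6, "Francesca", "Rome")},
    "Capitals": {("Berlin", "Germany"), ("Paris", "France"), ("Rome", "Italy")},
}
count_q = make_counter(db, ["?city", "?country", "?name", "?id"])
query = [("People", ("?id", "?name", "?city")), ("Capitals", ("?city", "?country"))]
print(count_q(query))  # 4
```

Once `?city` is fixed, the query splits into a People part and a Capitals part with no shared variable, so their counts multiply; this is where sharing and factorisation come from.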
Our method:
For a query \(Q(x_1,\dots,x_n)\) and an order on its variables of “complexity” \(k\), DA tasks can be solved after preprocessing in time \(\mathcal{O}(|\mathbb{D}|^k\,\mathsf{poly}(|Q|))\), with access in time \(\mathcal{O}(\mathsf{poly}(|Q|)\,\mathsf{polylog}(|\mathbb{D}|))\).
Signed conjunctive queries: \(Q = \bigwedge_i R_i(\vec{x_i}) \land \bigwedge_j \lnot S_j(\vec{x_j})\)
\(S\):
\(x_1\) | \(x_2\) | \(x_3\) |
---|---|---|
0 | 1 | 0 |
\(\lnot S\) over the domain \(\{0, 1\}\):
\(x_1\) | \(x_2\) | \(x_3\) |
---|---|---|
0 | 0 | 0 |
0 | 0 | 1 |
0 | 1 | 1 |
1 | 0 | 0 |
1 | 0 | 1 |
1 | 1 | 0 |
1 | 1 | 1 |
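Reading the first table as a relation \(S\) and the second as \(\lnot S\) over the domain \(\{0, 1\}\) (7 of the 8 triples, all but \((0, 1, 0)\)), the Python sketch below shows why a negated atom is cheap to represent implicitly even though its relation has \(|D|^{\text{arity}} - |S|\) tuples. The function `kth_of_complement` uses a standard "k-th missing element" trick that we swapped in for illustration; it is not the circuit construction from the talk, and both function names are hypothetical.

```python
from itertools import product


def complement(s, domain, arity):
    """Materialise the relation of ¬S over `domain`, in lexicographic order."""
    return [t for t in product(sorted(domain), repeat=arity) if t not in s]


def kth_of_complement(s, domain, arity, k):
    """k-th (1-indexed) tuple of ¬S, computed without materialising it.
    Assumes every tuple of s uses values from `domain`."""
    dom = sorted(domain)
    base, index = len(dom), {d: i for i, d in enumerate(dom)}

    def rank(t):
        # 0-indexed position of t among all |D|^arity tuples in lexicographic order
        r = 0
        for v in t:
            r = r * base + index[v]
        return r

    if not 1 <= k <= base ** arity - len(s):
        raise IndexError(k)
    c = k - 1
    for r in sorted(rank(t) for t in s):
        if r <= c:  # each excluded tuple at or before c shifts the answer one step
            c += 1
    digits = []
    for _ in range(arity):
        c, i = divmod(c, base)
        digits.append(dom[i])
    return tuple(reversed(digits))


S = {(0, 1, 0)}
print(complement(S, {0, 1}, 3))
# [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print(kth_of_complement(S, {0, 1}, 3, 3))  # (0, 1, 1)
```

As written, each call re-sorts the ranks of \(S\); sorting them once during preprocessing would leave roughly \(\mathcal{O}(|S|)\) work per access in this naive form.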
the circuit approach recovers the known tractable classes from the literature (for CQ+)
we generalise and unify tractability results about CQ-
Going further with circuits
study the tractability of the circuit approach for DA on CQs with aggregation
\(Q(p, c, g, \mathsf{count()}) = \mathsf{Teams}(p, c) \land \mathsf{Games}(g, c, \cdot) \land \mathsf{Tries}(g, p)\)
How should we integrate the aggregation in the lexicographical order?
How does the aggregation fit into the compiled circuits?
generalise the circuit approach to queries over annotated databases (FAQ and AJAR queries)
recent (October 2023) work at the University of Wisconsin-Madison (USA)