Branching and Circuits:
Algorithmic Techniques for Efficient Query Evaluation

PhD Defense

Oliver Irwin

oliver.irwin@univ-lille.fr

Université de Lille

11 March 2026

Setting the Stage

Databases are a common way to store information (data)

The main idea is to organise the data so it’s useful

We can retrieve information from databases with queries

Databases and Queries

People
name	city
Alice	Paris
Bob	Middlesbrough
Colin	Paris
Daniel	Berlin
Erik	Dortmund
Francesca	Rome

Relation

Capitals
city	country
Berlin	Germany
Paris	France
Rome	Italy

Tuple

name, city, country variables

Databases and Queries

“Who lives in a capital city?”

“Who, from People, lives in a city that is in Capitals”

\(Q(\mathsf{name, city, country}) \mathop{\mathrm{=}}\mathit{People}(\mathsf{name, city}) \land \mathit{Capitals}(\mathsf{city, country})\)

People
name	city
Alice	Paris
Bob	Middlesbrough
Colin	Paris
Daniel	Berlin
Erik	Dortmund
Francesca	Rome

Capitals
city	country
Berlin	Germany
Paris	France
Rome	Italy

\(Q(\mathbb{D})\)
name	city	country
Alice	Paris	France
Colin	Paris	France
Daniel	Berlin	Germany
Francesca	Rome	Italy

Join Query: \(Q(x_1, \dots, x_n) \mathop{\mathrm{=}}\bigwedge_{i=1}^k \mathsf{R}_i(\vec{z_i})\)
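A minimal Python sketch of how the example query can be evaluated on the two tables above. It assumes a plain hash join on `city`; the function name `join_people_capitals` is illustrative and does not come from the talk.

```python
from collections import defaultdict

people = [("Alice", "Paris"), ("Bob", "Middlesbrough"), ("Colin", "Paris"),
          ("Daniel", "Berlin"), ("Erik", "Dortmund"), ("Francesca", "Rome")]
capitals = [("Berlin", "Germany"), ("Paris", "France"), ("Rome", "Italy")]


def join_people_capitals(people, capitals):
    """Return every (name, city, country) with People(name, city) and Capitals(city, country)."""
    # Index Capitals by the shared variable `city`.
    countries_by_city = defaultdict(list)
    for city, country in capitals:
        countries_by_city[city].append(country)
    # Probe the index once per People tuple.
    return [(name, city, country)
            for name, city in people
            for country in countries_by_city.get(city, [])]


answers = join_people_capitals(people, capitals)
```

On this database, `answers` contains the four tuples of \(Q(\mathbb{D})\) shown above, in the order of the People table.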

Query Evaluation

Query Evaluation: finding all the answers to a query for a given database

With joins, the size of the output can be much larger than the size of the input

R	x
	0
	1

S	y
	0
	1

\(Q(\mathbb{D})\)	x	y
	0	0
	0	1
	1	0
	1	1

\(\mathcal{O}(n)\) input leads to \(\mathcal{O}(n^2)\) output in this case.

Evaluation Problems

Various query evaluation problems exist:

Evaluation / Enumeration: return all of the answers of the query for a given database

Counting: how many answers to the query are there?

Uniform Sampling: get one answer of the query sampled uniformly at random

For some problems, materialising the whole answer set is not necessary

Focus on Direct Access

Direct Access: Given an order over the answers and an index \(k\), return the \(k^\mathsf{th}\) answer

Make \(Q(\mathbb{D})\) an array, sort it and then we have direct access?

\(Q(\mathbb{D})\)
name	city	country
Alice	Paris	France
Colin	Paris	France
Daniel	Berlin	Germany
Francesca	Rome	Italy

\(Q(\mathbb{D})[3] = (\mathsf{Daniel, Berlin, Germany})\)

\(Q(\mathbb{D})[1432] =\)

Precomputation: very costly

Access: nearly free

We need another way to represent the data

Complexity of Evaluation Problems

Query evaluation is a hard problem.

In fact, even deciding whether there is a solution is NP-hard (Chandra and Merlin 1977)

Thus, we try to identify tractable subclasses:

based on structural properties of the query: acyclicity, bounded width…
or constraints on the database: cardinality constraints, degree constraints, functional dependencies…

A Simple Worst-Case Optimal Join

Joining: the Classical Way

\(Q \mathop{\mathrm{=}}\mathsf{R}(x, y) \wedge \mathsf{S}(x, z) \wedge \mathsf{T}(y, z)\)

R	x	y
	0	0
	0	1
	2	1

S	x	z
	0	0
	0	2
	2	3

T	y	z
	0	2
	1	0
	1	2

Devise a query plan:

\((\mathsf{R} \wedge \mathsf{S}) \wedge \mathsf{T}\) or \((\mathsf{R} \wedge \mathsf{T}) \wedge \mathsf{S}\) or \((\mathsf{S} \wedge \mathsf{T}) \wedge \mathsf{R}\)

Materialise intermediate joins

x	y	z
0	0	0
0	0	2
0	1	0
0	1	2
2	1	3

R \(\land\) S \(\land\) T	x	y	z
	0	0	2
	0	1	0
	0	1	2

The Issue with the Classical Way

\[Q \mathop{\mathrm{=}}\mathsf{R}(x, y) \wedge \mathsf{S}(x, z) \wedge \mathsf{T}(y, z)\]

It is known that if \(|\mathsf{R}|, |\mathsf{S}|, |\mathsf{T}| \leqslant N\), then \(|Q(\mathbb{D})| \leqslant N^{1.5}\).
\(\mathsf{R} \land \mathsf{S}\) may have \(N^2\) answers!

Worst possible scenario for query plans:

Consider \(\mathbb{D}\) on domain \(D = D_1 \uplus D_2 \uplus D_3\) with:

\(0 \notin D\)
\(|D_1|=|D_2|=|D_3|=N\).

R	x	y
	0	\(D_2\)
	\(D_1\)	0

S	x	z
	0	\(D_3\)
	\(D_1\)	0

T	y	z
	0	\(D_3\)
	\(D_2\)	0

\(|\mathsf{R} \land \mathsf{S}| \geqslant N^2\), \(|\mathsf{R} \land \mathsf{T}| \geqslant N^2\), \(|\mathsf{S} \land \mathsf{T}| \geqslant N^2\)
However, \(|Q(\mathbb{D})| = 0\).
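The instance above can be checked numerically. The following Python sketch builds it for a small \(N\), counts the size of each pairwise join and confirms that the triangle query has no answer. The helper names (`hard_instance`, `join_size`, `triangle_answers`) and the encoding of the elements of \(D_1, D_2, D_3\) as tagged pairs are assumptions made for the illustration.

```python
from collections import Counter


def hard_instance(n):
    """Build R, S, T over D1, D2, D3 (n fresh values each) plus the extra value 0."""
    d1 = [("d1", i) for i in range(n)]
    d2 = [("d2", i) for i in range(n)]
    d3 = [("d3", i) for i in range(n)]
    r = {(0, y) for y in d2} | {(x, 0) for x in d1}   # R(x, y)
    s = {(0, z) for z in d3} | {(x, 0) for x in d1}   # S(x, z)
    t = {(0, z) for z in d3} | {(y, 0) for y in d2}   # T(y, z)
    return r, s, t


def join_size(left, right, i, j):
    """Number of pairs (l, r) with l[i] == r[j], i.e. the size of the binary join."""
    counts = Counter(tup[j] for tup in right)
    return sum(counts[tup[i]] for tup in left)


def triangle_answers(r, s, t):
    """All (x, y, z) with R(x, y), S(x, z) and T(y, z)."""
    return {(x, y, z) for (x, y) in r for (x2, z) in s if x == x2 and (y, z) in t}


n = 10
r, s, t = hard_instance(n)
rs = join_size(r, s, 0, 0)   # R and S share x
rt = join_size(r, t, 1, 0)   # R and T share y
st = join_size(s, t, 1, 1)   # S and T share z
out = triangle_answers(r, s, t)
```

With `n = 10`, every pairwise join has \(N^2 + N = 110\) tuples while `out` is empty, so any plan that materialises a binary join first does quadratic work for an empty result.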

Complexity of Evaluation

\(Q \mathop{\mathrm{=}}\mathsf{R}(x, y) \wedge \mathsf{S}(x, z) \wedge \mathsf{T}(y, z)\)

Important factors in complexity computation:

input size (mainly the database)

output size (number of answers)

Complexity is at least linear in the output size

look at the worst possible output size

look at structural properties to simplify

Worst-Case

Consider a join query \(Q\) and all databases for \(Q\) with a bound \(N\) on relation size:

\[ \mathcal{C}[\leqslant~N]= \{\mathbb{D}\mid \forall \mathsf{R} \in Q, |\mathsf{R}| \leqslant N\} \]

We define \(\mathsf{wc}(Q,N)\) as the worst case: the size of the biggest answer set possible with query \(Q\) over databases where relation sizes are bounded by \(N\).

\[ \mathsf{wc}(Q, N) = \mathsf{sup}_{\mathbb{D}\in\mathcal{C}[\leqslant~N]}~|Q(\mathbb{D})| \]

We know how to compute \(\rho(Q)\) such that \(\mathsf{wc}(Q,N) = \tilde{\mathcal{O}}(N^{\rho(Q)})\): this is the AGM bound
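As a worked instance, take the triangle query \(Q = \mathsf{R}(x, y) \wedge \mathsf{S}(x, z) \wedge \mathsf{T}(y, z)\), and assume that \(\rho(Q)\) is the fractional edge cover number of \(Q\), as in the standard statement of the AGM bound. The weights \(\lambda_{\mathsf{R}}, \lambda_{\mathsf{S}}, \lambda_{\mathsf{T}} \geqslant 0\) must cover each variable:

\[ \lambda_{\mathsf{R}} + \lambda_{\mathsf{S}} \geqslant 1 \;(x), \qquad \lambda_{\mathsf{R}} + \lambda_{\mathsf{T}} \geqslant 1 \;(y), \qquad \lambda_{\mathsf{S}} + \lambda_{\mathsf{T}} \geqslant 1 \;(z) \]

Summing the three constraints gives \(2(\lambda_{\mathsf{R}} + \lambda_{\mathsf{S}} + \lambda_{\mathsf{T}}) \geqslant 3\), and the choice \(\lambda_{\mathsf{R}} = \lambda_{\mathsf{S}} = \lambda_{\mathsf{T}} = \tfrac{1}{2}\) meets this bound, so:

\[ \rho(Q) = \tfrac{3}{2} \qquad \text{and} \qquad \mathsf{wc}(Q, N) = \tilde{\mathcal{O}}(N^{1.5}) \]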

Worst-Case Optimal Joins (WCOJ)

A join algorithm is worst-case optimal (wrt \(\mathcal{C}[\leqslant~N]\)) if, for every \(Q\), \(N \in \mathbb{N}\) and \(\mathbb{D}\in \mathcal{C}[\leqslant~N]\), it computes \(Q(\mathbb{D})\) in time: \[\tilde{\mathcal{O}}(f(|Q|) \cdot \mathsf{wc}(Q,N))\]

The DBMS approach is not worst-case optimal (triangle example from before).

Rich literature:

NPRR Join (Ngo et al. 2012), Leapfrog Triejoin (Veldhuizen 2014), Generic Join (Ngo 2018), PANDA (Abo Khamis, Ngo, and Suciu 2017)

Joining: an Alternative Solution

\(Q \mathop{\mathrm{=}}\mathsf{R}(x, y) \wedge \mathsf{S}(x, z) \wedge \mathsf{T}(y, z)\)

R	x	y
	0	0
	0	1
	2	1

S	x	z
	0	0
	0	2
	2	3

T	y	z
	0	2
	1	0
	1	2

A Simple Branching Algorithm

What about the complexity of this algorithm?

Complexity Analysis

Number of calls

blue and red paths = partial assignments

at most \(\mathsf{wc}(Q, N)\) blue paths
a red path starts with a blue part
at most \(\mathsf{wc}(Q, N)\) blue starts, each with \(|\mathsf{dom}|\) possibilities

\(n\) variables to assign

Cost per call

branch a variable on a domain value + filter \(m\) relations

\(\mathcal{O}(m \cdot \mathsf{log}\, N)\)
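The branching step described above (assign one variable to a domain value, then filter the relations) can be sketched in Python as follows. This is an illustration under assumptions, not the implementation from the thesis: relations are plain lists that are filtered by a linear scan, whereas the \(\mathsf{log}\) cost per relation quoted above would need sorted or indexed relations, and atoms are assumed not to repeat a variable. The name `branching_join` is hypothetical.

```python
def branching_join(atoms, variables, domain):
    """Enumerate the answers of a join query by branching on one variable at a time.

    atoms:     list of (atom_variables, tuples), e.g. (("x", "y"), [(0, 0), (0, 1)])
    variables: the variables, in branching order
    domain:    list of candidate values, tried in the given order
    """
    answers = []

    def branch(depth, assignment, current):
        if depth == len(variables):
            # Every relation still has a matching tuple: the assignment is an answer.
            answers.append(tuple(assignment))
            return
        var = variables[depth]
        for value in domain:
            # Filter each relation that mentions `var` to the tuples with var = value.
            filtered = []
            for atom_vars, tuples in current:
                if var in atom_vars:
                    i = atom_vars.index(var)
                    tuples = [t for t in tuples if t[i] == value]
                filtered.append((atom_vars, tuples))
            # An empty relation means this partial assignment is a dead end: prune.
            if all(tuples for _, tuples in filtered):
                branch(depth + 1, assignment + [value], filtered)

    branch(0, [], [(vs, list(ts)) for vs, ts in atoms])
    return answers


# The triangle instance from the slides.
atoms = [
    (("x", "y"), [(0, 0), (0, 1), (2, 1)]),   # R
    (("x", "z"), [(0, 0), (0, 2), (2, 3)]),   # S
    (("y", "z"), [(0, 2), (1, 0), (1, 2)]),   # T
]
result = branching_join(atoms, ("x", "y", "z"), [0, 1, 2, 3])
```

Partial assignments that reach the last variable correspond to the blue paths of the analysis, and pruned ones to the red paths. On this instance `result` holds the three answers of the query in lexicographic order.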

Algorithm Complexity

The complexity of the branching algorithm is:

\[ \mathcal{O}(m\cdot \mathsf{log}\, N \cdot |\mathsf{dom}|\cdot n \cdot \mathsf{wc}(Q, N)) \]

\[ \tilde{\mathcal{O}}(nm \cdot |\mathsf{dom}| \cdot \mathsf{wc}(Q, N)) \]

Not WCOJ yet…

Reducing the domain size

Consider \(b = 2\) bits

\(\mathsf{R}\)	x	y
	1	2
	2	1
	3	0

⇝

\(x^1\)	\(x^0\)	\(y^1\)	\(y^0\)
0	1	1	0
1	0	0	1
1	1	0	0

\(Q\) ⇝ \(\tilde{Q}^b\) has \(bn\) variables
\(\mathbb{D}\) ⇝ \(\tilde{\mathbb{D}}^b\) for \(b = \lceil \log_2 |\mathsf{dom}| \rceil\). The database has roughly the same bit size, but a domain of size \(2\)!
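The encoding of a relation can be sketched in a few lines of Python. The sketch assumes that domain values are non-negative integers smaller than \(2^b\) and writes the most significant bit first, as in the table above; `binarise_relation` is an illustrative name.

```python
def binarise_relation(tuples, bits):
    """Replace every value of every tuple by its `bits`-bit binary expansion."""

    def encode(value):
        # Most significant bit first: x^{bits-1}, ..., x^1, x^0.
        return tuple((value >> k) & 1 for k in reversed(range(bits)))

    binarised = []
    for tup in tuples:
        row = ()
        for value in tup:
            row += encode(value)
        binarised.append(row)
    return binarised


r = [(1, 2), (2, 1), (3, 0)]
r_bits = binarise_relation(r, bits=2)
```

Each original column becomes `bits` columns over the domain \(\{0, 1\}\), so the number of tuples is unchanged and only the arity grows.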

Finally, a WCOJ!

The complexity of the branching algorithm is:

\[ \tilde{\mathcal{O}}(nm \cdot |\mathsf{dom}| \cdot \mathsf{wc}(Q, N)) \]

The complexity of the binarised version of the branching algorithm is:

\[ \tilde{\mathcal{O}}(n \cdot \mathsf{log}_2(|\mathsf{dom}|) \cdot m \cdot 2 \cdot \mathsf{wc}(Q, N)) \]

\[ \tilde{\mathcal{O}}(nm \cdot \mathsf{wc}(Q, N)) \]

Here, we binarise, but we could use any convenient fixed-sized domain

What We’ve Seen so Far

Evaluating Join Queries

Joining queries with a join plan is not always optimal

By using a conceptually simple branching method, we can build a WCOJ algorithm

Proof of the complexity of the algorithm does not assume knowledge of the actual worst-case

Literature results matched, also for acyclic degree constraints and simple functional dependencies

Related Work

The trace tree from our join algorithm allows for an efficient sampling algorithm

Towards Succinct Representations

Ordered Relational Circuits

From JQ to Circuit

\(Q\) a JQ and \(x_1\prec\dots\prec x_n\) an order over the variable set

\(Q(\mathbb{D}) = \biguplus_{d\in \mathsf{D}} Q[x_1 = d](\mathbb{D})\)

\[ \text{if} \begin{cases} Q & = & Q_1 \land Q_2 \\ \mathsf{var}(Q_1) \cap \mathsf{var}(Q_2) & = & \emptyset \end{cases} \]

then \(Q(\mathbb{D}) = Q_1(\mathbb{D}) \times Q_2(\mathbb{D})\)

a cache to factorise gates

Compilation Complexity

The complexity of the compilation of a query to a relational circuit directly depends on the order we set on the variables.

The complexity of an order can be measured with hypergraph measures, notably:

the fractional hyperorder width – \(\mathsf{fhow}(\mathcal{H}, \prec)\)

matches existing measures: the incompatibility number (Bringmann, Carmeli, and Mengel 2022), and is related to fractional hypertree width

For a query \(Q\), a database \(\mathbb{D}\) on domain \(\mathsf{dom}\) and an order \(\prec\) on the variables of \(Q\), we can build an ordered relational circuit:

of size \(\mathcal{O}(|Q|^{\mathcal{O}(1)} \cdot |\mathbb{D}|^k \cdot |\mathsf{dom}|)\)

in time \(\tilde{\mathcal{O}}(|Q|^{\mathcal{O}(1)} \cdot |\mathbb{D}|^k \cdot |\mathsf{dom}|)\)

with \(k = \mathsf{fhow}(\mathcal{H}(Q), \prec)\)

Adding Negation to Queries

Signed Join Query: \(Q(x_1, \dots, x_n) = \bigwedge_{i=1}^{k} P_i(\vec{z_i}) \land \bigwedge_{j=1}^{\ell} \lnot N_j(\vec{z}'_j)\)

Natural extension! ex: “Who are the People that do not live in a Capital?”

Big difference:

positively encoding \(\lnot N(\vec{z})\) on a domain \(D\) requires \(|D|^{|\vec{z}|} - |N|\) tuples

\(N_i\)	\(x_1\)	\(x_2\)	\(x_3\)
	0	1	0

\(x_1\)	\(x_2\)	\(x_3\)
0	0	0
0	0	1
0	1	1
1	0	0
1	0	1
1	1	0
1	1	1
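The blow-up can be reproduced with a short Python sketch that lists the complement of a negated relation over a given domain. The function name `positive_encoding` is an assumption made for the illustration.

```python
from itertools import product


def positive_encoding(negated_tuples, domain, arity):
    """List every tuple over `domain` of the given arity that is NOT in the negated relation."""
    excluded = set(negated_tuples)
    return [t for t in product(domain, repeat=arity) if t not in excluded]


n_i = [(0, 1, 0)]
complement = positive_encoding(n_i, domain=(0, 1), arity=3)
```

Here a single negated tuple over the domain \(\{0, 1\}\) already needs \(2^3 - 1 = 7\) positive tuples, and the gap grows with the domain size and the arity.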

Tractability of SJQs

\(Q_1 = R(x, y, z) \land S(x, y) \land T(x, z) \land U(y, z)\)

has linear compilation time

\(Q_1' = \lnot R(x, y, z) \land S(x, y) \land T(x, z) \land U(y, z)\)

non-linear compilation time (DB w/ empty relation)

\(Q_2 = S(x, y) \land T(x, z) \land U(y, z)\)

non-linear compilation time (triangle)

query should be as hard as any subquery obtained by removing negative atoms

Refining the Measure

Compilation time for an SJQ \(Q\) should be at least that of any \(Q' \subseteq Q\)

The signed fractional hyperorder width for a hypergraph \(\mathcal{H}\) and an order \(x_1 \prec \dots \prec x_n\) is:

\[ \mathsf{sfhow}(\mathcal{H}, \prec) = \mathsf{max}_{\mathcal{H}' \subseteq \mathcal{H}}\mathsf{fhow}(\mathcal{H}', \prec) \]

In the positive case, recovers the value from previous measures, such as the fractional hyperorder width

Compilation Complexity 2.0

For a query \(Q\), a database \(\mathbb{D}\) on domain \(\mathsf{dom}\) and an order \(\prec\) on the variables of \(Q\), we can build an ordered relational circuit:

of size \(\mathcal{O}(|Q|^{\mathcal{O}(1)} \cdot |\mathbb{D}|^k \cdot |\mathsf{dom}|)\)

in time \(\tilde{\mathcal{O}}(|Q|^{\mathcal{O}(1)} \cdot |\mathbb{D}|^k \cdot |\mathsf{dom}|)\)

with \(k = \mathsf{fhow}(\mathcal{H}(Q), \prec)\)

For a signed query \(Q\), a database \(\mathbb{D}\) on domain \(\mathsf{dom}\) and an order \(\prec\) on the variables of \(Q\), we can build an ordered relational circuit:

of size \(\mathcal{O}(|Q|^{\mathcal{O}(k)} \cdot |\mathbb{D}|^k \cdot |\mathsf{dom}|)\)

in time \(\tilde{\mathcal{O}}(|Q|^{\mathcal{O}(k)} \cdot |\mathbb{D}|^k \cdot |\mathsf{dom}|)\)

with \(k = \mathsf{sfhow}(\mathcal{H}(Q), \prec)\)

Changes in the algorithm:

Remove from the query any unnecessary negated relation;
New cases of inconsistency to check;
Simplification of the query.

In the positive case, this recovers existing approaches

But also extends to signed join queries

Wrapping up Compilation

Query Evaluation is now seen as a compilation task

The structure is similar to our WCOJ, with extra features: factorisation and independent subquery evaluation

Allows the compilation of signed join queries

How can we leverage this structure to answer direct access tasks?

Direct Access over Relational Circuits

Solving Direct Access

Here, we consider that the order on the answers is the lexicographical order induced by the order on the variables and an order on the domain values

Answering direct access tasks will be done in two steps:

a preprocessing phase;
an access phase.

Preprocessing

Transform the query and the database instance into a structure supporting efficient access

Access

Given an index \(k\), use the structure to return the \(k^{\mathsf{th}}\) tuple efficiently

We will start the preprocessing on the relational circuit structure we introduced

Preprocessing

Direct Access with Relational Circuits

Compute the \(7^{\mathsf{th}}\) solution: 111

Direct Access with Relational Circuits

Compute the \(13^{\mathsf{th}}\) solution: 221
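The access phase can be illustrated with a small Python sketch on a hand-built circuit. Everything here is an assumption for the sake of the example: the gate types `Leaf`, `Decision` and `Product` are simplified stand-ins for the gates of a relational circuit, indices are 0-based, and in a `Product` gate all variables of the left child are assumed to come before those of the right child in the variable order. Preprocessing amounts to counting the answers under each gate; access then walks down the circuit, guided by these counts.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Leaf:
    """Represents the single empty tuple."""


@dataclass(frozen=True)
class Decision:
    var: str
    children: tuple   # ((value, gate), ...) sorted by value: a disjoint union


@dataclass(frozen=True)
class Product:
    left: object      # the two sides share no variable
    right: object


@lru_cache(maxsize=None)
def count(gate):
    """Number of answers represented by a gate (cached once per distinct gate)."""
    if isinstance(gate, Leaf):
        return 1
    if isinstance(gate, Decision):
        return sum(count(child) for _, child in gate.children)
    return count(gate.left) * count(gate.right)


def access(gate, k):
    """Return the k-th answer (0-based, lexicographic order) as a dict."""
    if not 0 <= k < count(gate):
        raise IndexError(k)
    if isinstance(gate, Leaf):
        return {}
    if isinstance(gate, Product):
        # The left side is more significant in the lexicographic order.
        q, r = divmod(k, count(gate.right))
        return {**access(gate.left, q), **access(gate.right, r)}
    for value, child in gate.children:
        if k < count(child):
            return {gate.var: value, **access(child, k)}
        k -= count(child)   # skip all the answers below this child


def values(var, vals):
    return Decision(var, tuple((v, Leaf()) for v in vals))


# Circuit for R(x, y) AND S(x, z) with R = {00, 01, 11}, S = {00, 02, 13}, order x < y < z.
circuit = Decision("x", (
    (0, Product(values("y", [0, 1]), values("z", [0, 2]))),
    (1, Product(values("y", [1]), values("z", [3]))),
))
answers = [tuple(access(circuit, k)[v] for v in "xyz") for k in range(count(circuit))]
```

Each call to `access` follows a single root-to-leaf path per branch of a product, so its cost depends on the depth of the circuit and the fan-out of the decision gates, not on the number of answers.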

Complexity of Direct Access

Given a signed join query \(Q\), a database \(\mathbb{D}\) on domain \(\mathsf{dom}\) and an order \(\prec\) on the variables of \(Q\), we can solve the direct access problem with:

preprocessing time: \(\tilde{\mathcal{O}}(|Q|^{\mathcal{O}(k)} \cdot |\mathbb{D}|^k)\)

access time: \(\mathcal{O}(|Q|^{\mathcal{O}(1)} \cdot \mathsf{polylog}|\mathsf{dom}|)\)

with \(k = \mathsf{sfhow}(\mathcal{H}(Q), \prec)\)

Wrapping Up Direct Access

We can reuse the circuit structure we introduced to efficiently answer direct access tasks

This recovers existing results for positive queries (Bringmann, Carmeli, and Mengel 2022; Carmeli et al. 2023)

And extends these results to queries with negation

We have lower bounds: the circuit approach is optimal

The approach also works for conjunctive queries, if the projected variables are at the end of the order

Conclusion and Future Directions

Main Results

Building factorised representations

Simple WCOJ algorithm: tree-like structure

Extended to relational circuits

Retrieve existing results

Using factorised representations

Solve specific problems efficiently: sampling / direct access

Retrieve existing results

Extend efficient direct access to queries with negations

Future Directions

Going further with circuits

Study the tractability of direct access for CQs with aggregation (Eldar, Carmeli, and Kimelfeld 2024)

Generalise the circuit approach to queries over annotated databases, e.g. FAQ or AJAR (Zhao et al. 2024)

Evaluating circuits in the context of dynamic databases

Questions

Sampling Answers Uniformly

Problem Statement

Given \(Q\) and \(\mathbb{D}\), sample \(\tau \in Q(\mathbb{D})\) with probability \(\frac{1}{|Q(\mathbb{D})|}\) or fail if \(Q(\mathbb{D}) = \emptyset\).

Naive algorithm:

materialise \(Q(\mathbb{D})\) in a table
sample \(i \leqslant|Q(\mathbb{D})|\) uniformly
output \(Q(\mathbb{D})[i]\).

Complexity using WCOJ:

\(\tilde{\mathcal{O}}(\mathsf{wc}(Q,N)\cdot\mathsf{poly}(|Q|))\).

We can do better: (expected) time \(\tilde{\mathcal{O}}(\frac{\mathsf{wc}(Q,N)}{\mathsf{max}(|Q(\mathbb{D})|,1)} \mathsf{poly}(|Q|))\)

Deng, Lu, and Tao (2023), Kim et al. (2023)

Revisiting the problem

Sampling an answer is sampling one of the leaves

Sampling leaves: the easy way

Green path: \(\mathsf{Pr} = 1 \cdot \frac{1}{3} \cdot 1 = \frac{1}{3}\)

Blue path: \(\mathsf{Pr} = 1 \cdot \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}\)

Uniform probability of sampling a leaf

Choose a path in the tree according to the number of interesting leaves under it

Sampling leaves: with an oracle

Green path:

\(\mathsf{Pr} = \frac{\mathsf{upb}(y_1)}{\mathsf{upb}(x)} \cdot \frac{\mathsf{upb}(z_1)}{\mathsf{upb}(y_1)} \cdot \frac{1}{\mathsf{upb}(z_1)} = \frac{1}{\mathsf{upb}(x)}\)

Blue path:

\(\mathsf{Pr} = \frac{\mathsf{upb}(y_1)}{\mathsf{upb}(x)} \cdot \frac{\mathsf{upb}(z_2)}{\mathsf{upb}(y_1)} \cdot \frac{1}{\mathsf{upb}(z_2)} = \frac{1}{\mathsf{upb}(x)}\)

Uniform probability of sampling a leaf

Only makes sense if \(\mathsf{upb}(t) \geqslant\sum_d \mathsf{upb}(t_d)\)

Upper bound oracles for conjunctive queries

Upper bounds on the number of solutions \(\rightarrow\) worst-case bounds!

AGM bound: there exist positive rational numbers \((\lambda_{\mathsf{R}})_{\mathsf{R} \in Q}\) such that \[|Q(\mathbb{D})| \leqslant\prod_{\mathsf{R} \in Q}|\mathsf{R}^\mathbb{D}|^{\lambda_{\mathsf{R}}} \leqslant \mathsf{wc}(Q,N)\]

Define \(\mathsf{upb}(t) = \prod_{\mathsf{R} \in Q}|\mathsf{R}^\mathbb{D}[\tau_t]|^{\lambda_{\mathsf{R}}} \leqslant\mathsf{wc}(Q,N)\):

it is an upper bound on \(|Q(\mathbb{D})[\tau_t]|\),
it is superadditive: \(\mathsf{upb}(t) \geqslant \sum_{d \in \mathsf{dom}} \mathsf{upb}(t_d)\)
value of \(\mathsf{upb}\) at the root of the tree: \(\mathsf{wc}(Q,N)\)!
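The sampler built on such an oracle can be sketched in Python for the triangle query. The sketch makes several assumptions: the weights \(\lambda_{\mathsf{R}} = \lambda_{\mathsf{S}} = \lambda_{\mathsf{T}} = \frac{1}{2}\) are used, which form a fractional edge cover of the triangle, relations are filtered by linear scans, and the names `restrict`, `upb` and `sample` are illustrative. At each node, child \(t_d\) is chosen with probability \(\mathsf{upb}(t_d) / \mathsf{upb}(t)\), and the leftover probability mass leads to a failure.

```python
import math
import random


def restrict(tuples, atom_vars, var, value):
    """Keep the tuples that agree with var = value (no-op if the atom lacks var)."""
    if var not in atom_vars:
        return tuples
    i = atom_vars.index(var)
    return [t for t in tuples if t[i] == value]


def upb(atoms, weights):
    """AGM-style upper bound: product of |R[tau]|^lambda_R over all atoms."""
    return math.prod(len(ts) ** w for (_, ts), w in zip(atoms, weights))


def sample(atoms, weights, variables, domain, rng):
    """Return one answer sampled uniformly, or None when the attempt fails."""
    assignment = []
    for var in variables:
        total = upb(atoms, weights)
        if total == 0:
            return None
        threshold = rng.random() * total
        acc, chosen = 0.0, None
        for d in domain:
            child = [(vs, restrict(ts, vs, var, d)) for vs, ts in atoms]
            acc += upb(child, weights)
            if threshold < acc:   # child picked with probability upb(child) / total
                chosen = (d, child)
                break
        if chosen is None:        # mass not covered by any child: fail
            return None
        assignment.append(chosen[0])
        atoms = chosen[1]
    return tuple(assignment)


atoms = [
    (("x", "y"), [(0, 0), (0, 1), (2, 1)]),   # R
    (("x", "z"), [(0, 0), (0, 2), (2, 3)]),   # S
    (("y", "z"), [(0, 2), (1, 0), (1, 2)]),   # T
]
weights = [0.5, 0.5, 0.5]
rng = random.Random(0)
draws = [sample(atoms, weights, ("x", "y", "z"), [0, 1, 2, 3], rng) for _ in range(2000)]
sampled = {d for d in draws if d is not None}
```

A chosen child always has a positive bound, so every relation keeps at least one tuple along the path and any non-failing draw is an answer. Repeating `sample` until it succeeds gives a Las Vegas sampler.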

Sampling complexity

Let \(T\) be a tree rooted in \(r\), \(\mathsf{upb}\) a super-additive leaf estimator and \(\mathsf{out}\) the output of our algorithm. Then the algorithm is a uniform Las Vegas sampler with the following guarantees, for any \(\top\)-leaf \(l\): \[ \mathsf{Pr}(\mathsf{out} = l) = \frac{1}{\mathsf{upb}(T)} \qquad \mathsf{Pr}(\mathsf{out} = \mathsf{fail}) = 1 - \frac{|\top\mathsf{-leaves}(T)|}{\mathsf{upb}(T)} \]

Given a class of databases \(\mathcal{C}[\leqslant~N]\), for any join query \(Q\) and any database \(\mathbb{D} \in \mathcal{C}[\leqslant~N]\), it is possible to uniformly sample from the answer set in expected time \[ \tilde{\mathcal{O}}(\frac{\mathsf{wc}(Q, N)}{\mathsf{max}(1, |Q(\mathbb{D})|)} \cdot nm \cdot \mathsf{log}|\mathsf{dom}|) \]

References

Abo Khamis, Mahmoud, Hung Q. Ngo, and Dan Suciu. 2017. “What Do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog Have to Do with One Another?” In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 429–44. Chicago Illinois USA: ACM. https://doi.org/10.1145/3034786.3056105.

Bringmann, Karl, Nofar Carmeli, and Stefan Mengel. 2022. “Tight Fine-Grained Bounds for Direct Access on Join Queries.” In PODS ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, edited by Leonid Libkin and Pablo Barceló, 427–36. PODS ’22. Association for Computing Machinery. https://doi.org/10.1145/3517804.3526234.

Carmeli, Nofar, Nikolaos Tziavelis, Wolfgang Gatterbauer, Benny Kimelfeld, and Mirek Riedewald. 2023. “Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries.” ACM Transactions on Database Systems, January. https://doi.org/10.1145/3578517.

Chandra, Ashok K., and Philip M. Merlin. 1977. “Optimal Implementation of Conjunctive Queries in Relational Data Bases.” In Proceedings of the Ninth Annual ACM Symposium on Theory of Computing, 77–90. STOC ’77. New York, NY, USA: ACM. https://doi.org/10.1145/800105.803397.

Deng, Shiyuan, Shangqi Lu, and Yufei Tao. 2023. “On Join Sampling and the Hardness of Combinatorial Output-Sensitive Join Algorithms.” In Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 99–111. Seattle WA USA: ACM. https://doi.org/10.1145/3584372.3588666.

Eldar, Idan, Nofar Carmeli, and Benny Kimelfeld. 2024. “Direct Access for Answers to Conjunctive Queries with Aggregation.” In 27th International Conference on Database Theory, ICDT 2024, March 24 to March 28, 2024, Paestum, Italy, edited by Graham Cormode and Michael Shekelyan, 20 pages. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPICS.ICDT.2024.4.

Kim, Kyoungmin, Jaehyun Ha, George Fletcher, and Wook-Shin Han. 2023. “Guaranteeing the Õ(AGM/OUT) Runtime for Uniform Sampling and Size Estimation over Joins.” In Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 113–25. Seattle WA USA: ACM. https://doi.org/10.1145/3584372.3588676.

Ngo, Hung Q. 2018. “Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems.” In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 111–24. Houston TX USA: ACM. https://doi.org/10.1145/3196959.3196990.

Ngo, Hung Q., Ely Porat, Christopher Ré, and Atri Rudra. 2012. “Worst-Case Optimal Join Algorithms: [Extended Abstract].” In Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 37–48. Scottsdale Arizona USA: ACM. https://doi.org/10.1145/2213556.2213565.

Veldhuizen, Todd L. 2014. “Triejoin: A Simple, Worst-Case Optimal Join Algorithm.” In Proc. 17th International Conference on Database Theory (ICDT), Athens, Greece, March 24-28, 2014, edited by Nicole Schweikardt, Vassilis Christophides, and Vincent Leroy, 96–106. OpenProceedings.org. https://doi.org/10.5441/002/icdt.2014.13.

Zhao, Hangdong, Austen Z. Fan, Xiating Ouyang, and Paraschos Koutris. 2024. “Conjunctive Queries with Negation and Aggregation: A Linear Time Characterization.” Proc. ACM Manag. Data 2 (2). https://doi.org/10.1145/3651138.
