We develop aggressive compilation techniques for database query processing. In the DBToaster project, we specifically focus on realizing what we call agile views, which are queries whose results we continuously keep fresh and available as a database undergoes rapid change. Agile views have important applications in large-scale interactive data analysis, database and policy monitoring, Markov Chain Monte Carlo, and automated trading. Our techniques are based on highly efficient incremental query evaluation techniques. We are currently also working on a compiler for generating massively parallel data management systems based on lightweight components for data analysis in the cloud.
This is a joint project with Johns Hopkins University and SUNY Buffalo.
Detailed information can be found at http://www.dbtoaster.org.
How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates
2016. SIGMOD , San Francisco, USA , June 26-July 01, 2016.
DOI : 10.1145/2882903.2915246.
In the quest for valuable information, modern big data applications continuously monitor streams of data. These applications demand low latency stream processing even when faced with high volume and velocity of incoming changes and the user’s desire to ask complex queries. In this paper, we study low-latency incremental computation of complex SQL queries in both local and distributed streaming environments. We develop a technique for the efficient incrementalization of queries with nested aggregates for batch updates. We identify the cases in which batch processing can boost the performance of incremental view maintenance but also demonstrate that tuple-at-a-time processing often can achieve better performance in local mode. Batch updates are essential for enabling distributed incremental view maintenance and amortizing the cost of network communication and synchronization. We show how to derive incremental programs optimized for running on large-scale processing platforms. Our implementation of distributed incremental view maintenance can process tens of million of tuples with few-second latency using hundreds of nodes.
DBToaster: higher-order delta processing for dynamic, frequently fresh views
VLDB Journal. 2014.
DOI : 10.1007/s00778-013-0348-4.
Applications ranging from algorithmic trading to scientific data analysis require real-time analytics based on views over databases receiving thousands of updates each second. Such views have to be kept fresh at millisecond latencies. At the same time, these views have to support classical SQL, rather than window semantics, to enable applications that combine current with aged or historical data. In this article, we present the DBToaster system, which keeps materialized views of standard SQL queries continuously fresh as data changes very rapidly. This is achieved by a combination of aggressive compilation techniques and DBToaster's original recursive finite differencing technique which materializes a query and a set of its higher-order deltas as views. These views support each other's incremental maintenance, leading to a reduced overall view maintenance cost. DBToaster supports tens of thousands of complete view refreshes per second for a wide range of queries.
Agile Views in a Dynamic Data Management System
2011. CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research , Asilomar, CA, USA , January 9-12, 2011. p. 284-295.
This paper calls for a new breed of lightweight systems - dynamic data management systems (DDMS). In a nutshell, a DDMS manages large dynamic data structures with agile, frequently fresh views, and provides a facility for monitoring these views and triggering application-level events. We motivate DDMS with applications in large-scale data analytics, database monitoring, and high-frequency algorithmic trading. We compare DDMS to more traditional data management systems architectures. We present the DBToaster project, which is an ongoing effort to develop a prototype DDMS system. We describe its architecture design, techniques for high-frequency incremental view maintenance, storage, scaling up by parallelization, and various key challenges to overcome to make DDMS a reality.
Incremental Query Evaluation in a Ring of Databases.
2010. Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010 , Indianapolis, Indiana, USA , June 6-11, 2010.
This paper approaches the incremental view maintenance problem from an algebraic perspective. We construct a ring of databases and use it as the foundation of the design of a query calculus that allows to express powerful aggregate queries. The query calculus inherits key properties of the ring, such as having a normal form of polynomials and being closed under computing inverses and delta queries. The k-th delta of a polynomial query of degree k without nesting is purely a function of the update, not of the database. This gives rise to a method of eliminating expensive query operators such as joins from programs that perform incremental view maintenance. The main result is that, for non-nested queries, each individual aggregate value can be incrementally maintained using a constant amount of work. This is not possible for nonincremental evaluation.
DBToaster: A SQL Compiler for High-Performance Delta Processing in Main-Memory Databases
We present DBToaster, a novel query compilation framework for producing high performance compiled query executors that incrementally and continuously answer standing aggregate queries using in-memory views. DBToaster targets applications that require efficient main-memory processing of standing queries (views) fed by high-volume data streams, recursively compiling view maintenance (VM) queries into simple C++ functions for evaluating database updates (deltas). While today's VM algorithms consider the impact of single deltas on view queries to produce maintenance queries, we recursively consider deltas of maintenance queries and compile to thoroughly transform queries into code. Recursive compilation successively elides certain scans and joins, and eliminates signifcant query plan interpreter overheads. In this demonstration, we walk through our compilation algorithm, and show the signifcant performance advantages of our compiled executors over other query processors. We are able to demonstrate 1-3 orders of magnitude improvements in processing times for a nancial application and a data warehouse loading application, both implemented across a wide range of database systems, including PostgreSQL, HSQLDB, a commercial DBMS 'A', the Stanford STREAM engine, and a commercial stream processor 'B'.
This project is supported by ERC grant 279804 ALGILE.