The goal of this project is to create a toolkit for quickly building highly efficient database systems in a high-level language (Scala) using SC, our easily extensible DSL compiler framework.
SC is publicly available in binary form at https://github.com/epfldata/sc-public. It will be open-sourced in the coming months, as soon as the work is published.
LegoBase, an analytical query engine and the first database system implemented as part of DBLAB, is available as open source at https://github.com/epfldata/dblab.
- Christoph Koch
- Amir Shaikhha
- Yannis Klonatos
- Lionel Parreaux
- Mohammad Dashti
- Florian Chlan
- Nikos Kokolakis
- Lewis Brown
How to Architect a Query Compiler
2016. SIGMOD 2016, San Francisco, USA, June 26 - July 1, 2016.
DOI: 10.1145/2882903.2915244.
This paper studies architecting query compilers. The state of the art in query compiler construction is lagging behind that in the compilers field. We attempt to remedy this by exploring the key causes of technical challenges in need of well-founded solutions, and by gathering the most relevant ideas and approaches from the PL and compilers communities for easy digestion by database researchers. All query compilers known to us are more or less monolithic template expanders that do the bulk of the compilation task in one large leap. Such systems are hard to build and maintain. We propose to use a stack of multiple DSLs on different levels of abstraction with lowering in multiple steps to make query compilers easier to build and extend, ultimately allowing us to create more convincing and sustainable compiler-based data management systems. We attempt to derive our advice for creating such DSL stacks from widely acceptable principles. We have also re-created a well-known query compiler following these ideas and report on this effort.
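As a rough illustration of lowering in multiple steps, the sketch below uses two toy DSL levels: a relational plan and a loop-level program. The plan is first lowered to loops and only then printed as C-like pseudocode. All names here (`Rel`, `Stmt`, `lower`, `emit`) are hypothetical and are not taken from the paper or from SC.

```scala
object LoweringSketch {
  // Level 1 (hypothetical): declarative relational operators.
  sealed trait Rel
  case class Scan(table: String) extends Rel
  case class Select(input: Rel, column: String, min: Int) extends Rel

  // Level 2 (hypothetical): a small imperative loop DSL.
  sealed trait Stmt
  case class ForEach(table: String, body: List[Stmt]) extends Stmt
  case class IfGt(column: String, min: Int, body: List[Stmt]) extends Stmt
  case object Incr extends Stmt

  // Step 1: lower a relational plan to a loop nest around `body`.
  def lower(plan: Rel, body: List[Stmt]): Stmt = plan match {
    case Scan(t)          => ForEach(t, body)
    case Select(in, c, m) => lower(in, List(IfGt(c, m, body)))
  }

  // Step 2: print the loop-level program as C-like pseudocode lines.
  def emit(s: Stmt, indent: String = ""): List[String] = s match {
    case ForEach(t, b) =>
      (s"${indent}for (row in $t) {" :: b.flatMap(emit(_, indent + "  "))) :+ s"$indent}"
    case IfGt(c, m, b) =>
      (s"${indent}if (row.$c > $m) {" :: b.flatMap(emit(_, indent + "  "))) :+ s"$indent}"
    case Incr =>
      List(s"${indent}count++;")
  }
}
```

Because each step has a small, well-defined input and output language, an optimization can be added at the level where it is easiest to express, without touching the other step.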
Yin-yang: concealing the deep embedding of DSLs
2014. International Conference on Generative Programming: Concepts and Experiences - GPCE 2014, Västerås, Sweden, September 15-16, 2014. p. 73-82.
DOI: 10.1145/2658761.2658771.
Deeply embedded domain-specific languages (EDSLs) intrinsically compromise programmer experience for improved program performance. Shallow EDSLs complement them by trading program performance for good programmer experience. We present Yin-Yang, a framework for DSL embedding that uses Scala macros to reliably translate shallow EDSL programs to the corresponding deep EDSL programs. The translation allows program prototyping and development in the user-friendly shallow embedding, while the corresponding deep embedding is used where performance is important. The reliability of the translation completely conceals the deep embedding from the user. For the DSL author, Yin-Yang automatically generates the deep DSL embeddings from their shallow counterparts by reusing the core translation. This obviates the need for code duplication and leads to reliability by construction.
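To make the shallow/deep distinction concrete, here is a minimal hand-written sketch of one toy arithmetic DSL in both styles. It does not use Yin-Yang or Scala macros, and every name in it is hypothetical. The duplication between `Shallow` and `Deep` is the kind of boilerplate that, according to the abstract, Yin-Yang generates automatically.

```scala
object EmbeddingSketch {
  // Shallow embedding: DSL operations are ordinary Scala functions
  // that compute their result immediately.
  object Shallow {
    def lit(n: Int): Int = n
    def add(a: Int, b: Int): Int = a + b
    def mul(a: Int, b: Int): Int = a * b
  }

  // Deep embedding: the same operations build an expression tree
  // that can be inspected and rewritten before it is evaluated.
  object Deep {
    sealed trait Exp
    case class Lit(n: Int) extends Exp
    case class Add(a: Exp, b: Exp) extends Exp
    case class Mul(a: Exp, b: Exp) extends Exp

    def lit(n: Int): Exp = Lit(n)
    def add(a: Exp, b: Exp): Exp = Add(a, b)
    def mul(a: Exp, b: Exp): Exp = Mul(a, b)

    // A rewrite that needs the tree: drop "+ 0" and "* 1".
    def simplify(e: Exp): Exp = e match {
      case Add(a, b) => (simplify(a), simplify(b)) match {
        case (Lit(0), r) => r
        case (l, Lit(0)) => l
        case (l, r)      => Add(l, r)
      }
      case Mul(a, b) => (simplify(a), simplify(b)) match {
        case (Lit(1), r) => r
        case (l, Lit(1)) => l
        case (l, r)      => Mul(l, r)
      }
      case other => other
    }

    // Interpreter for the tree; a real deep EDSL would typically compile it.
    def eval(e: Exp): Int = e match {
      case Lit(n)    => n
      case Add(a, b) => eval(a) + eval(b)
      case Mul(a, b) => eval(a) * eval(b)
    }
  }
}
```

The shallow version is easy to write and debug because it is plain Scala, while the deep version exposes the program structure needed for optimization.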
Building Efficient Query Engines in a High-Level Language
2014. 40th International Conference on Very Large Data Bases (VLDB), Hangzhou, China, September 1-5, 2014.
In this paper we advocate that it is time for a radical rethinking of database systems design. Developers should be able to leverage high-level programming languages without having to pay a price in efficiency. To realize our vision of abstraction without regret, we present LegoBase, a query engine written in the high-level programming language Scala. The key technique to regain efficiency is to apply generative programming: the Scala code that constitutes the query engine, despite its high-level appearance, is actually a program generator that emits specialized, low-level C code. We show how the combination of high-level and generative programming makes it easy to implement a wide spectrum of optimizations that are difficult to achieve with existing low-level query compilers, and how it can continuously optimize the query engine. We evaluate our approach with the TPC-H benchmark and show that: (a) with all optimizations enabled, our architecture significantly outperforms a commercial in-memory database system as well as an existing query compiler, (b) these performance improvements require programming just a few hundred lines of high-level code instead of the complicated low-level code that is required by existing query compilers and, finally, that (c) the compilation overhead is low compared to the overall execution time, thus making our approach usable in practice for efficiently compiling query engines.
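The idea of Scala code acting as a program generator can be sketched in a few lines. The example below is a deliberately naive string-based generator, not LegoBase's actual mechanism, which is built on a DSL compiler framework. The predicate types, the `emitCount` helper and the C `struct row` layout are all assumptions made for illustration.

```scala
object GeneratorSketch {
  // Hypothetical predicate AST for a filter over integer columns.
  sealed trait Pred
  case class Gt(column: String, value: Int) extends Pred
  case class Lt(column: String, value: Int) extends Pred
  case class And(left: Pred, right: Pred) extends Pred

  // Render a predicate as a C boolean expression over rows[i].
  def predToC(p: Pred): String = p match {
    case Gt(c, v)  => s"rows[i].$c > $v"
    case Lt(c, v)  => s"rows[i].$c < $v"
    case And(l, r) => s"(${predToC(l)}) && (${predToC(r)})"
  }

  // Emit a C function specialized to one query: the predicate is
  // inlined into the loop, so nothing is interpreted at run time.
  def emitCount(name: String, p: Pred): String =
    s"""long $name(const struct row *rows, long n) {
       |  long count = 0;
       |  for (long i = 0; i < n; i++) {
       |    if (${predToC(p)}) count++;
       |  }
       |  return count;
       |}""".stripMargin
}
```

Calling `GeneratorSketch.emitCount("count_q", And(Gt("qty", 10), Lt("price", 100)))` returns the text of a C function whose loop tests both conditions directly on each row.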