Research ‒ DATA ‐ EPFL

Research themes and interests

Agent-based simulations (2018–). The computer science and data science of predicting and shaping the future. Also: epidemiology. Please contact Christoph Koch in case of questions.
Engineered languages. Hardware specialization, domain-specific languages (DSLs), and hardware-software codesign currently receive much interest because many people consider them the most promising solution to our data center energy and big data problems. The database community has been the pioneer in this space, and SQL is arguably the most successful DSL in existence. We leverage our background and study the design and engineering of languages for a purpose: powerful enough, but only as powerful and expensive as necessary. Projects: DBToaster, Legorithmics, Squid.

Complexity theory, logic, and queries. We perform theoretical research on the foundations of query processing and query languages. Specifically, we study the complexity of database queries, which guides our development of efficient algorithms and helps us understand the fundamental limits of efficient query processing. Projects: ALGILE.
Incremental and online computation. We are interested in algorithms and systems for producing (partial) results early. Our work spans the spectrum from online algorithms via incremental view maintenance techniques to low-latency systems. Projects: DBToaster, LINVIEW, Squall.

Compilers meet databases. We work on applying ideas from programming languages and compilers to build better data management systems. Projects/systems: DBToaster, SC, DBLab, Legorithmics, Squid and dbStage.
Analytics, machine learning, and managing uncertainty. Projects: LINVIEW, DBLab, DBToaster, MayBMS.
Scalable and massively parallel query processing. Like everyone else, we are interested in scalable processing of big data. Our particular angle is dictated by the strengths and interests listed above. Projects: Squall, LINVIEW, Openplum, DBToaster.

Projects

Squid (2015–2020). A type-safe metaprogramming/compiler framework for Scala. See this page.
dbStage (2017–2020). Language-integrated, modular database system using Squid. See this page.
DBToaster: aggressive compilation for incremental data processing (2009–). In the DBToaster project, we develop aggressive compilation techniques for database query processing. These techniques build on highly efficient incremental query evaluation. We also work on generating massively parallel data management systems based on lightweight components for data analysis in the cloud. Acknowledgments: This is joint work with our alumni at Johns Hopkins University and the University at Buffalo. See http://www.dbtoaster.org.
LINVIEW: automatically incremental iterated linear algebra (2013–2018). Go here.

DBLab: building the fastest possible database engines in high-level languages (2013–2020). Go here. DBLab makes use of the SC DSL compiler framework, which we develop in the lab; see https://github.com/epfldata/sc-public.
Legorithmics: synthesis of software systems (2011–2020). In this project, we automatically synthesize components of data management systems, such as efficient out-of-core algorithms and concurrency-control algorithms. (Project homepage)

Completed Projects

MayBMS (2005–2011). We studied the management and processing of uncertain and probabilistic data and developed the MayBMS probabilistic database management system. This system extends the PostgreSQL codebase and is available as open source. Go to the project website, which contains both our publications and the code.

Openplum (2012/13) is a scalable parallel database system based on PostgreSQL that is inspired by the design of the Greenplum database system. This started as a course project in our Advanced Databases course and was expanded and open-sourced. Get it from GitHub.

MARVEL: computational materials science (2014–2017). MARVEL is a Swiss National Research Center on computational materials science. We work on using DSL compiler techniques for optimizing high-performance computing code and on advanced analytics for materials science.

ALGILE: algebraic techniques for agile data management systems (2012–2017, ERC project 279804). This is an umbrella project that subsumes DBToaster and Legorithmics, but goes beyond them. Several other subprojects are currently in an incubation stage.

Squall: an online, massively parallel query engine (2010–2017). Squall is based on Apache Storm and has been open-sourced. See https://github.com/epfldata/squall.

Youtopia: optimizing coordination and synchronization in distributed (data management) systems (2009–2012). In the Youtopia Project, we design declarative, easy-to-use languages for specifying and solving coordination problems as they increasingly occur in social Web applications. We create systems that support the generalization of database transactions to selectively give up isolation to allow for coordination among database applications and users.