The Youtopia Project

General information

We create declarative, easy-to-use languages for specifying and solving coordination problems as they increasingly occur in social Web applications. We integrate these languages with database languages such as SQL to support the generalization of database transactions to selectively give up isolation to allow for coordination among database applications and users.

This is a joint project with Cornell University.

More information can be found at http://www.cs.cornell.edu/bigreddata/youtopia/.

 

Publications

The Homeostasis Protocol: Avoiding Transaction Coordination Through Program Analysis

S. Roy; L. Kot; G. Bender; B. Ding; H. Hojjat et al.

2015. SIGMOD 2015 , Melbourne, Victoria, Australia , 31 05 - 04 06 2015. p. 1311-1326.

DOI : 10.1145/2723372.2723720.

Datastores today rely on distribution and replication to achieve improved performance and fault-tolerance. But correctness of many applications depends on strong consistency properties—something that can impose substantial overheads, since it requires coordinating the behavior of multiple nodes. This paper describes a new approach to achieving strong consistency in distributed systems while minimizing communication between nodes. The key insight is to allow the state of the system to be inconsistent during execution, as long as this inconsistency is bounded and does not affect transaction correctness. In contrast to previous work, our approach uses program analysis to extract semantic information about permissible levels of inconsistency and is fully automated. We then employ a novel homeostasis protocol to allow sites to operate independently, without communicating, as long as any inconsistency is governed by appropriate treaties between the nodes. We discuss mechanisms for optimizing treaties based on workload characteristics to minimize communication, as well as a prototype implementation and experiments that demonstrate the benefits of our approach on common transactional benchmarks.

Quantum Databases

S. Roy; L. Kot; C. Koch

2013. CIDR , Asilomar, CA, USA , January 2013.

Fine-grained disclosure control for app ecosystems

G. Bender; L. Kot; J. Gehrke; C. Koch

2013. SIGMOD , New York, NY, USA , June 22-27, 2013. p. 869-880.

The modern computing landscape contains an increasing number of app ecosystems, where users store personal data on platforms such as Facebook or smartphones. APIs enable third-party applications (apps) to utilize that data. A key concern associated with app ecosystems is the confidentiality of user data. In this paper, we develop a new model of disclosure in app ecosystems. In contrast with previous solutions, our model is data-derived and semantically meaningful. Information disclosure is modeled in terms of a set of distinguished security views. Each query is labeled with the precise set of security views that is needed to answer it, and these labels drive policy decisions. We explain how our disclosure model can be used in practice and provide algorithms for labeling conjunctive queries for the case of single-atom security views. We show that our approach is useful by demonstrating the scalability of our algorithms and by applying it to the real-world disclosure control system used by Facebook.

Entangled queries: enabling declarative data-driven coordination

N. Gupta; L. Kot; S. Roy; G. Bender; J. Gehrke et al.

2011. SIGMOD Conference , Athens, Greece , 2011. p. 673-684.

DOI : 10.1145/2338626.2338629.

Many data-driven social and Web applications involve collaboration and coordination. The vision of Declarative Data-Driven Coordination (D3C), proposed in Kot et al. [2010], is to support coordination in the spirit of data management: to make it data-centric and to specify it using convenient declarative languages. This article introduces entangled queries, a language that extends SQL by constraints that allow for the coordinated choice of result tuples across queries originating from different users or applications. It is nontrivial to define a declarative coordination formalism without arriving at the general (NP-complete) Constraint Satisfaction Problem from AI. In this article, we propose an efficiently enforceable syntactic safety condition that we argue is at the sweet spot where interesting declarative power meets applicability in large-scale data management systems and applications. The key computational problem of D3C is to match entangled queries to achieve coordination. We present an efficient matching algorithm which statically analyzes query workloads and merges coordinating entangled queries into compound SQL queries. These can be sent to a standard database system and return only coordinated results. We present the overall architecture of an implemented system that contains our evaluation algorithm. We also describe a proof-of-concept Facebook application we have built on top of this system to allow friends to coordinate flight plans. Finally, we evaluate the performance of the matching algorithm experimentally on realistic coordination workloads.

Coordination through querying in the Youtopia system

N. Gupta; L. Kot; G. Bender; S. Roy; J. Gehrke et al.

2011. SIGMOD Conference , Athens, Greece , 2011. p. 1331-1334.

Entangled Transactions

N. Gupta; M. Nikolic; S. Roy; G. Bender; L. Kot et al.

2011. VLDB , 2011.

As the world becomes more interdependent and computing grows more collaborative, there is a need for new abstractions and tools to help users work together. We recently introduced entangled queries – a mechanism for information exchange between database queries [6]. In this paper, we introduce entangled transactions, units of work similar to traditional transactions that however do not run in isolation, but communicate with each other via entangled queries. Supporting entangled transactions brings about many new challenges, from an abstract model to an investigation of the unique systems issues that arise during their implementation. We first introduce a novel semantic model for entangled transactions that comes with analogues of the classical ACID properties. We then discuss execution models for entangled transactions and select a concrete design motivated by application scenarios. With a prototype system that implements this design, we show experimental results that demonstrate the viability of entangled transactions in real-world application settings.

Beyond isolation: research opportunities in declarative data-driven coordination

L. Kot; N. Gupta; S. Roy; J. Gehrke; C. Koch

Sigmod Record. 2010.

There are many database applications that require users to coordinate and communicate. Friends want to coordinate travel plans, students want to jointly enroll in the same set of courses, and busy professionals want to coordinate their schedules. These tasks are difficult to program using existing abstractions provided by database systems because in addition to the traditional ACID properties provided by the system they all require some type of coordination between users. This is fundamentally incompatible with isolation in the classical ACID properties. In this position paper, we argue that it is time for the database community to look beyond isolation towards principled and elegant abstractions that allow for communication and coordination between some notion of (suitably generalized) transactions. This new area of declarative data-driven coordination (D3C) is motivated by many novel applications and is full of challenging research problems. We survey existing abstractions in database systems and explain why they are insufficient for D3C, and we outline a plethora of exciting research problems.

Cooperative Update Exchange in the Youtopia System

L. Kot; C. Koch

PVLDB. 2009.

Youtopia is a platform for collaborative management and integration of relational data. At the heart of Youtopia is an update exchange abstraction: changes to the data propagate through the system to satisfy user-specified mappings. We present a novel change propagation model that combines a deterministic chase with human intervention. The process is fundamentally cooperative and gives users significant control over how mappings are repaired. An additional advantage of our model is that mapping cycles can be permitted without compromising correctness. We investigate potential harmful interference between updates in our model; we introduce two appropriate notions of serializability that avoid such interference if enforced. The first is very general and related to classical final-state serializability; the second is more restrictive but highly practical and related to conflict-serializability. We present an algorithm to enforce the latter notion. Our algorithm is an optimistic one, and as such may sometimes require updates to be aborted. We develop techniques for reducing the number of aborts and we test these experimentally.