CRDTs: Consistency without concurrency control
Mihai Le?ia (ENS Lyon & LIP6), Nuno Preguiça (Univ. Nova de Lisboa) and Marc Shapiro (INRIA & LIP6)
—————————
Problem: different order of messages in different replicas
unless the operations are commutative
Use tree to represent the order of operations
Garbage collection: we need to rebalance the tree once in a while
- Rebalance changes the order and hence the identities of the items
– Requires consensus
Consensus requires small, stable membership
- what about large communities?
Solutions:
- Core: group membership, small, stable, rebalancing via 2-phase commit
- Nebula: communicate with sites in same epoch only, join core epoch
– catchup to rebalance,
– Move the extra updates aside, do the received rebalance from the core, and then send the extra updates back to the core
Q: how is different from operational transform?
A: They are not based on a decent theory
Q: Time to converge?
A: in order of ms, it supports eventual consistency
Provenance as First Class Cloud Data
Kiran-Kumar Muniswamy-Reddy (Harvard) and Margo Seltzer (Harvard)
———————
Provenance: meta-data of object history
- used in scientific reproducibility, business compliance, and security
Why support it in Cloud?
- uses want it
- …
Application: Cloud Search is becoming more important as more data in put in the cloud
- unlike web search, the are no links between objects in cloud
- provenance dependencies to refine search
Another application: pre-fetching
- use provenance DAG to identify related object and pre-fetch them
Application: Access control
- …
Application: detect application anomalies
- provide a model of normal usage/behavior
- provider cloud alert users to overuse
Requirements:
- consistency
- long-term persistence
- queryable
- security
- coordinate storage and computing facilities
Q: how is it useful for pre-fetching
A: work on particular scenarios
Cassandra - A Decentralized Structured Storage System
Avinash Lakshman (Facebook) and Prashant Malik (Facebook)
———————
lots of data with mostly random read and write
Design goals:
- high availability
- eventual consistency
- incremental scalability
- optimistic replication
- “Knobs” to tune tradeoffs between consistency, durability, ad latency
- minimal administration
Data Model:
- Keys, columns, and super-columns
- Columns are added and modified dynamically
Write properties;
- no locks in the critical path
- sequential disk access
- behaves like a write back cache
- append support without read ahead
- atomicity guarantee for a key per replica
…
Lessons:
- Add fancy features only when it is necessary
- Many types of failures are possible
- Big systems need proper system-level monitoring
Q: how to compare your approach with big-tables and so on?
A: big-table is for Google, we needed our own system.
Q: limits?
A: membership, we use gossiping now.
Towards Decoupling Storage and Computation in Hadoop with SuperDataNodes
George Porter (Sun Labs)
———————
Facebook imports 25 TB/day to 1K Hadoop nodes
Key of success: coupling compute and storage
- benefits of moving computation to data
- scheduling, locality, …
When to decide to go for coupling?
- your particular data center might not be designed to be efficient for coupling
- mixture may change over time: non-uniform access to data
Goal: late binding between storage and computation
- explore the alternatives
Approach:
- stateless worker tier
- …
Advantages:
- decouple amount of storage from number of worker nodes
- more intra-rack BW than inter-rack BW
- support for “archival” data
- increased uniformity for job scheduling and block placement
- replication only for node failures
- ease of management: because of stateless worker class
Cons:
- scarce storage BW between worker and SDN
- effect on fault-tolerance
- cost
- performance depending on the workload