EuroSys
The European Professional Society on Computer Systems

European chapter of the
Special Interest Group on Operating Systems (SIGOPS)
of the Association for Computing Machinery (ACM)

Posts Tagged ‘SOSP 2009: LADIS’

SOSP09: LADIS: Session 5: Communication

Sunday, October 11th, 2009

Bulletin Board: A Scalable and Robust Eventually Consistent Shared Memory over a Peer-to-Peer Overlay
Vita Bortnikov (IBM Research), Gregory Chockler (IBM Research), Alexey Roytman (IBM Research) and Mike Spreitzer (IBM Research)
———————–
Facilitating group sharing of data in data centers

Write-Sub service model:
- pub/sub: topic-based communication
- Shared memory
- Group membership

Reliability: periodic refresh of the latest written value
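To make the refresh idea concrete, here is a minimal in-process sketch, not the authors' implementation. The class names `Topic` and `Subscriber`, the per-writer sequence numbers, and the `lost` flag (which simulates a dropped message) are all assumptions made for illustration.

```python
class Subscriber:
    def __init__(self):
        self.view = {}  # writer -> (seq, value)

    def deliver(self, writer, seq, value):
        # Keep only the newest value per writer; stale or duplicate
        # refreshes are ignored.
        if seq > self.view.get(writer, (0, None))[0]:
            self.view[writer] = (seq, value)


class Topic:
    """Hypothetical stand-in for one topic of the shared memory."""

    def __init__(self):
        self.latest = {}        # writer -> (seq, value)
        self.subscribers = []

    def write(self, writer, value, lost=False):
        seq = self.latest.get(writer, (0, None))[0] + 1
        self.latest[writer] = (seq, value)
        if not lost:            # lost=True simulates a dropped update
            self._push(writer)

    def refresh(self):
        # Periodic refresh: re-send the latest written value of every writer.
        for writer in self.latest:
            self._push(writer)

    def _push(self, writer):
        seq, value = self.latest[writer]
        for sub in self.subscribers:
            sub.deliver(writer, seq, value)


topic = Topic()
sub = Subscriber()
topic.subscribers.append(sub)
topic.write("node-a", "load=0.3")
topic.write("node-a", "load=0.9", lost=True)  # subscriber misses this one
stale_view = dict(sub.view)
topic.refresh()                               # the next refresh repairs it
```

Only the latest value is re-sent, so a subscriber that missed an update catches up at the next refresh without any per-message acknowledgement.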

Optimizing Information Flow in the Gossip Objects Platform
Ymir Vigfusson (Cornell University), Ken Birman (Cornell University), Qi Huang (Cornell University) and Deepak Parasam Nataraj (Cornell University)
———————–
When the number of groups increases, the gossiping approach loses its bounded-bandwidth property.

Solution: Adding a layer to take care of that

assumptions:
- gossip pkts are small
- can be piggy-backed on other messages
- groups are willing to forward the gossips of unrelated groups

You can guess the approach based on the assumptions ;-)
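One guess at what such a layer might do, given the assumptions above: fill each fixed-size gossip message with rumors, taking those for the recipient's own groups first and using leftover space for unrelated groups. This is only a sketch. `Rumor`, `pack_gossip`, the byte budget, and the greedy ordering are assumptions, and the platform's real selection heuristic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rumor:
    group: str
    payload: str
    size: int  # bytes


def pack_gossip(rumors, recipient_groups, budget):
    """Greedily pack rumors into one message of at most `budget` bytes."""
    # Rumors for groups the recipient belongs to sort first (key False);
    # rumors of unrelated groups are piggy-backed into the remaining space.
    ordered = sorted(rumors, key=lambda r: r.group not in recipient_groups)
    packet, used = [], 0
    for rumor in ordered:
        if used + rumor.size <= budget:
            packet.append(rumor)
            used += rumor.size
    return packet


rumors = [
    Rumor("g3", "x", 40),
    Rumor("g1", "y", 50),
    Rumor("g2", "z", 30),
    Rumor("g1", "w", 30),
]
packet = pack_gossip(rumors, recipient_groups={"g1"}, budget=120)
packed_groups = [r.group for r in packet]
packed_bytes = sum(r.size for r in packet)
```

The message size stays bounded by `budget` no matter how many groups exist, which is the property the plain per-group gossip loses.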

SOSP09: LADIS: Session 3: Storage

Sunday, October 11th, 2009

CRDTs: Consistency without concurrency control
Mihai Leția (ENS Lyon & LIP6), Nuno Preguiça (Univ. Nova de Lisboa) and Marc Shapiro (INRIA & LIP6)
—————————
Problem: different order of messages in different replicas
unless the operations are commutative

Use tree to represent the order of operations
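A heavily simplified sketch of the idea, not the authors' data structure: each item is identified by its path in a binary tree ("0" for left, "1" for right), and the document is the infix traversal of the tree. Because inserts and deletes name items by path rather than by position, they commute. The `Replica` class, the tombstone set, and the `infix_key` encoding are assumptions made for illustration, and paths are assumed to be unique.

```python
from itertools import permutations


def infix_key(path):
    # Map a tree path to a tuple whose lexicographic order equals infix
    # order: left subtree (0) < the node itself (1) < right subtree (2).
    return tuple(0 if bit == "0" else 2 for bit in path) + (1,)


class Replica:
    def __init__(self):
        self.atoms = {}       # path -> atom
        self.deleted = set()  # tombstones, so a delete may arrive first

    def apply(self, op):
        kind, path, atom = op
        if kind == "insert":
            self.atoms[path] = atom
        else:
            self.deleted.add(path)

    def contents(self):
        live = [p for p in self.atoms if p not in self.deleted]
        return "".join(self.atoms[p] for p in sorted(live, key=infix_key))


ops = [
    ("insert", "", "b"),
    ("insert", "0", "a"),
    ("insert", "1", "d"),
    ("insert", "10", "c"),
    ("delete", "0", None),
]

# Deliver the same operations in every possible order.
results = set()
for order in permutations(ops):
    replica = Replica()
    for op in order:
        replica.apply(op)
    results.add(replica.contents())
```

Every delivery order yields the same contents, so replicas converge without concurrency control. The tombstones and ever-longer paths are what the garbage collection step below has to clean up.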

Garbage collection: we need to rebalance the tree once in a while
- Rebalance changes the order and hence the identities of the items
– Requires consensus

Consensus requires small, stable membership
- what about large communities?

Solutions:
- Core: group membership, small, stable, rebalancing via 2-phase commit
- Nebula: communicate with sites in same epoch only, join core epoch
– catchup to rebalance,
– Move the extra updates aside, do the received rebalance from the core, and then send the extra updates back to the core

Q: How is this different from operational transformation?
A: Operational transformation approaches are not based on a decent theory

Q: Time to converge?
A: On the order of ms; it supports eventual consistency

Provenance as First Class Cloud Data
Kiran-Kumar Muniswamy-Reddy (Harvard) and Margo Seltzer (Harvard)
———————
Provenance: meta-data of object history
- used in scientific reproducibility, business compliance, and security

Why support it in Cloud?
- users want it
- …

Application: Cloud Search is becoming more important as more data is put in the cloud
- unlike web search, there are no links between objects in the cloud
- provenance dependencies to refine search

Another application: pre-fetching
- use provenance DAG to identify related objects and pre-fetch them
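A minimal sketch of how related objects could be picked from a provenance DAG, assuming the DAG is available as a mapping from each object to the objects it was derived from. The function name `related_objects`, the hop limit, and the file names are hypothetical.

```python
from collections import deque


def related_objects(derived_from, start, max_hops=1):
    """Return objects within `max_hops` provenance edges of `start`."""
    # Follow edges in both directions: inputs (ancestors) and
    # outputs (descendants) are both candidates for pre-fetching.
    neighbours = {}
    for child, parents in derived_from.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)

    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                queue.append((other, hops + 1))
    seen.discard(start)
    return seen


# Hypothetical provenance records: object -> objects it was derived from.
derived_from = {
    "plot.png": ["results.csv", "plot.py"],
    "results.csv": ["raw.dat"],
    "paper.pdf": ["plot.png"],
}
one_hop = related_objects(derived_from, "results.csv", max_hops=1)
two_hops = related_objects(derived_from, "results.csv", max_hops=2)
```

When a client reads `results.csv`, the store could pre-fetch the objects in `one_hop`; a larger hop limit trades extra bandwidth for a better hit rate.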

Application: Access control
- …

Application: detect application anomalies
- provide a model of normal usage/behavior
- provider could alert users to overuse

Requirements:
- consistency
- long-term persistence
- queryable
- security
- coordinate storage and computing facilities

Q: how is it useful for pre-fetching
A: It works in particular scenarios

Cassandra - A Decentralized Structured Storage System
Avinash Lakshman (Facebook) and Prashant Malik (Facebook)
———————
Lots of data, with mostly random reads and writes

Design goals:
- high availability
- eventual consistency
- incremental scalability
- optimistic replication
- “Knobs” to tune tradeoffs between consistency, durability, and latency
- minimal administration

Data Model:
- Keys, columns, and super-columns
- Columns are added and modified dynamically

Write properties:
- no locks in the critical path
- sequential disk access
- behaves like a write back cache
- append support without read ahead
- atomicity guarantee for a key per replica
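The write properties above can be pictured with a toy log-structured store. This is a sketch under assumptions, not Cassandra's code: the terms commit log, memtable and SSTable come from Cassandra's public design rather than from these notes, and the `Store` class, its methods and the flush threshold are made up for illustration.

```python
class Store:
    def __init__(self, flush_threshold=2):
        self.commit_log = []  # append-only, stands in for sequential disk writes
        self.memtable = {}    # in-memory buffer, plays the write-back cache role
        self.sstables = []    # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, column, value):
        # Append and update in memory; nothing is read before writing.
        self.commit_log.append((key, column, value))
        self.memtable.setdefault(key, {})[column] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the buffered rows out as one sorted, immutable run.
        run = sorted(self.memtable.items(), key=lambda kv: kv[0])
        self.sstables.append(run)
        self.memtable = {}

    def read(self, key):
        row = {}
        for run in self.sstables:            # older runs first
            for k, columns in run:
                if k == key:
                    row.update(columns)
        row.update(self.memtable.get(key, {}))  # newest data wins
        return row


store = Store(flush_threshold=2)
store.write("alice", "city", "Paris")
store.write("bob", "city", "Oslo")     # second key triggers a flush
store.write("alice", "city", "Rome")   # newer value stays in the memtable
alice = store.read("alice")
bob = store.read("bob")
```

Writes never modify data in place, which is why the write path needs only appends; reads pay for it by merging the memtable with the flushed runs.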

Lessons:
- Add fancy features only when they are necessary
- Many types of failures are possible
- Big systems need proper system-level monitoring

Q: How does your approach compare with Bigtable and so on?
A: Bigtable is for Google; we needed our own system.

Q: limits?
A: membership, we use gossiping now.

Towards Decoupling Storage and Computation in Hadoop with SuperDataNodes
George Porter (Sun Labs)
———————
Facebook imports 25 TB/day to 1K Hadoop nodes

Key to success: coupling compute and storage
- benefits of moving computation to data
- scheduling, locality, …

When to decide to go for coupling?
- your particular data center might not be designed to be efficient for coupling
- mixture may change over time: non-uniform access to data

Goal: late binding between storage and computation
- explore the alternatives

Approach:
- stateless worker tier
- …

Advantages:
- decouple amount of storage from number of worker nodes
- more intra-rack BW than inter-rack BW
- support for “archival” data
- increased uniformity for job scheduling and block placement
- replication only for node failures
- ease of management: because of stateless worker class

Cons:
- scarce storage BW between worker and SDN
- effect on fault-tolerance
- cost
- performance depending on the workload