EuroSys
The European Professional Society on Computer Systems

European chapter of the
Special Interest Group on Operating Systems (SIGOPS)
of the Association for Computing Machinery (ACM)

SOSP09: LADIS: Session 4: Monitoring and Repair

October 11th, 2009 by Maysam Yabandeh

Toward automatic policy refinement in repair services for large distributed systems
Moises Goldszmidt (Microsoft Research), Mihai Budiu (Microsoft Research), Yue Zhang (Microsoft) and Michael Pechuk (Microsoft)
———————–
Evaluate the effectiveness of a repair action and give alternatives

Effectiveness -> time that a machine is ‘usable’

What is a successful repair: measure the number of good signals
- Use machine learning to pick the right signals
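To make the effectiveness measure concrete, here is a minimal sketch that scores each repair action by the average time a machine stayed usable after that action and ranks the alternatives. The record format and the function name `rank_repair_actions` are hypothetical, and the sketch leaves out the machine-learning signal selection described in the talk.

```python
from collections import defaultdict


def rank_repair_actions(records):
    """Rank repair actions by mean usable time after the repair.

    `records` is a hypothetical list of (action, usable_hours) pairs, where
    usable_hours is how long the machine stayed usable after that repair.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, usable_hours in records:
        totals[action] += usable_hours
        counts[action] += 1
    scores = {action: totals[action] / counts[action] for action in totals}
    # Most effective action first; the rest are the alternatives.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rank_repair_actions([("reboot", 10), ("reboot", 30), ("reimage", 100), ("reimage", 80)])` ranks `reimage` (90 hours on average) ahead of `reboot` (20 hours).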

A case for the accountable cloud
Andreas Haeberlen (MPI-SWS)
———————–
Problem of cloud computing: the administrative domain is split between the software provider and the cloud provider
- When a problem arises, how can the software provider tell whether its own software is at fault or the cloud is?
– How can it prove this to the other party if it believes the problem is on their side?

Solution: use peer-review techniques

Completeness: can be relaxed -> probabilistic logging of actions
Accuracy: cannot be relaxed -> the missed action might be critical to detecting the problem
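Peer-review-style accountability rests on a tamper-evident log: each entry is chained to the previous one by a hash, so a party cannot silently rewrite its history once it has handed out the latest hash. The sketch below shows only that hash-chain idea; the class and field names are hypothetical, and a real system would also sign the hashes it gives to the other party, since whoever holds the log could otherwise recompute the whole chain.

```python
import hashlib
import json


def entry_hash(prev_hash, action):
    """Hash of one log entry, chained to the hash of the previous entry."""
    payload = prev_hash + json.dumps(action, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class TamperEvidentLog:
    GENESIS = "0" * 64  # hypothetical starting value for the chain

    def __init__(self):
        self.entries = []  # list of (action, chained_hash)

    def append(self, action):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = entry_hash(prev, action)
        self.entries.append((action, h))
        return h  # the other party keeps this as evidence of the log's state

    def verify(self):
        """Recompute the chain; any modified entry breaks every later hash."""
        prev = self.GENESIS
        for action, h in self.entries:
            if entry_hash(prev, action) != h:
                return False
            prev = h
        return True
```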

Learning from the Past for Resolving Dilemmas of Asynchrony
Paul Ezhilchelvan (Newcastle University) and Santosh Shrivastava (Newcastle University)
———————–
Cost of Asynchrony: we do not have a perfect failure detector

In newly emerged managed environments (e.g., data centers), we do not need the asynchronous model
- delays are predictable
- probabilistically synchronous model

Assumption: performance in the recent past is usually a good indicator of performance in the near future

design steps:
- measure delays
- design the protocol with tunable parameters
- choose the parameters at run-time
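The three design steps can be illustrated with a failure-detector timeout as the tunable parameter: measure recent delays, then choose at run time the timeout that the recent past says will be exceeded with probability at most `p`. This is a minimal sketch under that assumption; the class name, the sliding-window size and the quantile rule are illustrative choices, not the authors' protocol.

```python
import math
from collections import deque


class AdaptiveTimeout:
    """Pick a timeout from recently measured delays (hypothetical helper)."""

    def __init__(self, window=100, p=0.01):
        self.samples = deque(maxlen=window)  # only the recent past counts
        self.p = p  # tolerated probability of a premature (false) suspicion

    def record(self, delay):
        self.samples.append(delay)

    def timeout(self):
        if not self.samples:
            raise ValueError("no delay measurements yet")
        s = sorted(self.samples)
        n = len(s)
        # At most floor(p * n) recent samples may exceed the chosen timeout.
        allowed = min(n - 1, math.floor(self.p * n))
        return s[n - 1 - allowed]
```

With delays 1..100 in the window and `p=0.05`, the chosen timeout is 95: only 5% of the recent samples were slower. If the network slows down, the window fills with larger delays and the timeout grows with it.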

big picture:
- assume a probability
- detection of failure is guaranteed
- react to failure in application-specific way
- adapt the probability

Q: We already have probabilistic consensus protocols.
A: They are more expensive, because they repeat the process until consensus is reached.


SOSP09: LADIS: Session 3: Storage

October 11th, 2009 by Maysam Yabandeh

CRDTs: Consistency without concurrency control
Mihai Leția (ENS Lyon & LIP6), Nuno Preguiça (Univ. Nova de Lisboa) and Marc Shapiro (INRIA & LIP6)
—————————
Problem: messages may be delivered in a different order at different replicas, so replicas diverge
unless the operations are commutative

Use a tree to represent the order of operations
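A minimal sketch of why a tree helps: if every inserted character is identified by a unique path in a binary tree (0 = left, 1 = right), the document order is the in-order traversal of the tree, so inserts commute and every delivery order yields the same text. The names below are hypothetical, and the sketch assumes paths are already unique; the real design also needs disambiguators for concurrent inserts at the same position, plus deletes and the rebalancing discussed next.

```python
from itertools import permutations


def inorder_key(path):
    """Sort key that orders binary-tree paths by in-order traversal.

    Left maps to 0, right to 2, and every path ends with 1, so a node sorts
    after its whole left subtree and before its whole right subtree.
    """
    return tuple(0 if bit == 0 else 2 for bit in path) + (1,)


class TreeSeqReplica:
    def __init__(self):
        self.atoms = {}  # tree path (tuple of bits) -> character

    def apply_insert(self, path, char):
        # Each position is unique, so inserts commute with one another.
        self.atoms[tuple(path)] = char

    def text(self):
        return "".join(self.atoms[p] for p in sorted(self.atoms, key=inorder_key))


# Deliver the same four inserts in every possible order.
ops = [((), "b"), ((0,), "a"), ((1,), "d"), ((1, 0), "c")]
results = set()
for order in permutations(ops):
    replica = TreeSeqReplica()
    for path, ch in order:
        replica.apply_insert(path, ch)
    results.add(replica.text())
print(results)  # {'abcd'}
```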

Garbage collection: we need to rebalance the tree once in a while
- Rebalancing changes the order and hence the identities of the items
– Requires consensus

Consensus requires small, stable membership
- what about large communities?

Solutions:
- Core: group membership, small, stable, rebalancing via 2-phase commit
- Nebula: communicate only with sites in the same epoch; join the core's epoch
– catch up to rebalance
– Move the extra updates aside, apply the rebalance received from the core, and then send the extra updates back to the core

Q: How is this different from operational transformation?
A: Operational transformation is not based on a sound theory.

Q: Time to converge?
A: On the order of milliseconds; it supports eventual consistency.

Provenance as First Class Cloud Data
Kiran-Kumar Muniswamy-Reddy (Harvard) and Margo Seltzer (Harvard)
———————
Provenance: meta-data of object history
- used in scientific reproducibility, business compliance, and security

Why support it in Cloud?
- users want it
- …

Application: cloud search is becoming more important as more data is put in the cloud
- unlike in web search, there are no links between objects in the cloud
- use provenance dependencies to refine search

Another application: pre-fetching
- use the provenance DAG to identify related objects and pre-fetch them
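One way to read "use the provenance DAG to identify related objects" is a bounded breadth-first walk from the object being accessed to the objects it was derived from. The sketch below assumes a hypothetical adjacency map (object to its inputs) and a hop limit; the talk did not specify how candidates are chosen.

```python
from collections import deque


def prefetch_candidates(provenance, start, max_hops=2):
    """Objects within `max_hops` provenance edges of `start`, nearest first.

    `provenance` is a hypothetical dict mapping an object to the objects it
    was derived from.
    """
    seen = {start}
    order = []
    frontier = deque([(start, 0)])
    while frontier:
        obj, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for parent in provenance.get(obj, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                frontier.append((parent, hops + 1))
    return order
```

For a report derived from a plot and a CSV file, which in turn came from a raw log, one hop suggests the plot and the CSV; two hops also pulls in the log.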

Application: Access control
- …

Application: detect application anomalies
- provide a model of normal usage/behavior
- the cloud provider can alert users to overuse
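As a stand-in for "a model of normal usage", the simplest possible model is the mean and standard deviation of past usage, with an alert when new usage exceeds the mean by `k` standard deviations. This is a swapped-in illustrative technique, not the authors' method; the function names and the default `k=3.0` are assumptions.

```python
import statistics


def usage_model(history):
    """Summarize normal usage as (mean, sample standard deviation)."""
    return statistics.mean(history), statistics.stdev(history)


def is_overuse(value, model, k=3.0):
    """Flag usage more than k standard deviations above the normal mean."""
    mean, sd = model
    return value > mean + k * sd
```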

Requirements:
- consistency
- long-term persistence
- queryable
- security
- coordinate storage and computing facilities

Q: How is it useful for pre-fetching?
A: It works for particular scenarios.

Cassandra - A Decentralized Structured Storage System
Avinash Lakshman (Facebook) and Prashant Malik (Facebook)
———————
Lots of data, with mostly random reads and writes

Design goals:
- high availability
- eventual consistency
- incremental scalability
- optimistic replication
- “Knobs” to tune tradeoffs between consistency, durability, and latency
- minimal administration
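One common form of such a knob, which Cassandra exposes as per-request consistency levels such as ONE, QUORUM and ALL, is choosing how many of the N replicas must answer a read (R) and acknowledge a write (W). The standard rule is that every read set is guaranteed to overlap every write set exactly when R + W > N. The brute-force check below illustrates that rule; it is a sketch of the general quorum argument, not Cassandra code.

```python
from itertools import combinations


def quorums_always_intersect(n, r, w):
    """True if every r-replica read set shares a replica with every w-replica write set."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )
```

With N = 3, reading and writing at a majority (R = W = 2) always overlaps, so a read sees the latest acknowledged write; R = W = 1 trades that guarantee for lower latency.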

Data Model:
- Keys, columns, and super-columns
- Columns are added and modified dynamically

Write properties:
- no locks in the critical path
- sequential disk access
- behaves like a write back cache
- append support without read ahead
- atomicity guarantee for a key per replica
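The write properties above follow from a log-structured write path: append each write to a commit log (sequential disk access, no locks on shared on-disk structures), apply it to an in-memory table, and later flush that table as an immutable sorted run. The toy below shows that shape only. The class name, the JSON log format and the flush threshold are hypothetical, and real Cassandra adds timestamps, compaction and much more.

```python
import io
import json


class ToyWritePath:
    """Commit log + memtable + sorted flushed runs (a simplified sketch)."""

    def __init__(self, log_file=None, flush_threshold=4):
        self.log = log_file if log_file is not None else io.StringIO()
        self.memtable = {}   # row key -> {column: value}, the write-back cache
        self.sstables = []   # immutable runs, each sorted by row key
        self.flush_threshold = flush_threshold

    def write(self, key, column, value):
        # 1) Durability: sequential append, no read of existing data needed.
        self.log.write(json.dumps([key, column, value]) + "\n")
        self.log.flush()
        # 2) Apply in memory; all columns of one key change together here.
        self.memtable.setdefault(key, {})[column] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def _flush(self):
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key, column):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if column in self.memtable.get(key, {}):
            return self.memtable[key][column]
        for table in reversed(self.sstables):
            if column in table.get(key, {}):
                return table[key][column]
        return None
```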

Lessons:
- Add fancy features only when they are necessary
- Many types of failures are possible
- Big systems need proper system-level monitoring

Q: How does your approach compare with Bigtable and similar systems?
A: Bigtable is for Google; we needed our own system.

Q: limits?
A: membership, we use gossiping now.

Towards Decoupling Storage and Computation in Hadoop with SuperDataNodes
George Porter (Sun Labs)
———————
Facebook imports 25 TB/day to 1K Hadoop nodes

Key to success: coupling compute and storage
- benefits of moving computation to data
- scheduling, locality, …

When should you choose coupling?
- your particular data center might not be designed to be efficient for coupling
- the mixture may change over time: non-uniform access to data

Goal: late binding between storage and computation
- explore the alternatives

Approach:
- stateless worker tier
- …

Advantages:
- decouple amount of storage from number of worker nodes
- more intra-rack BW than inter-rack BW
- support for “archival” data
- increased uniformity for job scheduling and block placement
- replication only for node failures
- ease of management, due to the stateless worker tier

Cons:
- scarce storage BW between worker and SDN
- effect on fault-tolerance
- cost
- performance depends on the workload


Thursday 2009-08-20: SIGCOMM CONFERENCE: Closing Remarks

August 20th, 2009 by Maysam Yabandeh

520+ attendees

Best Demo: OpenFlow

Best Paper: OpenRoad

SIGCOMM 2011 will be in North America; the organizers are still waiting for proposals

Pictures on Flickr

All papers on CCR-online


Thursday 2009-08-20: SIGCOMM CONFERENCE: Performance Optimization (Chair: Ratul Mahajan, Microsoft Research)

August 20th, 2009 by Maysam Yabandeh

Session 9: Performance Optimization (Chair: Ratul Mahajan, Microsoft Research)
———————————————————-
Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication
Vijay Vasudevan (Carnegie Mellon University), Amar Phanishayee (Carnegie Mellon University), Hiral Shah (Carnegie Mellon University), Elie Krevat (Carnegie Mellon University), David Andersen (Carnegie Mellon University), Greg Ganger (Carnegie Mellon University), Garth Gibson (Carnegie Mellon University and Panasas, Inc), Brian Mueller (Panasas, Inc.)
———————————————————-
TCP has a problem in data centers: a dropped packet takes 200 ms (the minimum retransmission timeout) to be retransmitted

Some apps cannot tolerate that

Solution: enable fine-grained (sub-millisecond) retransmission timeouts
- improves throughput/latency in the datacenter
- safe for the wide area

Datacenters: 10-100 microsecond RTTs, 1-10 Gbps links
Under heavy load, packet loss is common

A single TCP timeout lasts thousands of times longer than the RTT

The scenario involves a client that sends a single request packet only once in a while. This runs contrary to TCP's design assumption of a full window of packets in flight, so fast retransmission, which needs three duplicate ACKs, is not triggered when a packet is lost.

Solution:
1) eliminate long 200ms timeout
2) TCP must track RTT in microseconds
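The two changes map directly onto the standard retransmission-timer computation of RFC 6298 (SRTT, RTTVAR, RTO = SRTT + max(G, 4 · RTTVAR), doubling on each timeout): keep every quantity in microseconds and make the lower bound on the RTO configurable instead of a fixed 200 ms. The class and parameter names below are hypothetical; this is an illustration of the formulas, not kernel code.

```python
class RtoEstimator:
    """RFC 6298-style retransmission timeout, with all times in microseconds."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variation
    K = 4

    def __init__(self, min_rto_us=0, granularity_us=1, max_rto_us=60_000_000):
        self.min_rto_us = min_rto_us  # 200_000 mimics the usual 200 ms floor
        self.g = granularity_us       # clock granularity G
        self.max_rto_us = max_rto_us
        self.srtt = None
        self.rttvar = None
        self.rto = 1_000_000          # initial RTO of 1 second, per RFC 6298

    def on_rtt_sample(self, r_us):
        if self.srtt is None:
            self.srtt = r_us
            self.rttvar = r_us / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r_us)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r_us
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(self.max_rto_us, max(self.min_rto_us, rto))

    def on_timeout(self):
        # Exponential backoff: double the timer after each expiry.
        self.rto = min(self.max_rto_us, self.rto * 2)
```

With a 100 µs RTT sample and no floor, the RTO comes out at 300 µs; with a 200 ms floor the same sample yields 200,000 µs, about 2,000 RTTs. The exponential backoff in `on_timeout` is what still protects the network when timeouts become this short.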

Interaction with delayed ACK
- The performance reduction is modest
Stability? Could it cause congestion collapse?
- Today’s TCP already has mechanisms to cope with that

Q: problem for congestion control?
A: Exponential backoff takes care of that


Thursday 2009-08-20: SIGCOMM CONFERENCE: Session 8: Network Measurement (Chair: Gianluca Iannaccone, Intel Labs Berkeley)

August 20th, 2009 by Maysam Yabandeh

14:00-15:30 Session 8: Network Measurement (Chair: Gianluca Iannaccone, Intel Labs Berkeley)
———————————————————-
Spatio-Temporal Compressive Sensing and Internet Traffic Matrices
Yin Zhang (University of Texas at Austin), Matthew Roughan (University of Adelaide), Walter Willinger (AT&T Labs — Research), Lili Qiu (University of Texas at Austin)
———————————————————-

How do we fill in missing values in a matrix?

The need for missing-value interpolation

Traffic matrix (TM): traffic volumes arranged in a matrix whose rows represent snapshots taken at different times.
Some values are missing; how do we interpolate them?

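Problem: A(X) = B, where X is the unknown traffic matrix, A(·) is a linear operator, and B holds the available measurements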
Challenge: massively under-constrained

Ideas:
- TMs are low-rank
- exploit spatio-temporal properties
- exploit local structures in TMs
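The low-rank idea alone already makes the under-constrained problem tractable. The sketch below is generic regularized alternating-least-squares matrix completion, not the paper's algorithm, which additionally exploits spatio-temporal and local structure. The function name and defaults (`rank`, `lam`, `iters`) are assumptions.

```python
import numpy as np


def complete_low_rank(M, mask, rank=1, lam=1e-3, iters=100, seed=0):
    """Estimate the entries of M where mask is False with a rank-`rank` product L @ R.T.

    Alternating least squares: with R fixed, each row of L is a small ridge
    regression on that row's observed entries only; then the roles swap.
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    L = rng.standard_normal((m, rank))
    R = rng.standard_normal((n, rank))
    reg = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(m):
            obs = mask[i, :]
            Ro = R[obs]
            L[i] = np.linalg.solve(Ro.T @ Ro + reg, Ro.T @ M[i, obs])
        for j in range(n):
            obs = mask[:, j]
            Lo = L[obs]
            R[j] = np.linalg.solve(Lo.T @ Lo + reg, Lo.T @ M[obs, j])
    return L @ R.T
```

Unobserved entries of `M` are never read, so they can hold `NaN`. The small ridge term `lam` keeps each solve well conditioned.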

Passive Aggressive Measurement with MGRP
Pavlos Papageorgiou (University of Maryland, College Park), Justin McCann (University of Maryland, College Park), Michael Hicks (University of Maryland, College Park)
———————————————————-
Example: video conferencing
- monitoring the quality
- active probing is expensive
- we can shape app data for measurement

MGRP: piggyback app data inside active probes
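A minimal sketch of the piggybacking idea: keep the probe train's packet sizes, which is what the measurement needs, but replace probe padding with pending application bytes. A receiver can then recover the application data from the probes. The 2-byte length field and the function names are hypothetical; this is not MGRP's actual wire format.

```python
import struct

LEN_HDR = struct.Struct("!H")  # hypothetical 2-byte length of the app bytes in a probe


def build_probes(probe_sizes, app_data):
    """Fill probes of the given sizes with app data, padding the rest with zeros.

    Returns the probes and any app data that did not fit.
    """
    probes = []
    offset = 0
    for size in probe_sizes:
        room = size - LEN_HDR.size
        chunk = app_data[offset:offset + room]
        offset += len(chunk)
        padding = bytes(room - len(chunk))  # only the unused space is padding
        probes.append(LEN_HDR.pack(len(chunk)) + chunk + padding)
    return probes, app_data[offset:]


def extract_app_data(probes):
    """Receiver side: strip padding and reassemble the piggybacked app bytes."""
    out = bytearray()
    for probe in probes:
        (n,) = LEN_HDR.unpack_from(probe)
        out += probe[LEN_HDR.size:LEN_HDR.size + n]
    return bytes(out)
```

Every probe keeps its requested size whether it carries application bytes or only padding, so the measurement itself is unchanged while the padding bytes stop being pure overhead.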

Sender side: after TCP layer, fragmentation & reassembly

For evaluation: PathLoad for measuring the available BW

Q: why not piggyback the probe traffic over app traffic?
A: interesting idea

Q: Is it applicable to data centers?
A: No, not at 1 Gbps.

Q: MTU discovery?
A: We did not do that.

Q: The final header is UDP or TCP?
A: UDP