EuroSys
The European Professional Society on Computer Systems

European chapter of the
Special Interest Group on Operating Systems (SIGOPS)
of the Association for Computing Machinery (ACM)

SOSP09: Session 1: Scalability

FAWN: A Fast Array of Wimpy Nodes [PDF]
David G. Andersen (Carnegie Mellon University), Jason Franklin (Carnegie Mellon University), Michael Kaminsky (Intel Research Pittsburgh), Amar Phanishayee (Carnegie Mellon University), Lawrence Tan (Carnegie Mellon University), Vijay Vasudevan (Carnegie Mellon University)
—————————
To save power, use an array of smaller machines that each draw less power
- The peak power of these machines is also lower

CPUs have become much faster than hard disks. Thus, we do not need big, fast machines for I/O-intensive tasks; small boxes would perform the same.

FAWN uses a key-value data model so that it parallelizes easily across several machines.
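As a rough illustration of why a key-value model parallelizes easily: each key can be mapped to a node independently of every other key, so requests for different keys never need to coordinate. The sketch below uses plain modulo hashing. The function name `node_for_key` and the hashing scheme are assumptions made for illustration, not details taken from the talk.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to one of num_nodes wimpy nodes (hypothetical scheme)."""
    # Hash the key so that keys spread roughly evenly over the nodes.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a node index.
    return int.from_bytes(digest[:8], "big") % num_nodes


# Each key is routed on its own; no node needs to know about other keys.
placement = {key: node_for_key(key, 4) for key in ("user:1", "user:2", "user:3")}
```

A drawback of plain modulo hashing is that changing the number of nodes remaps most keys, so a real deployment would likely want a scheme that moves fewer keys.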

Challenge: flash disks perform poorly on small writes
- solution: only append (never overwrite in place)
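The append-only idea can be sketched as a log plus an in-memory index: every write goes to the end of the log, and an overwrite simply appends a new version and repoints the index. This is a minimal sketch under stated assumptions, not the FAWN implementation: the class `AppendOnlyStore` is hypothetical, and an `io.BytesIO` buffer stands in for the flash device.

```python
import io


class AppendOnlyStore:
    """Toy key-value store that only ever appends to its log (illustrative)."""

    def __init__(self):
        self.log = io.BytesIO()  # stand-in for the flash device
        self.index = {}          # key -> (offset, length) of the newest value

    def put(self, key, value: bytes):
        # Always write at the end of the log; never modify earlier bytes.
        offset = self.log.seek(0, io.SEEK_END)
        self.log.write(value)
        self.index[key] = (offset, len(value))

    def get(self, key) -> bytes:
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)


store = AppendOnlyStore()
store.put("a", b"old")
store.put("b", b"other")
store.put("a", b"newer")  # an overwrite appends; the old bytes stay in the log
```

The stale `b"old"` bytes remain in the log after the overwrite, so a real store would also need some form of cleanup to reclaim that space.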

RouteBricks: Exploiting Parallelism to Scale Software Routers [PDF]
Mihai Dobrescu (EPFL) and Norbert Egi (Lancaster University/Intel Research), Katerina Argyraki (EPFL), Byung-Gon Chun (Intel Research), Kevin Fall (Intel Research), Gianluca Iannaccone (Intel Research), Allan Knies (Intel Research), Maziar Manesh (Intel Research), Sylvia Ratnasamy (Intel Research)
—————————
Routers are either fast or programmable; fast routers are hardware routers.

RouteBricks: get performance out of off-the-shelf PCs (software routers)

Each server acts as one (or a few) line cards with rate R.

Problem: each server needs N·R processing power for switching (N is the number of interfaces)
Solution: Valiant load balancing: use intermediate servers. Each server divides its traffic among N intermediate servers, and the intermediate servers then aggregate the traffic towards the external interfaces.
=> Per-server processing rate = 3·R
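One way to see where the factor of 3 comes from: under Valiant load balancing each server plays three roles at once. It receives up to R from its own external interface, it relays an even 1/N share of everyone's traffic as an intermediate, and it sends up to R out of its own external interface. The sketch below adds up these three roles for a traffic matrix. The function `vlb_per_server_load` and the example matrix are hypothetical, and the model ignores any shortcut for traffic that happens to stay on the same server.

```python
def vlb_per_server_load(traffic):
    """traffic[i][j] = rate entering at server i and leaving at server j."""
    n = len(traffic)
    # Role 1: traffic arriving on each server's external interface.
    ingress = [sum(row) for row in traffic]
    # Role 3: traffic leaving on each server's external interface.
    egress = [sum(traffic[i][j] for i in range(n)) for j in range(n)]
    # Role 2: every ingress server spreads its traffic evenly over all
    # n servers, so each intermediate relays 1/n of the total.
    total = sum(ingress)
    intermediate = [total / n] * n
    return [ingress[k] + intermediate[k] + egress[k] for k in range(n)]


R = 10.0  # external line rate per server (arbitrary units)
N = 4     # number of servers
# Each server sends all of its traffic to its neighbour.
permutation = [[R if j == (i + 1) % N else 0.0 for j in range(N)]
               for i in range(N)]
loads = vlb_per_server_load(permutation)
print(loads)  # [30.0, 30.0, 30.0, 30.0], i.e. 3·R per server
```

The point of the even split is that the per-server load stays at 3·R for any admissible traffic pattern, instead of growing with N.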

Improving per-server performance:
- write a device driver that batches several packets before sending them to the cores

Use one core per queue (this requires multi-queue interfaces)
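The two ideas above, batching and one core per queue, can be sketched together: each core owns exactly one receive queue and pulls packets from it in batches, so the per-packet bookkeeping is paid once per batch and cores never contend for a queue. This is only a user-level simulation, not driver code; `poll_batch`, `run_core`, and the batch size of 2 are hypothetical.

```python
from collections import deque


def poll_batch(rx_queue, batch_size):
    """Take up to batch_size packets from one receive queue."""
    batch = []
    while rx_queue and len(batch) < batch_size:
        batch.append(rx_queue.popleft())
    return batch


def run_core(rx_queue, batch_size, handle_batch):
    """Drain the single queue owned by this core; return batches handled."""
    batches = 0
    while rx_queue:
        handle_batch(poll_batch(rx_queue, batch_size))
        batches += 1
    return batches


# Two queues, each owned by its own core (simulated here one after another).
rx_queues = [deque(range(5)), deque(range(100, 103))]
seen = []
batches_per_core = [run_core(q, 2, seen.append) for q in rx_queues]
```

With a batch size of 2, the first core handles its 5 packets in 3 batches rather than 5 separate hand-offs.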

The prototype has 4 servers.
- It introduces packet reordering: 0.15%
- Latency is *estimated* at 24 microseconds per server

Q: Programmability requires keeping state. With a distributed architecture, how programmable would it be?
A: It is an issue.

Q: What about power?
A: We use more power.

Q: If you actually program your router, the performance drops, no?
A: …

The Multikernel: A New OS Architecture for Scalable Multicore Systems [PDF]
Andrew Baumann (ETH Zurich), Paul Barham (MSR Cambridge), Pierre-Evariste Dagand (ENS Cachan Bretagne), Tim Harris (MSR Cambridge), Rebecca Isaacs (MSR Cambridge), Simon Peter (ETH Zurich), Timothy Roscoe (ETH Zurich), Adrian Schüpbach (ETH Zurich), Akhilesh Singhania (ETH Zurich)
—————————
System diversity: the cache model and interconnect differ from architecture to architecture
- cannot optimise at design time

proposal: structure the OS as a distributed system
- replicated state
- make the inter-core communication explicit

all communication goes through messages -> no shared data

use sharing just as a local optimization of message passing
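The replicated-state, message-only design can be sketched with threads standing in for cores: each "core" keeps a private replica of the state and changes it only in response to messages arriving in its inbox. This is a toy model under stated assumptions, not the system presented in the talk; the `Core` class, `broadcast`, and the state keys are all hypothetical.

```python
import queue
import threading


class Core:
    """One simulated core with a private replica of the OS state."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = queue.Queue()  # the only way other cores reach this one
        self.replica = {}           # never touched directly by other cores

    def run(self):
        while True:
            message = self.inbox.get()
            if message is None:     # shutdown signal
                return
            key, value = message
            self.replica[key] = value


def broadcast(cores, key, value):
    """Send the same update message to every core's inbox."""
    for core in cores:
        core.inbox.put((key, value))


def demo(num_cores=4):
    cores = [Core(i) for i in range(num_cores)]
    threads = [threading.Thread(target=core.run) for core in cores]
    for thread in threads:
        thread.start()
    broadcast(cores, "free_pages", 100)
    broadcast(cores, "free_pages", 90)
    broadcast(cores, "runnable_tasks", 3)
    for core in cores:
        core.inbox.put(None)
    for thread in threads:
        thread.join()
    return [core.replica for core in cores]


replicas = demo()
```

Because every inbox receives the updates in the same order from a single sender, all replicas end up identical. With several concurrent senders, an agreement protocol would be needed to keep them consistent.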

performance is comparable to existing systems
