Distributed Systems
CS3.401Subjective Questions
Short, long, derivation, numerical, proof, compare, architecture.
State Tanenbaum's definition of a distributed system and list its five required features.
State CAP and explain why CA is impossible in real distributed systems.
Two-Generals problem and its implication for distributed systems.
Define Lamport's happened-before relation and the clock consistency condition. Why is scalar Lamport clock only weakly consistent?
State the vector clock rules and how to compare two vector timestamps.
Compare scalar, vector, and matrix clocks on storage, consistency, and main use case.
Compare Cristian's and Berkeley algorithms for physical clock synchronisation.
Explain Chandy-Lamport snapshot algorithm. Why does it require FIFO channels?
State the C1 and C2 cut conditions.
Compare Chandy-Lamport, Lai-Yang, and Acharya-Badrinath.
State the BSS algorithm's two delivery conditions and explain each.
Distinguish FIFO, causal, and total order delivery.
Compare Lamport, Ricart-Agrawala, Maekawa (V2), Suzuki-Kasami, and Raymond DME algorithms.
State Lamport's L1 and L2 entry conditions and explain why both are required.
Why does Maekawa V1 deadlock? How do FAILED / INQUIRE / YIELD in V2 fix it?
Suzuki-Kasami — why does the token-send condition require RN[i] = LN[i] + 1?
Distinguish cycle vs knot in WFG. Which model needs which?
Trace Chandy-Misra-Haas probe algorithm on a 4-process cycle. State the detection rule.
Why is detection the preferred deadlock-handling strategy in DS?
State the three impossibility results for Byzantine Agreement.
Why is N = 3, f = 1 Byzantine impossible? Use the indistinguishability argument.
Compare OM(m) and Phase King on bounds, rounds, messages, and simplicity.
Describe the two phases of 2PC and state the point of no return.
When does 2PC block? How does 3PC's PRE-COMMIT phase fix this?
Why is 3PC not used in practice?
State Raft's three sub-problems and one-line each.
Why is randomised election timeout critical in Raft?
State Raft's election restriction and why it ensures safety.
State GHS combining rules A, B, C with conditions and actions.
Prove max level in GHS is log₂ N.
Test/Accept/Reject — state the three reply rules at node q.
State GFS's three-component architecture and a one-line role for each.
Walk through GFS's write flow with leases. Why are data and control flow separated?
GFS consistency states: define defined / consistent / undefined / inconsistent and give one scenario for each.
Why doesn't GFS master persistently log chunk locations?
Define a distributed system. What are its essential required features?
Why do we need distributed systems? List the primary motivations.
List the key challenges in designing a distributed system.
State and explain the CAP theorem. Give one example each for the Consistency and Availability choices.
What is consensus in a distributed system? How does the Byzantine Agreement Problem extend it?
Describe the four dimensions along which distributed-system models can vary.
Define internal, send and receive events. How does an event change system state?
Why is time synchronization important in a distributed system?
Explain Cristian's (time-server) algorithm. How does it bound the actual server time?
Describe the Berkeley UNIX time-synchronization algorithm. How does it differ from Cristian's?
Briefly describe the decentralized averaging algorithm for clock synchronization.
What is NTP and what is its claimed accuracy? Explain its hierarchical structure.
Even with NTP, physical-time synchronization is imperfect. Why might we not need physical time at all?
Define logical time. What ordering rules does it implement?
Differentiate between clock consistency and strong consistency.
Define logical concurrency. How does it differ from physical concurrency?
State the data structure and update rules of Lamport's scalar clock.
Discuss two key properties of Lamport's scalar clock: monotonicity and total ordering.
What is event counting in Lamport's scalar time? What does the height of an event mean?
Show with an example that scalar clocks are not strongly consistent.
Illustrate the 'out-of-order message arrival' problem with scalar clocks.
Define vector time. Explain its data structure and update rules.
How are two vector timestamps compared?
Prove that vector clocks are strongly consistent.
What is the chief limitation of vector clocks? What optimization technique addresses it?
What does each component of a vector timestamp tell us when d = 1?
Define matrix time. What does each entry of the matrix represent?
State the update rules for matrix clocks.
State the key additional property of matrix clocks (beyond vector clocks) and explain its practical use.
Why is recording the global state of a distributed system non-trivial?
Define a consistent global state. State the conditions C1 and C2 a global state must satisfy.
What is a consistent cut in a space--time diagram?
State the Chandy--Lamport algorithm for FIFO channels (marker sending rule and marker receiving rule).
What is the termination condition and message complexity of the Chandy--Lamport algorithm?
Why does the Chandy--Lamport algorithm not work for non-FIFO channels? Briefly explain the Lai--Yang algorithm that handles non-FIFO channels.
Describe the Acharya--Badrinath snapshot algorithm for causal channels. Why does it not need to record channel state in the same way as Chandy--Lamport?
Distinguish FIFO, non-FIFO, and causal message-delivery models.
Illustrate with a numerical example why naively recording local states at different physical times can give wrong totals.
Why is mutual exclusion harder in distributed systems than in centralized OS?
List the three correctness requirements of a distributed mutual-exclusion algorithm.
Define the four standard performance metrics for distributed mutual-exclusion algorithms.
Describe Lamport's algorithm for distributed mutual exclusion. State its three rules.
Why does Lamport's algorithm require FIFO channels? Why are REPLY messages required from every site, not just the one entering the CS?
Give the message complexity, synchronization delay, and ordering property of Lamport's algorithm.
Describe the Ricart--Agrawala algorithm. How does it improve on Lamport's algorithm?
Give a brief correctness argument and the message complexity of Ricart--Agrawala.
Explain the Roucairol--Carvalho optimization of Ricart--Agrawala.
Describe Maekawa's algorithm. What are the conditions on the request sets (quorums)?
Why is the simple version of Maekawa's algorithm deadlock-prone? Outline the additional messages used to avoid deadlock.
Describe the Suzuki--Kasami algorithm. What does the token carry, and what state does each process keep?
What are the synchronization delay and message complexity of Suzuki--Kasami?
Describe Raymond's tree-based algorithm for mutual exclusion.
Compare the five distributed mutual-exclusion algorithms covered (Lamport, Ricart--Agrawala, Maekawa, Suzuki--Kasami, Raymond) on message count and synchronization delay.
What is a Wait-For Graph (WFG)? How does it characterize deadlock?
List and describe the four resource-request models. For each, what is the necessary/sufficient condition for deadlock?
Define a knot. Why does a knot indicate deadlock in the OR model, and why is a knot alone not sufficient to identify all deadlocked processes?
List the three control organizations for distributed deadlock detection and give their main advantage and disadvantage.
List the three handling strategies for deadlocks in distributed systems and why detection is favoured in practice.
Describe the Ho--Ramamoorthy two-phase centralized deadlock-detection algorithm. Why two phases? What is the one-phase variant?
Describe the Chandy--Misra--Haas distributed edge-chasing algorithm. What is in a probe and when is deadlock declared?
Outline the Mitchell--Merritt algorithm for the single-resource model. What is the role of public and private labels?
Briefly describe Chandy et al.'s diffusion-computation algorithm for OR-model deadlock detection.
List typical applications of consensus in distributed systems.
Differentiate between crash failure and Byzantine failure.
State the three conditions of the Byzantine Agreement Problem.
State the impossibility results and lower bounds for Byzantine Agreement.
Describe the synchronous consensus algorithm for crash failures. Why does it need f + 1 rounds?
Show that Byzantine Agreement is impossible for N = 3 and f = 1 (i.e., 1 faulty out of 3 generals).
Describe the Lamport--Shostak--Pease (Oral Messages) algorithm for Byzantine Agreement.
Describe the Phase King algorithm. Why does the existence of one non-malicious king among f + 1 phases suffice?
State the ACID properties and briefly explain each in the context of distributed transactions.
Why do we need a commit protocol, and what assumption does 2PC make about failures?
Describe Phase 1 of 2PC (Obtaining a Decision) in detail.
Describe Phase 2 of 2PC (Recording the Decision).
How does 2PC handle the failure of a participating site (Sk)?
How does 2PC handle the coordinator's failure? What is the 'blocking problem'?
How does 2PC handle network partitions?
Explain the Three-Phase Commit (3PC) protocol. How does it avoid the blocking problem of 2PC?
What is consensus in Raft's context? What failure model does Raft assume?
Outline the three core sub-problems Raft decomposes consensus into.
Describe Raft's leader-election mechanism. How are split votes avoided?
What is a term in Raft and how does it help detect stale leaders?
Describe Raft's log-replication protocol (normal operation).
What safety guarantees does Raft maintain to ensure correctness despite leader crashes?
Why is Raft considered more understandable than Paxos?
State the Cut Property and the Cycle Property of MSTs. Which one does GHS exploit?
Prove the Cut Property: with distinct edge weights, the lightest edge crossing any cut belongs to the MST.
What is a fragment in GHS? State the three combining rules (A, B, C).
Describe how GHS finds the lowest-weight outgoing edge of a fragment using test/accept/reject messages.
How does GHS send the connect message to merge fragments, and how does the algorithm terminate?
Prove that the maximum level a node can have in a GHS-constructed MST of N nodes is log N.
What are the three node states in GHS? What is the role of statusp[q]?
What design assumptions about hardware and applications drive GFS?
Describe the high-level GFS architecture. What entities exist?
Walk through the steps a client takes to read a file in GFS.
Describe the GFS write protocol, including the role of leases and the primary replica.
Define and distinguish the four states of a file region after a mutation in GFS: consistent, defined, undefined, inconsistent.
What can go wrong with concurrent writes/appends in GFS? Give specific failure modes.
What is the role of leases in GFS, and what do they enforce?
How does GFS handle replica placement, re-replication, and rebalancing?
How does GFS handle garbage collection and stale replicas?
Why is a single master not a bottleneck in GFS, despite being a single point of contact?
How does GFS support snapshots? Explain copy-on-write.
Summarise the key benefits and limitations of GFS.