Vector clock

A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process's logical clock. A vector clock of a system of N processes is an array/vector of N logical clocks, one clock per process; each process keeps a local copy of this array holding the largest values it has learned of so far.

Figure: Example of a system of vector clocks. Events in the blue region are the causes leading to event B4, whereas those in the red region are the effects of event B4.

Let $VC_{i}$ denote the vector clock maintained by process i. The clock updates proceed as follows:^[1]

Initially all clocks are zero.
Each time a process experiences an internal event, it increments its own logical clock in the vector by one. For instance, upon an event at process i, it updates $VC_{i}[i]\leftarrow VC_{i}[i]+1$ .
Each time a process sends a message, it increments its own logical clock in the vector by one (as in the bullet above, but not twice for the same event) and then the message piggybacks a copy of its own vector.
Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element). For example, if process $P_{j}$ receives a message m from $P_{i}$ , it first sets $VC_{j}[j]\leftarrow VC_{j}[j]+1$ and then sets $VC_{j}[k]\leftarrow \max(VC_{j}[k],VC_{i}[k]),\forall k$ , where $VC_{i}$ is the vector carried in m.
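
The update rules above can be sketched in Python as follows. This is a minimal illustration rather than code from the cited sources: the class name VectorClock and the methods internal_event, send and receive are hypothetical, processes are assumed to be numbered 0 to N-1, and actual message delivery is left to the caller.

```python
class VectorClock:
    """Vector clock of one process in a fixed system of n processes (sketch)."""

    def __init__(self, n, pid):
        self.pid = pid          # index of the owning process, 0 <= pid < n
        self.clock = [0] * n    # initially all clocks are zero

    def internal_event(self):
        # An internal event increments only the process's own entry.
        self.clock[self.pid] += 1

    def send(self):
        # Sending counts as one event; the message carries a copy of the vector.
        self.clock[self.pid] += 1
        return list(self.clock)

    def receive(self, timestamp):
        # Receiving counts as one event, followed by an element-wise maximum.
        self.clock[self.pid] += 1
        self.clock = [max(a, b) for a, b in zip(self.clock, timestamp)]


p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
p0.internal_event()   # p0.clock == [1, 0]
m = p0.send()         # p0.clock == [2, 0], m == [2, 0]
p1.receive(m)         # p1.clock == [2, 1]
```

Returning a copy from send keeps later local events from altering a timestamp that is already in flight.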

History

Without using the specific name "vector clock", the concept of a vector clock was first mentioned^[2] in a 1986 paper by Rivka Ladin and Barbara Liskov where they use the term "multipart timestamp".^[3] To quote from page 31 of the Liskov/Ladin paper:

We solve this problem by using multipart timestamps, where there is one part for each replica. Thus, if there are n replicas, a timestamp t is
t = <t1, …, tn>
where each part is a positive integer. Since there will typically be a small number of replicas (e.g., 3 to 7), using such a timestamp is practical.

The term "vector clock" was first used independently by Colin Fidge and Friedemann Mattern in 1988.^[4]^[5]

Partial ordering property

Vector clocks allow for the partial causal ordering of events. Define the following:

$VC(x)$ denotes the vector clock of event $x$ , and $VC(x)_{z}$ denotes the component of that clock for process $z$ .
$VC(x)<VC(y)\iff \forall z[VC(x)_{z}\leq VC(y)_{z}]\land \exists z'[VC(x)_{z'}<VC(y)_{z'}]$
- In English: $VC(x)$ is less than $VC(y)$ , if and only if $VC(x)_{z}$ is less than or equal to $VC(y)_{z}$ for all process indices $z$ , and at least one of those relationships is strictly smaller (that is, $VC(x)_{z'}<VC(y)_{z'}$ ).
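$x\to y\;$ denotes that event $x$ happened before event $y$ . Vector clocks respect this relation: if $x\to y\;$ , then $VC(x)<VC(y)$ . For timestamps produced by the update rules above, the converse also holds, so two events whose vector clocks are incomparable are concurrent.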
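
The comparison defined above translates directly into code. The following Python sketch assumes timestamps are equal-length lists of integers; the function names vc_less and concurrent are hypothetical and chosen only for illustration.

```python
def vc_less(x, y):
    """Return True if vector timestamp x is strictly less than y."""
    if len(x) != len(y):
        raise ValueError("timestamps must have one entry per process")
    no_entry_greater = all(a <= b for a, b in zip(x, y))
    some_entry_smaller = any(a < b for a, b in zip(x, y))
    return no_entry_greater and some_entry_smaller


def concurrent(x, y):
    """Distinct timestamps are concurrent when neither is less than the other."""
    return x != y and not vc_less(x, y) and not vc_less(y, x)


a, b, c = [2, 0], [2, 1], [1, 2]
vc_less(a, b)       # True: no entry of a exceeds b, and a[1] < b[1]
concurrent(a, c)    # True: a[0] > c[0] but a[1] < c[1]
```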

Properties:

Antisymmetry: if $VC(a)<VC(b)$ , then ¬ $(VC(b)<VC(a))$
Transitivity: if $VC(a)<VC(b)$ and $VC(b)<VC(c)$ , then $VC(a)<VC(c)$ ; or, if $a\to b\;$ and $b\to c\;$ , then $a\to c\;$

Relation with other orders:

Let $RT(x)$ be the real time when event $x$ occurs. If $VC(a)<VC(b)$ , then $RT(a)<RT(b)$
Let $C(x)$ be the Lamport timestamp of event $x$ . If $VC(a)<VC(b)$ , then $C(a)<C(b)$
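
The converse of the Lamport relation does not hold in general: $C(a)<C(b)$ does not imply $VC(a)<VC(b)$ . The small Python sketch below illustrates this with two hand-assigned events that exchange no messages; the timestamp values and the helper vc_less are assumptions made for the example.

```python
def vc_less(x, y):
    # For equal-length vectors: every entry <= and the vectors differ.
    return all(p <= q for p, q in zip(x, y)) and x != y


# Event a is the first event on process 0; event b is the second event
# on process 1. No messages are exchanged, so a and b are causally unrelated.
vc_a, lamport_a = [1, 0], 1
vc_b, lamport_b = [0, 2], 2

lamport_a < lamport_b    # True, although a did not happen before b
vc_less(vc_a, vc_b)      # False
vc_less(vc_b, vc_a)      # False: the vector clocks are incomparable
```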

Other mechanisms

In 1999, Torres-Rojas and Ahamad developed Plausible Clocks,^[6] a mechanism that takes less space than vector clocks but that, in some cases, will totally order events that are causally concurrent.

In 2005, Agarwal and Garg created Chain Clocks,^[7] a system that tracks dependencies using vectors smaller than the number of processes and that adapts automatically to systems with a dynamic number of processes.

In 2008, Almeida et al. introduced Interval Tree Clocks.^[8]^[9]^[10] This mechanism generalizes Vector Clocks and allows operation in dynamic environments when the identities and number of processes in the computation are not known in advance.

In 2019, Lum Ramabaja developed Bloom Clocks,^[11] a probabilistic data structure whose space complexity does not depend on the number of nodes in a system. If two clocks are not comparable, the Bloom clock can always deduce it, i.e. false negatives are not possible. If two clocks are comparable, the Bloom clock can calculate the confidence of that statement, i.e. it can compute the false positive rate between comparable pairs of clocks.

References

[1] "Distributed Systems 3rd edition (2017)". DISTRIBUTED-SYSTEMS.NET. Retrieved 2021-03-21.
[2] The reference to this paper was discovered by Prof Lindsey Kuper and described in lecture 23 of her YouTube video lecture series on Distributed Systems.
[3] Barbara Liskov, Rivka Ladin (1986). "Highly-Available Distributed Services and Fault-Tolerant Distributed Garbage Collection". 5th Symposium on the Principles of Distributed Computing. ACM. pp. 29–39. CiteSeerX 10.1.1.569.3601. Retrieved 2020-09-22.
[4] Colin J. Fidge (February 1988). "Timestamps in Message-Passing Systems That Preserve the Partial Ordering" (PDF). In K. Raymond (ed.). Proc. of the 11th Australian Computer Science Conference (ACSC'88). pp. 56–66. Retrieved 2009-02-13.
[5] Mattern, F. (October 1988), "Virtual Time and Global States of Distributed Systems", in Cosnard, M. (ed.), Proc. Workshop on Parallel and Distributed Algorithms, Chateau de Bonas, France: Elsevier, pp. 215–226
[6] Francisco Torres-Rojas; Mustaque Ahamad (1999), "Plausible clocks: constant size logical clocks for distributed systems", Distributed Computing, 12 (4): 179–195, doi:10.1007/s004460050065, S2CID 2936350
[7] Agarwal, Anurag; Garg, Vijay K. (17 July 2005). "Efficient dependency tracking for relevant events in shared-memory systems" (PDF). Proceedings of the Twenty-Fourth Annual ACM Symposium on Principles of Distributed Computing. Association for Computing Machinery: 19–28. doi:10.1145/1073814.1073818. ISBN 1-58113-994-2. S2CID 11779779. Retrieved 21 April 2021.
[8] Almeida, Paulo; Baquero, Carlos; Fonte, Victor (2008), "Interval Tree Clocks: A Logical Clock for Dynamic Systems", in Baker, Theodore P.; Bui, Alain; Tixeuil, Sébastien (eds.), Principles of Distributed Systems (PDF), Lecture Notes in Computer Science, vol. 5401, Springer-Verlag, pp. 259–274, Bibcode:2008LNCS.5401.....B, doi:10.1007/978-3-540-92221-6, ISBN 978-3-540-92220-9
[9] Almeida, Paulo; Baquero, Carlos; Fonte, Victor (2008), "Interval Tree Clocks: A Logical Clock for Dynamic Systems", Lecture Notes in Computer Science, vol. 5401, p. 259, doi:10.1007/978-3-540-92221-6_18, hdl:1822/37748, ISBN 978-3-540-92220-9
[10] Zhang, Yi (2014), "Background Preliminaries: Interval Tree Clock Results" (PDF)
[11] Lum Ramabaja (2019), The Bloom Clock, arXiv:1905.13064, Bibcode:2019arXiv190513064R
