Google Wave: Operational Transform and Server Acknowledgements

Google Wave is an exciting new product from Google that aims to become a communications hub, replacing many current communication types, including email, instant messaging, blogging and wiki. By encompassing the features of all of these systems, a single unified system can be used for all of these communication styles.

If you have not heard of Google Wave then the presentations given at Google IO 2009 are a great place to start to understand what it is all about:

Key to the success of Google Wave is the publishing of the underlying protocol. This allows third-party implementations to be developed, avoiding the vendor-lock-in of proprietary protocols. The Wave protocol has been published in draft form and is being refined by Google and the wider community.

From a technical point of view, the Wave protocol takes a bold step. The protocol requires clients and servers to perform Operational Transformations on the operations that users perform on the Wave documents. Operational Transformation has been actively researched for a few decades now, however, very few commercial products have adopted this approach to concurrency control for a number of reasons, such as the complexity of the algorithms and the poor performance of correct control and transformation algorithms.

For a good primer on how OT is used in Wave, see the video
"Google Wave: Live collaborative editing" by David Wang. You can also read about the Wave OT implementation in the published white paper on the Wave Protocol web site.

I am interested the modification Google made to the Jupiter Collaboration System in order to reduce the server implementation complexity. The modification of interest is the addition of server acknowledgements (the ACK) to the protocol. The ACK is used to control when the client can send composite operations to the server. By only allowing a single operation to be sent to the server between ACKs, the client is able to perform a lot of the OT work on behalf of the server. Also, the server is able to store less per-client state. In particular, it only needs to store a single state space, where the state space is an ordered history of all operations (referred to as a linear history buffer in some of the literature).

While the inclusion of the ACK simplifies the server implementation and reduces the server memory footprint, there is one significant trade off. Since clients can not stream operations to the server as they happen, latency between clients is increased. This hurts interactivity somewhat as a client will see, at best, batched-up operations performed by another client arrive in intervals proportional to the round-trip time between the clients. Is this a significant limitation of the protocol? I guess time will tell.

Understanding the ACK is not a trivial exercise, especially for those who are new to OT and I feel that the OT white paper from Google is not detailed enough for protocol implementers. I started a thread on the Wave Protocol Newsgroup to discuss the ACK and got some good responses that aided my understanding greatly. The rest of this article tries to clarify how the Wave server and client interact using OT.

The state space is an ordered list of operations, where the ordering is basically the temporal (and hence, causal) order in which the operations happened. When operations are performed concurrently by clients, there is not temporal ordering of the operations, so the server makes an arbitrary decision as to which order the operations will be added to the history buffer. However, the concurrent operations may affect each other, so the server must perform some OT in order to maintain a linear list of operations, as described below.

Let say that the server state space is defined as follows (where older operations are to the left in the list):

  1. Server state space: [O1, O2, O3]

Let's say that there are three clients who have the following state spaces:

  1. Client A State Space: [O1, O2, O3, Oa]
  2. Client B State Space: [O1, O2, O3, Ob]
  3. Client C State Space: [O1, O2, O3, Oc]

Where Oa, Ob and Oc are operations performed concurrently by the clients. Once a client is sent an ACK from the server, it can send its pending operation to the server. Hence, the server will receive three operations, all of which were performed against the same initial state. Let's say that the server decides that it wants to append the operation to its state space in the order Oa, Ob, Oc, then the following is done by the server:

  1. Server state space: [O1, O2, O3, Oa, Ob', Oc']
  2.  
  3. Where:
  4. Ob' = T( Ob, Oa ) (meaning: transform Ob past Oa)
  5. Oc' = T( Oc, [Oa, Ob'] ) (meaning: transform Oc past the list [Oa, Ob'] )
  6. = T( T(Oc, Oa), T(Ob, Oa ) )
  7.  
  8.  
  9. T( x, y ) is some Transform that transforms x such that it includes the effect of y.

Once the server has transformed a client's operation, applied it to its Wave document and appended it the state space, the transformed operation is broadcast to all clients that have the wave open (note: presumably the server does not send the transformed operation to the client that generated the original operation, though I can't find this specified in the Wave protocol spec). The server then sends an ACK to the client that generated the operation that is being processed. It is very important (though not clearly documented) that the ACK be sent before any other transformed operations are sent to the client.

Since all clients receive the operations in the same order, all clients will agree on the order of remote operations. It is exactly this global ordering that allows the Wave OT algorithm to avoid the need to satisfy the TP2 constraint!

Each client will incorporate the operations from other clients into their own state space. This is where things get interesting. If the clients were to merely apply operations received from the serve, then their state spaces would be as follows:

  1. Client A State Space: [O1, O2, O3, Oa, Ob', Oc']
  2. Client B State Space: [O1, O2, O3, Ob, Oa, Oc']
  3. Client C State Space: [O1, O2, O3, Oc, Oa, Ob']

Clearly this is not correct as all clients have a different state space and on clients B and C, the operations are in the wrong order, so if they were executed as shown above, the Wave document would become corrupted. Clearly the client has work left to do when it receives operations from the server! What the client needs to do is quite simple - each incoming operation needs to be transformed to include the effect any unacknowledged local operations. Let us consider what happens on each client:

Client A:

  1. Client A state space: [O1, O2, O3, Oa]
  2. Client A sends [Oa] to the server
  3. Client A receives an ACK from the server (this acknowledges Oa)
  4. Client A receives [Ob'] from the server
  5. Since there are no unacknowledged local operations, Ob' is applied to the local wave and appended to the state space
  6. Client A receives [Oc'] from the server
  7. Since there are no unacknowledged local operations, Oc' is applied to the local wave and appended to the state space

The final state space is [O1, O2, O3, Oa, Ob', Oc']

Client B:

  1. Client B state space: [O1, O2, O3, Ob]
  2. Client B sends [Ob] to the server
  3. Client B receives [Oa] from the server
  4. Client B transforms Oa against it's unacknowledged local operation Ob to form Oa' which is then applied to the local wave and appended to the state space
  5. Client B receives an ACK from the server (this acknowledges Ob)
  6. Client B receives [Oc'] from the server
  7. Since there are no unacknowledged local operations, Oc' is applied to the local wave and appended to the state space

The final state space is [O1, O2, O3, Ob, Oa', Oc']

Client C:

  1. Client C state space: [O1, O2, O3, Oc]
  2. Client C sends [Oc] to the server
  3. Client C receives [Oa] from the server
  4. Client C transforms Oa against it's unacknowledged local operation Oc to form Oa'' which is then applied to the local wave and appended to the state space
  5. Client C receives [Ob'] from the server
  6. Client C needs to transforms Ob' against it's unacknowledged local operation Oc. Since Ob' already includes the effect of Oa, we need to transform Oc against Oa (call the result Oc'') and then transform Ob' against Oc'' to form Ob''. Ob'' is then applied to the local wave and appended to the state space.
  7. Client C receives an ACK from the server (this acknowledges Oc)

The final state space is [O1, O2, O3, Oc, Oa'', Ob'']

Notice that each client has ended up with a different state space (and only one of which is the same as the server's). This is what is shown in the diagram in the Google OT white paper. The client and server have taken different paths to arrive at the same state.

To prove that all three clients in the above example arrive at the same state, we need to prove that all three final state spaces result in the same document state. In this proof we will make use of Transformation Property 1 (TP1) that is required by all transforms in an OT system. TP1 states that (taken from Wikipedia):

For every pair of concurrent operations O1 and O2 defined on the same state, the transformation function T satisfies TP1 property if and only if:

[O1, T(O2, O1)] is equivalent to [O2, T(O1, O2)]

where [Oi, Oj] denotes the sequence of operations containing Oi followed by Oj.

  1. Proof that [O1, O2, O3, Ob, Oa', Oc'] is equivalent to [O1, O2, O3, Oa, Ob', Oc']
  2.  
  3. where:
  4. Oa' = T( Oa, Ob )
  5. Ob' = T( Ob, Oa )
  6.  
  7. [O1, O2, O3, Ob, Oa', Oc'] ~ [O1, O2, O3, Ob, T( Oa, Ob ), Oc' ]
  8. ~ [O1, O2, O3, Oa, T( Ob, Oa ), Oc' ]
  9. ~ [O1, O2, O3, Oa, Ob', Oc' ]
  1. Proof that [O1, O2, O3, Oc, Oa'', Ob''] is equivalent to [O1, O2, O3, Oa, Ob', Oc']
  2.  
  3. Where:
  4. Ob' = T( Ob, Oa )
  5. Oc' = T( T(Oc, Oa), T(Ob, Oa ) )
  6. Oa'' = T( Oa, Oc )
  7. Ob'' = T( Ob', T( Oc, Oa ) )
  8.  
  9. [O1, O2, O3, Oc, Oa'', Ob''] ~ [O1, O2, O3, Oc, T( Oa, Oc ), Ob'']
  10. ~ [O1, O2, O3, Oa, T( Oc, Oa ), T( Ob', T( Oc, Oa ) )]
  11. ~ [O1, O2, O3, Oa, Ob', T( T( Oc, Oa ), Ob' ) ]
  12. ~ [O1, O2, O3, Oa, Ob', T( T( Oc, Oa ), T( Ob, Oa ) ) ]
  13. ~ [O1, O2, O3, Oa, Ob', Oc' ]

Given the above proofs, it is clear that all clients agree on the Wave document state, even though they applied operations in different orders.

The client implementation can be somewhat simpler than what has been described above because the client does not need to maintain a complete state space. However, this is a discussion for another day.

There is also some complexity in the client that is not described above. For example, the client may perform any number of local operations while waiting for an ACK from the server. Also, during this time the client may receive any number of operations from the server. A simple, efficient client implementation is possible, but it is somewhat non-trivial, especially for those who are new to OT. Again, this is a discussion for another day.