User loginTag Cloud |
Google Wave - Operational Transform and Server AcknowledgmentsGoogle Wave is an exciting new product from Google that aims to become a communications hub, replacing many current communication types, including email, instant messaging, blogging and wiki. By encompassing the features of all of these systems, a single unified system can be used for all of these communication styles. If you have not heard of Google Wave then the presentations given at Google IO 2009 are a great place to start to understand what it is all about: Key to the success of Google Wave is the publishing of the underlying protocol. This allows third-party implementations to be developed, avoiding the vendor-lock-in of proprietary protocols. The Wave protocol has been published in draft form and is being refined by Google and the wider community. From a technical point of view, the Wave protocol takes a bold step. The protocol requires clients and servers to perform Operational Transformations on the operations that users perform on the Wave documents. Operational Transformation has been actively researched for a few decades now, however, very few commercial products have adopted this approach to concurrency control for a number of reasons, such as the complexity of the algorithms and the poor performance of correct control and transformation algorithms. For a good primer on how OT is used in Wave, see the video "Google Wave: Live collaborative editing" by David Wang. You can also read about the Wave OT implementation in the published white paper on the Wave Protocol web site. I am interested the modification Google made to the Jupiter Collaboration System in order to reduce the server implementation complexity. The modification of interest is the addition of server acknowledgements (the ACK) to the protocol. The ACK is used to control when the client can send composite operations to the server. By only allowing a single operation to be sent to the server between ACKs, the client is able to perform a lot of the OT work on behalf of the server. Also, the server is able to store less per-client state. In particular, it only needs to store a single state space, where the state space is an ordered history of all operations (referred to as a linear history buffer in some of the literature). While the inclusion of the ACK simplifies the server implementation and reduces the server memory footprint, there is one significant trade off. Since clients can not stream operations to the server as they happen, latency between clients is increased. This hurts interactivity somewhat as a client will see, at best, batched-up operations performed by another client arrive in intervals proportional to the round-trip time between the clients. Is this a significant limitation of the protocol? I guess time will tell. Understanding the ACK is not a trivial exercise, especially for those who are new to OT and I feel that the OT white paper from Google is not detailed enough for protocol implementers. I started a thread on the Wave Protocol Newsgroup to discuss the ACK and got some good responses that aided my understanding greatly. The rest of this article tries to clarify how the Wave server and client interact using OT. The state space is an ordered list of operations, where the ordering is basically the temporal (and hence, causal) order in which the operations happened. When operations are performed concurrently by clients, there is not temporal ordering of the operations, so the server makes an arbitrary decision as to which order the operations will be added to the history buffer. However, the concurrent operations may affect each other, so the server must perform some OT in order to maintain a linear list of operations, as described below. Let say that the server state space is defined as follows (where older operations are to the left in the list):
Let's say that there are three clients who have the following state spaces:
Where Oa, Ob and Oc are operations performed concurrently by the clients. Once a client is sent an ACK from the server, it can send its pending operation to the server. Hence, the server will receive three operations, all of which were performed against the same initial state. Let's say that the server decides that it wants to append the operation to its state space in the order Oa, Ob, Oc, then the following is done by the server:
Once the server has transformed a client's operation, applied it to its Wave document and appended it the state space, the transformed operation is broadcast to all clients that have the wave open (note: presumably the server does not send the transformed operation to the client that generated the original operation, though I can't find this specified in the Wave protocol spec). The server then sends an ACK to the client that generated the operation that is being processed. It is very important (though not clearly documented) that the ACK be sent before any other transformed operations are sent to the client. Since all clients receive the operations in the same order, all clients will agree on the order of remote operations. It is exactly this global ordering that allows the Wave OT algorithm to avoid the need to satisfy the TP2 constraint! Each client will incorporate the operations from other clients into their own state space. This is where things get interesting. If the clients were to merely apply operations received from the serve, then their state spaces would be as follows:
Clearly this is not correct as all clients have a different state space and on clients B and C, the operations are in the wrong order, so if they were executed as shown above, the Wave document would become corrupted. Clearly the client has work left to do when it receives operations from the server! What the client needs to do is quite simple - each incoming operation needs to be transformed to include the effect any unacknowledged local operations. Let us consider what happens on each client:
Notice that each client has ended up with a different state space (and only one of which is the same as the server's). This is what is shown in the diagram in the Google OT white paper. The client and server have taken different paths to arrive at the same state. To prove that all three clients in the above example arrive at the same state, we need to prove that all three final state spaces result in the same document state. In this proof we will make use of Transformation Property 1 (TP1) that is required by all transforms in an OT system. TP1 states that (taken from Wikipedia):
For every pair of concurrent operations O1 and O2 defined on the same state, the transformation function T satisfies TP1 property if and only if:
Given the above proofs, it is clear that all clients agree on the Wave document state, even though they applied operations in different orders. The client implementation can be somewhat simpler than what has been described above because the client does not need to maintain a complete state space. However, this is a discussion for another day. There is also some complexity in the client that is not described above. For example, the client may perform any number of local operations while waiting for an ACK from the server. Also, during this time the client may receive any number of operations from the server. A simple, efficient client implementation is possible, but it is somewhat non-trivial, especially for those who are new to OT. Again, this is a discussion for another day. |
This is the best post on Wave
This is the best post on Wave OT that I've read to date. Thanks. I found it very instructive.
Cheers
Benjamin
Thanks Daniel, this is a very
Thanks Daniel, this is a very helpful article.
Except I get a bit lost around (Client C) (Step 6). And cannot see where you obtain the expressions:
Oc' = T( T(Oc, Oa), T(Ob, Oa ) )Any chance you could help me out a bit?
Hi Andrew, I'm very happy
Hi Andrew,
I'm very happy that you've found this article helpful!
In the case that you've noted, there is some missing detail on transforming lists of operations that has made it unclear - I'll try to fill in some gaps here.
So we had the example where there were three concurrent operations (Oa, Ob and Oc) and we wish to use the Inclusion Transformation (IT) to allow the operations to be applied in a particular order (this is the function called T in my blog post). In this case we want to apply the operations in the order Oa, Ob', Oc', where Ob' and Oc' are transformed versions of Ob and Oc respectively.
In my mind, I have the phrase "transform Ox past Oy" when thinking about the inclusion transform. This extends well to transforming an operations past a list of operations; "trasform Ox past Oy, then past Oz, then past ...". Applying this to our three concurrent operations, I would think the following:
Step by step in deriving Oc' follows:
There are quite efficient algorithms for the inclusion transformation of lists of operations against lists of operations. From memory, the naive algorithms are O(n^2), but it is simple to achieve (n log(n)).
I hope that helps. Let me know if anything needs further clarification.
Regards,
Dan
Post new comment