Category: Patterns

Akka work-pulling pattern (and some background)

10/18/2017

Akka actors can get overwhelmed, poor things. Their mailbox can only hold so many messages (either because of preconfigured bounds or memory constraints), and unpleasant things can happen when it overflows, such as senders blocking or messages being dropped. Furthermore, uncontrolled growth of mailboxes can bring up a whole different set of issues. Although this post focuses on the work-pulling pattern, I am going to start with an overview of Akka actors and associated constructs. If you do not want to sit through this discussion, skip down to the section titled Work-Pulling Pattern.

So let us review a few basics. Consider an actor system of two actors, as shown in this diagram, where Actor A has some data it wants to tell Actor B.

The interesting concepts here are the Actor System, the dispatchers, and the actors with their mailboxes. Let's review each one:

Actor Systems
A useful system contains at least two actors; arthouse performance artists specializing in monologues need not apply. These actors run inside a bounded construct called an Actor System. This construct provides (among other things) a default dispatcher and a default mailbox type to its actors. In some applications, actor systems can be used as failure zones as seen in the bulkhead pattern.

Actors
An actor is a "fundamental unit of computation that embodies processing, storage, and communication" (per Carl Hewitt, 1973). This means that we are often not concerned with how actors are implemented; we treat them almost as primitives. Instead, we turn our attention to defining their behavior and designing systems of actors.

Each actor is composed of a mailbox, which holds incoming messages, and the actor's business logic, which encapsulates the user-defined behavior and state. A dispatcher is responsible for moving messages into the actor's queue, and also for dequeuing messages for consumption by that business logic. Fundamentally important, the dispatcher ensures that only one message is dequeued and enters processing at any given time.
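To illustrate the one-message-at-a-time guarantee, here is a minimal sketch, assuming classic Akka actors run as a Scala script; Counter, Increment and GetCount are hypothetical names. Four threads send increments concurrently, yet the actor's plain var needs no lock, because the actor never processes two messages at once. GetCount is only sent after all sending threads have finished, so the reply should account for every increment.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

case object Increment // hypothetical message
case object GetCount  // hypothetical message

// A plain var with no locks: safe because only one message is processed at a time.
class Counter extends Actor {
  private var count = 0

  def receive: Receive = {
    case Increment => count += 1
    case GetCount  => sender() ! count
  }
}

val system = ActorSystem("counter-demo")
val counter = system.actorOf(Props(new Counter), "counter")

// Four threads send 1,000 increments each, concurrently.
val senders = (1 to 4).map(_ => new Thread(() => (1 to 1000).foreach(_ => counter ! Increment)))
senders.foreach(_.start())
senders.foreach(_.join())

implicit val timeout: Timeout = Timeout(3.seconds)
val finalCount = Await.result((counter ? GetCount).mapTo[Int], 3.seconds)
println(s"final count: $finalCount")
Await.result(system.terminate(), 10.seconds)
```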

Dispatchers
Dispatchers are rule engines that control enqueuing and dequeuing messages to and from an actor's mailbox. Each actor can have its own dispatcher, but unless one is specifically configured, it uses the default dispatcher of its Actor System. I struggle with whether to draw the dispatcher as part of the actor, or as an outside rule acting on the actor.

There are four types of dispatchers currently defined. In general, what you are looking for is the proper threading strategy for handling your actor's mailbox.

Dispatcher – the default dispatcher. This is an event-based dispatcher, where multiple actors share a thread pool. The way it usually works is that a thread is taken from the pool, a preconfigured number of messages (the dispatcher's throughput setting) is drained from the mailbox, and the thread is returned to the pool.
PinnedDispatcher – “pins” the actor to a single thread. Each actor using it gets its own thread pool containing exactly one thread, which processes its mailbox. This can prevent thread starvation, but it can also cause memory starvation, since every pinned actor costs a dedicated thread. This dispatcher should be used with great care and understanding. One use case would be to give some high-priority process a (not perfect) guarantee of CPU availability.
BalancingDispatcher - this dispatcher allows multiple actors to share a single mailbox. Actors drain messages from the mailbox as they become available. This is a good way to distribute load across multiple actors, with the caveat that it usually only makes sense when all actors sharing the mailbox are of the same type.
CallingThreadDispatcher - this dispatcher doesn't create new threads, but runs on the calling thread. If the actor is already running on a different thread, the calling thread will block until that other thread has completed. Personally, I've only seen this dispatcher used in unit testing.

A dispatcher can be chosen at creation time via the withDispatcher method on the Props object, by referencing its configuration path.
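A minimal sketch of choosing a dispatcher this way, assuming classic Akka actors run as a Scala script. The dispatcher name my-dispatcher, its pool sizes and the ThreadReporter actor are hypothetical; type, executor and throughput are standard dispatcher settings, and throughput is the "preconfigured number of messages" drained per turn mentioned above.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import com.typesafe.config.ConfigFactory
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical dispatcher definition; the actor system reads it from configuration.
val config = ConfigFactory.parseString(
  """
  my-dispatcher {
    type = Dispatcher
    executor = "fork-join-executor"
    fork-join-executor {
      parallelism-min = 2
      parallelism-factor = 2.0
      parallelism-max = 10
    }
    throughput = 5
  }
  """).withFallback(ConfigFactory.load())

// Replies with the name of the thread the message was processed on.
class ThreadReporter extends Actor {
  def receive: Receive = {
    case _ => sender() ! Thread.currentThread().getName
  }
}

val system = ActorSystem("dispatcher-demo", config)

// withDispatcher takes the configuration path of the dispatcher.
val props = Props(new ThreadReporter).withDispatcher("my-dispatcher")
val reporter = system.actorOf(props, "reporter")

implicit val timeout: Timeout = Timeout(3.seconds)
val threadName = Await.result((reporter ? "which thread?").mapTo[String], 3.seconds)
println(threadName)
Await.result(system.terminate(), 10.seconds)
```

Replying with the thread name is just a cheap way to see which pool handled the message: Akka names dispatcher threads after the actor system and the dispatcher id.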

For more information, see the relevant page in the Akka documentation, which discusses additional tidbits, including ways to define the dispatcher for your actors through configuration.

Mailboxes
Actors process messages serially, by definition. This means some data structure must exist to hold incoming messages while they await processing. This is the actor's mailbox (normally one per actor, unless the BalancingDispatcher is used).

The currently defined mailboxes are:

UnboundedMailbox – this is the default mailbox used by Akka actors.
BoundedMailbox - similar to the default mailbox, but bounded to a predefined size. This mailbox will block when full.
NonBlockingBoundedMailbox - similar to the default mailbox, but bounded to a predefined size. This mailbox will discard messages (into dead letters) when full. This is one of the most efficient bounded mailboxes for general-purpose use.
SingleConsumerOnlyUnboundedMailbox – this mailbox allows multiple producers but only a single consumer (so it cannot be used with BalancingPool).
UnboundedPriorityMailbox - this is an unbounded mailbox where each message has a "weight" dictating its processing order; higher-priority messages are dequeued first. This mailbox does not guarantee the order of messages of the same weight; for that, see UnboundedStablePriorityMailbox.
BoundedPriorityMailbox - this is similar to UnboundedPriorityMailbox, but, obviously, bounded to a predefined size. This mailbox will block when full.
UnboundedStablePriorityMailbox - this is similar to UnboundedPriorityMailbox, but maintains the order of messages of equal "weight".
UnboundedControlAwareMailbox - this is a mailbox with two queues: one for normal messages, and one for high-priority control messages (those extending ControlMessage), which are delivered first.
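The priority mailboxes above are typically customized by subclassing them with a PriorityGenerator, which is the pattern the Akka mailbox documentation uses. A minimal sketch, with a hypothetical "urgent" message and a hypothetical package name in the configuration comment. Note that with PriorityGenerator a lower number means higher priority, so the most important messages get the smallest values.

```scala
import akka.actor.{ActorSystem, PoisonPill}
import akka.dispatch.{PriorityGenerator, UnboundedStablePriorityMailbox}
import com.typesafe.config.Config

object UrgentFirst {
  // Lower number = dequeued earlier. Message names are hypothetical.
  val priorities: PriorityGenerator = PriorityGenerator {
    case "urgent"   => 0 // jump ahead of regular traffic
    case PoisonPill => 2 // stop only after regular work is done
    case _          => 1 // everything else
  }
}

// Akka instantiates mailbox types reflectively through this (Settings, Config) constructor.
// The "stable" variant keeps FIFO order among messages with the same priority.
class UrgentFirstMailbox(settings: ActorSystem.Settings, config: Config)
    extends UnboundedStablePriorityMailbox(UrgentFirst.priorities)

// Wiring it up (hypothetical package name):
//   urgent-first-mailbox {
//     mailbox-type = "com.example.UrgentFirstMailbox"
//   }
// then create the actor with Props(...).withMailbox("urgent-first-mailbox").
```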

As a generalization, keep in mind that those "bounded" mailboxes above block when the queue is full, unless marked as "non-blocking", in which case they will discard messages when full. Furthermore, the "unbounded" mailboxes are not really unbounded; there are still practical bounds introduced by the JVM, by memory quotas, or by the physical memory of the machine.

To choose a mailbox for your actor, you can have it extend RequiresMessageQueue[T], where T is a message-queue semantics type (such as BoundedMessageQueueSemantics) that your configuration maps to a mailbox. For more information and code, see Mailboxes in the Akka documentation.
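A minimal sketch of that approach, assuming classic Akka actors run as a Scala script. The mailbox name bounded-mailbox and the BoundedEcho actor are hypothetical; the akka.actor.mailbox.requirements section is where Akka maps a queue-semantics trait to a configured mailbox.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.dispatch.{BoundedMessageQueueSemantics, RequiresMessageQueue}
import akka.pattern.ask
import akka.util.Timeout
import com.typesafe.config.ConfigFactory
import scala.concurrent.Await
import scala.concurrent.duration._

// Map the BoundedMessageQueueSemantics requirement to a concrete mailbox.
// NonBlockingBoundedMailbox drops overflow into dead letters instead of blocking.
val config = ConfigFactory.parseString(
  """
  bounded-mailbox {
    mailbox-type = "akka.dispatch.NonBlockingBoundedMailbox"
    mailbox-capacity = 1000
  }
  akka.actor.mailbox.requirements {
    "akka.dispatch.BoundedMessageQueueSemantics" = bounded-mailbox
  }
  """).withFallback(ConfigFactory.load())

// Mixing in RequiresMessageQueue tells Akka which queue semantics this actor needs.
class BoundedEcho extends Actor with RequiresMessageQueue[BoundedMessageQueueSemantics] {
  def receive: Receive = {
    case msg => sender() ! msg
  }
}

val system = ActorSystem("mailbox-demo", config)
val echo = system.actorOf(Props(new BoundedEcho), "bounded-echo")

implicit val timeout: Timeout = Timeout(3.seconds)
val reply = Await.result(echo ? "ping", 3.seconds)
Await.result(system.terminate(), 10.seconds)
```

Keeping the mapping in configuration means the actor only declares what it needs (a bounded queue), while the concrete mailbox type and capacity can change without touching the code.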

Work-Pulling Pattern
An indicator of a good system design is predictability. Blocking is generally undesirable, but so are the effects of unpredictable queue growth. To address this, we turn the paradigm on its head and have destination actors pull messages rather than wait for them to be pushed.

Side note: Reactive Streams uses backpressure to address this issue. If you can adopt Akka Streams, this work-pulling discussion is mostly academic.

The work-pulling pattern looks something like this:

1. A single actor (let's call it the 'master') is in charge of distributing work.
2. One or more workers prepare to handle work (I only show one worker in the drawing, to prevent visual clutter). They register with the master as willing and able to do that work (message 0).
3. When the master has work to be done, it broadcasts to all the workers that work is available (message 1).
4. The workers receive the broadcast and prepare to do work.
5. The workers ask the master for a work item (message 2).
6. In response to each request, the master gives the requesting worker a single work item (message 3).
7. Steps 5 and 6 repeat until the master stops responding with more work items.
8. The system goes idle until the master receives more work, broadcasts availability again, and the process repeats.

So what may the code look like? Here's a naive master and worker. I call it naive because there is no specific effort to optimize the collections used, or the broadcasting from the master to the workers.

To demo this process, we may try something like:
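A minimal sketch of a naive master and worker plus a small driver, assuming classic Akka actors run as a Scala script. Only WorkerIsAvailable comes from this post; the other message names, the squaring "work" and the Collector actor are hypothetical. One deliberate deviation from the naive flow: a worker that registers after work is already queued is told about it right away, so the demo cannot stall on a registration race.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.collection.mutable
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Protocol. WorkerIsAvailable is named in the post; the other names are hypothetical.
case object WorkerIsAvailable             // message 0: worker registers with the master
case object WorkIsReady                   // message 1: master broadcasts that work exists
case object GiveMeWork                    // message 2: worker asks for one item
final case class WorkItem(n: Int)         // message 3: master hands out one item
final case class NewWork(batch: Seq[Int]) // how work reaches the master
final case class Result(n: Int, squared: Int)

class Master extends Actor {
  private val workers = mutable.Set.empty[ActorRef] // naive registry of workers
  private val pending = mutable.Queue.empty[Int]    // naive work queue

  def receive: Receive = {
    case WorkerIsAvailable =>
      workers += sender()
      // Deviation from the naive flow: tell a late joiner about work already queued.
      if (pending.nonEmpty) sender() ! WorkIsReady
    case NewWork(batch) =>
      pending ++= batch
      workers.foreach(_ ! WorkIsReady) // broadcast to every registered worker
    case GiveMeWork =>
      // Hand out exactly one item; stay silent once the queue is empty.
      if (pending.nonEmpty) sender() ! WorkItem(pending.dequeue())
  }
}

class Worker(master: ActorRef, reportTo: ActorRef) extends Actor {
  override def preStart(): Unit = master ! WorkerIsAvailable

  def receive: Receive = {
    case WorkIsReady =>
      master ! GiveMeWork
    case WorkItem(n) =>
      reportTo ! Result(n, n * n) // the "work" is just squaring a number
      master ! GiveMeWork         // pull the next item only after finishing this one
  }
}

// Hypothetical helper: completes the promise once every result has arrived.
class Collector(expected: Int, done: Promise[Map[Int, Int]]) extends Actor {
  private var seen = Map.empty[Int, Int]

  def receive: Receive = {
    case Result(n, squared) =>
      seen += n -> squared
      if (seen.size == expected) done.trySuccess(seen)
  }
}

val system = ActorSystem("work-pulling")
val items = 1 to 20
val done = Promise[Map[Int, Int]]()
val collector = system.actorOf(Props(new Collector(items.size, done)), "collector")
val master = system.actorOf(Props(new Master), "master")
(1 to 3).foreach(i => system.actorOf(Props(new Worker(master, collector)), s"worker-$i"))

master ! NewWork(items)
val results = Await.result(done.future, 10.seconds)
println(s"processed ${results.size} items")
Await.result(system.terminate(), 10.seconds)
```

Because each worker asks for the next item only after finishing the current one, no worker's mailbox ever holds more than a message or two; the backlog lives in the master's queue, where it is visible and easy to bound.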

Which gives us the following output:

And that's about it. The code can definitely be optimized in various ways; it is only meant to show the basic concept. Things to consider include the collections being used, whether workers can join and participate after a workload has been advertised, and other application-specific concerns. Structuring the worker actors as children of the master actor, where appropriate, may also reduce the need to maintain a discrete list of workers, or the need for the worker-advertisement (WorkerIsAvailable) event.
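For comparison, a minimal sketch of the children-based variant, assuming classic Akka actors run as a Scala script; all names are hypothetical. The master creates its own workers, so context.children stands in for the worker list and context.parent gives each worker its master, which makes a registration message unnecessary.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import scala.collection.mutable
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Hypothetical protocol; no registration message is needed in this variant.
case object WorkIsReady
case object GiveMeWork
final case class WorkItem(n: Int)
final case class Finished(n: Int, squared: Int)

object ChildWorker {
  // Props factory in a companion object, so the creator does not close over an actor.
  def props: Props = Props(new ChildWorker)
}

// The worker's master is simply its parent.
class ChildWorker extends Actor {
  def receive: Receive = {
    case WorkIsReady => context.parent ! GiveMeWork
    case WorkItem(n) =>
      context.parent ! Finished(n, n * n)
      context.parent ! GiveMeWork
  }
}

class ParentMaster(workerCount: Int, work: Seq[Int], done: Promise[Map[Int, Int]]) extends Actor {
  private val pending = mutable.Queue.empty[Int]
  private var collected = Map.empty[Int, Int]

  override def preStart(): Unit = {
    pending ++= work
    (1 to workerCount).foreach(i => context.actorOf(ChildWorker.props, s"worker-$i"))
    // context.children replaces the explicit list of registered workers.
    context.children.foreach(_ ! WorkIsReady)
  }

  def receive: Receive = {
    case GiveMeWork =>
      if (pending.nonEmpty) sender() ! WorkItem(pending.dequeue())
    case Finished(n, squared) =>
      collected += n -> squared
      if (collected.size == work.size) done.trySuccess(collected)
  }
}

val system = ActorSystem("work-pulling-children")
val items = 1 to 20
val done = Promise[Map[Int, Int]]()
system.actorOf(Props(new ParentMaster(3, items, done)), "master")
val results = Await.result(done.future, 10.seconds)
Await.result(system.terminate(), 10.seconds)
```

The trade-off is that workers can no longer join from elsewhere in the system; whether that matters depends on the application.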
