Akka.net - Actor system is now quarantined03 Nov 2018
Whilst developing earlier today I encountered an issue with my environment which left me a little scared…
[17:49:36 WRN] Association to [akka.tcp://[email protected]:8081] having UID  is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation. [17:49:36 INF] Shutting down remote daemon. [17:49:36 INF] Remote daemon shut down; proceeding with flushing remote transports. [17:49:36 WRN] Association with remote system akka.tcp://[email protected]:8081 has failed; address is now gated for 5000 ms. Reason is: [Akka.Remote.EndpointDisassociatedException: Disassociated
What was scary is the state my application was now left in.
My API continued to be responsive even though the actor system was now “dead” or should I say “must be restarted” as per the error message.
I guess this scenario is far from ideal but realistically it’s something to be expected.
At this stage I’m not entirely sure why my actor system became
Disassociated but I can appreciate that network blips, bad code, GC locks might all contribute to events like these in the future.
My real issue here isn’t my actor system shutting down, it’s my API continuing to accept requests.
So how can we hook into the cluster chat to determine when I’ve been quarantined?
Enter: the terminator.
Terminator in my case is a simple Actor ready to receive
QuarantinedEvent messages delivered to my unhealthy cluster node.
Here I’m using a small wrapper around the ActorSystem called
Actors. This object gives me some control around the actor system lifecycle and I use it to invoke the
CoordinatedShutdown method as recommended by the Akka.net team.
Having a teminator in your code is all well and good but we need to make sure he’s protecting our node from unwanted
Cyborg attacks quarantined events.
Let’s add the following to our actor system setup.
And that’s all that’s needed.
For clarity my
Program.cs looks similar to this. You can see my application will exit if either API or the actor system exit.
Events are bound to both the cancel key press and assembly unloading to give cross windows development/docker (+ kubernetes) support.