Some notes on Galera
Some quick notes I wrote on the train about my experience with the World's Greatest Leader-Leader MySQL Clustering System.
Questions? Comments? Stuff I got wrong? Send me a note at nat@simplermachines.com
In order to start accepting reads/writes, a Galera node has to join a cluster. You can configure Galera to start as a single-node cluster, and under certain mysterious conditions that node will fail to start because it's failing to join a cluster… with itself.
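In the cases where the node does start mysqld but refuses normal queries because it never managed to form a primary component with itself, a rough sketch of what I'd check looks like this. These are standard Galera status/provider variables, but exact option syntax varies a bit between versions, so treat it as an outline rather than a recipe:

```sql
-- Ask the node whether it thinks it's part of a primary component.
-- 'Primary' means it will accept reads/writes; 'non-Primary' means it's
-- still waiting to join a cluster, possibly with itself.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';

-- If you're certain this node really is supposed to be a one-node cluster,
-- you can force it to bootstrap a new primary component on its own.
SET GLOBAL wsrep_provider_options = 'pc.bootstrap=YES';
```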
There are versions of Galera that could stop replicating DML statements while still replicating DDL statements. This should be impossible, but oh boy it is not. It can produce terrible operator errors if you're not expecting it and you're using matching schemas across the cluster as evidence that replication is working. AFAIK this behavior silently went away in some more recent version of Galera, but no one knows why.
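Which is to say: matching schemas are weak evidence. A slightly better spot check is to compare actual row data across nodes. A rough sketch (the table name is made up, and for a precise answer you'd want writes quiesced, or a proper tool like pt-table-checksum):

```sql
-- Run the same statements on each node and compare the results.
-- If DML replication has silently stopped, the schemas can still match
-- while the checksums and row counts drift apart.
CHECKSUM TABLE bills;
SELECT COUNT(*) FROM bills;
```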
Related to the above: when a Galera node rejoins a cluster and detects that it is out of sync with that cluster, it will dump all of its data and get a fresh copy from a donor. This is *very very bad* if that node happened to have the correct data: a system can suddenly lose weeks of data, and can lose it in response to an operator trying to repair a cluster. This was much worse for Cloud Foundry because of the way it used Bosh to manage Galera automatically.
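One habit that reduces this risk: before repairing anything, work out which node actually has the most advanced dataset, so the node you keep is the one everyone else syncs from rather than the one that gets wiped. On nodes that are still running you can compare committed sequence numbers; for a node that's down, the same information usually lives in its grastate.dat file. A rough sketch:

```sql
-- Run on each surviving node. The node with the highest value has committed
-- the most transactions; that's the node you want acting as the donor, not
-- the one dumping its data and re-seeding from somebody else.
SHOW GLOBAL STATUS LIKE 'wsrep_last_committed';
```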
Row locks don't replicate, but some applications use row locks as mutexes for the entity represented by the row. This can cause very bad problems if applications don't realize they're using Galera, two batch processors connect to different Galera nodes, and the batch job they're running is something like "charge the bill represented by this row to a customer's credit card."
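The failing pattern looks roughly like this (table and column names here are made up). On a single MySQL server the second worker would block on the row lock; with each worker connected to a different Galera node, both acquire their "lock" locally and proceed:

```sql
-- Worker A is connected to node 0, worker B to node 1.
-- Both run this at roughly the same time.
START TRANSACTION;

-- Intended as a mutex: "I own bill 42 now." But the row lock is taken only
-- on the local node, so it does nothing to stop the worker on the other node.
SELECT * FROM bills WHERE id = 42 FOR UPDATE;

-- ...call the payment provider and charge the card...
-- Both workers get here, so the card can be charged twice. At COMMIT, Galera's
-- certification may roll one of the transactions back, but the external charge
-- has already happened.
UPDATE bills SET status = 'charged' WHERE id = 42;
COMMIT;
```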
We wrote a proxy to attempt to force all applications to always connect to the same Galera node. The proxy's node-selection algorithm needed to be deterministic so that flickers in network availability couldn't split the two proxies. We chose "always connect to the running node with the lowest instance ID." This worked great… unless the node with the lowest instance ID had a lousy disk and would crash the first time it processed a read. At that point the Galera + proxy system would get stuck in a loop: Node 0 would crash, the proxies would redirect app traffic to Node 1, Node 0 would restart itself, run its data load process, and come back up, the proxies would flip back to Node 0, and Node 0 would immediately crash again. Once the team figured out what was happening they were able to make the behavior robust to this scenario, but this kind of thing is just constant if you're trying to run Galera programmatically and protect yourself against its various failure modes.
There are also issues with certain kinds of queries being able to cause problems across the cluster (noisy neighbor problems, basically), including one colorfully named "the Stupakov effect," but that came after my time so I don't have command of the details.
I think it's possible to successfully run a Galera cluster for a single application or a small set of applications that all know they're running Galera and are basically well behaved & low traffic. It works fairly well as the database for Cloud Foundry's control plane: the only failures I've ever seen involving that database were traceable to the SSO service, which allowed apps on the platform to use it for login and therefore somewhat breaks that "low traffic" rule, and even there the failure I'm thinking of was "we ran out of connections in the connection pool," which could have happened with any database.