After the /var partition on the master MySQL server filled up, replication stopped working because the master could no longer write new changes to the binlog file. The recovery procedure seemed simple: clean up the /var partition and restart the slave servers. It was not that easy. The problem was made worse because the "slave stop" and "show slave status" commands produced no output and the terminal hung right after pressing the return key.
Before going any further, here are the MySQL versions of the master and slave servers:
Master version: 5.0.77
Slave version: 5.1.61
Our MySQL replication setup consists of one master and three slave servers. None of the three slaves could connect to the master and request changes, so replication was down. The binlog on the master was probably corrupted, and that caused the problem. Whatever the cause, it should still be possible to view the slave status or stop the slave. In our case, when either of the following commands was executed:
slave stop;
show slave status\g
... the MySQL shell got stuck and only CTRL-C returned control to the terminal. Once the master was fixed, the slave had to be pointed at the first binlog file written afterwards, and the best way to do that is the "change master to" command:
change master to master_log_file='mysql-bin.002381', master_log_pos=0;
This command actually updates the information in the master.info and relay-log.info files. To use "change master to", the slave replication threads must be stopped, but "slave stop" hangs the terminal and does not stop the slave threads (?!). One idea was to edit the binlog position directly in master.info and relay-log.info, but that is too risky. A better option is to temporarily comment out all replication lines in /etc/my.cnf and restart the slave server:
#server-id=4
#master-host=126.96.36.199
#master-user=replica
#master-password=mypassword
#relay-log=myserver-relay-bin
#replicate-do-db=web
#replicate-ignore-table=web.logging
#report-host=myserver.hrt.hr
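The commenting-out step can also be scripted. The following is a minimal sketch, not the procedure actually used here: the function names comment_replication_lines and restore_replication_lines are hypothetical, the option list simply mirrors the lines shown above, and it assumes a sed that supports -E. Try it on a copy of my.cnf first.

```shell
#!/bin/sh
# Hypothetical helper: comment out replication-related options in a my.cnf file.
# The option list mirrors the config lines shown above; adjust it for your setup.
comment_replication_lines() {
    cnf_file=$1
    # Keep a backup so the original lines can be restored later.
    cp "$cnf_file" "$cnf_file.bak" || return 1
    # Prefix matching option lines with '#'. Lines that are already
    # commented out do not match and are left untouched.
    sed -E 's/^[[:space:]]*((server-id|master-(host|user|password)|relay-log|replicate-[a-z-]+|report-host)[[:space:]]*=)/#\1/' \
        "$cnf_file.bak" > "$cnf_file"
}

# Hypothetical counterpart: put the original file back. This is equivalent to
# uncommenting the lines only if my.cnf was not edited in the meantime.
restore_replication_lines() {
    cnf_file=$1
    cp "$cnf_file.bak" "$cnf_file"
}
```

Typical use would be comment_replication_lines /etc/my.cnf before restarting the slave as a standalone server, and restore_replication_lines /etc/my.cnf once the slave has been reset.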
With this trick the slave became an ordinary MySQL server. After it was restarted, it did not try to connect to the master and request changes from the damaged binlog file. Here is the complete list of steps that solved the MySQL replication problem:
1. Stop the MySQL slave server. We had to use "kill -9" because it was not possible to stop the server with /etc/init.d/mysqld stop.
2. Comment out the replication config lines in /etc/my.cnf (as described above).
3. Start the MySQL server:
   # /etc/init.d/mysqld start
4. Reset the slave:
   mysql> reset slave;
5. Stop the MySQL server:
   # /etc/init.d/mysqld stop
6. Restore the replication config lines in my.cnf (simply uncomment the previously commented lines).
7. Start the MySQL server (the slave threads should be stopped):
   # /etc/init.d/mysqld start
8. Set the position to the beginning of the first binlog file written after the master server was fixed:
   mysql> change master to master_log_file='mysql-bin.001234', master_log_pos=0;
9. Start the slave:
   mysql> slave start;
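To avoid typos in steps 8 and 9, the two statements can be generated from the binlog file name. This is only a sketch: print_resume_sql is a hypothetical helper, the position defaults to 0 as in the steps above, and it prints START SLAVE, the equivalent spelling of "slave start" that newer MySQL versions also accept.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL for steps 8 and 9.
# $1: binlog file name on the master (for example mysql-bin.001234)
# $2: binlog position, defaults to 0 as in the steps above
print_resume_sql() {
    binlog_file=$1
    log_pos=${2:-0}
    # Accept only plain file names so nothing can break out of the SQL quotes.
    case $binlog_file in
        ''|*[!A-Za-z0-9._-]*)
            echo "invalid binlog file name: '$binlog_file'" >&2
            return 1 ;;
    esac
    # The position must be a non-negative integer.
    case $log_pos in
        ''|*[!0-9]*)
            echo "invalid binlog position: '$log_pos'" >&2
            return 1 ;;
    esac
    printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" "$binlog_file" "$log_pos"
    printf 'START SLAVE;\n'
}
```

The output can be reviewed first and then piped to the client, for example print_resume_sql mysql-bin.001234 | mysql, assuming the mysql client is already configured with credentials for the slave.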
Set the binlog file name to match your own case; the name in step 8 (change master) is only an example. After applying this procedure, the slave was able to connect to the master and start reading changes.
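Whether the slave really is reading changes again can be checked from the Slave_IO_Running and Slave_SQL_Running fields of "show slave status". A small sketch of such a check follows; slave_threads_running is a hypothetical name, and it assumes the vertical \G output format, passed in on standard input.

```shell
#!/bin/sh
# Hypothetical check: read "SHOW SLAVE STATUS\G" output on stdin and succeed
# only if both replication threads report "Yes".
slave_threads_running() {
    awk -F': *' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        # Missing fields (for example empty input) count as "not running".
        END { exit (io == "Yes" && sql == "Yes") ? 0 : 1 }
    '
}
```

For example: mysql -e 'SHOW SLAVE STATUS\G' | slave_threads_running && echo "replication is running". Again, this assumes the mysql client can log in to the slave without prompting.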