Every once in a while a hash board in an Antminer S9 will go bad. I look at the display and it shows all XXXX’s. Before sending the board back for repair I’ll reboot the machine, just to confirm that the hash board goes bad again as soon as it boots up.
I have two boards that have been troubling me for months. They run for about 30 minutes, then stop with all XXXX’s. So I sent them to Bitmain for repair. They were sent back to me without any apparent improvement. They still go bad after 30 minutes.
I ran them for a while, rebooting the box daily when I remembered, and the same thing kept happening. In the next batch of boards to send back to Bitmain I included these two boards, again. They were returned to me, again apparently unfixed.
What I suspect happens is that Bitmain tests the boards by running them for about 5 or 10 minutes. If they look good, Bitmain sends them back.
I put these two boards into their own box and ran them. Every day, when I thought about it, I would reboot the box; the boards would hash for a little while, then stop.
A friend of mine suggested that I set it to reboot automatically every hour. That’s a good idea, and I find it hard to resist a technical challenge, so I logged into the command line on the Antminer and tried to set up a cron job to reboot it. Cron does not exist on the dumbed-down operating system running on these boxes, and I did not want to figure out how to install it. I needed another way to reboot the box every hour.
Well, I have a small Linux server on site running on a Raspberry Pi that I use as a proxy for the monitoring system I have running on my cloud Linux server.
The Raspberry Pi is a very simple $70 system that can function as a full Linux server. I keep it plugged into the UPS sitting on top of the firewall, and if building power goes down it still keeps uploading system info back to the cloud server.
The cloud monitoring system is based on Zabbix. I did a different blog post on it earlier this year, and have expanded it to monitor multiple sites, network equipment, Antminers, Windows and Linux Zcash and Ethereum miners, and hosted VPS servers. It’s pretty handy, and helps to keep the equipment running and my costs low. I use it so that I can see at a glance that everything’s working well, and when something isn’t, the system sends me an email telling me about the problem.
Anyway I figured I could use this Linux box to reboot the Antminer.
First I set things up so that I could log in from the Linux box to the Antminer S9 without giving a password, by generating a public and private RSA key pair and installing the public key on the miner.
Generate an RSA key:
Push key to Antminer. Replace 10.0.0.10 with the IP address of your Antminer. Root password is admin.
Check that it worked by connecting to the miner over SSH. It should connect with no password.
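The three steps above can be sketched as a small shell script. This is only a minimal sketch: it assumes OpenSSH's ssh-keygen and ssh-copy-id are installed on the Linux box, that the miner takes root logins at 10.0.0.10 as described above, and that the key lives in the default ~/.ssh/id_rsa location. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumptions: the miner is reachable as root@10.0.0.10 and the key
# lives in the default OpenSSH location. Adjust both for your setup.
MINER="${MINER:-root@10.0.0.10}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

# Step 1: generate an RSA key pair, unless one already exists.
# -N "" sets an empty passphrase so cron can use the key unattended.
make_key() {
    [ -f "$KEY" ] && return 0
    mkdir -p "$(dirname "$KEY")"
    ssh-keygen -q -t rsa -b 4096 -N "" -f "$KEY"
}

# Step 2: install the public key on the miner.
# This asks for the miner's root password once.
push_key() {
    ssh-copy-id -i "$KEY.pub" "$MINER"
}

# Step 3: check that key login works. BatchMode=yes makes ssh fail
# instead of falling back to a password prompt.
check_key() {
    ssh -o BatchMode=yes "$MINER" true
}
```

After sourcing the file, run make_key, push_key, and check_key in that order. check_key returns 0 when the passwordless login works.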
Here is an example of the full dialog:
[email protected]:~ $ ssh-copy-id email@example.com
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
firstname.lastname@example.org's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'email@example.com'"
and check to make sure that only the key(s) you wanted were added.

[email protected]:~ $ ssh firstname.lastname@example.org
[email protected]_0010:~#
Then I set up an hourly crontab job on the Linux box to log in and reboot the Antminer.
Access cron with this command, choose your favorite editor:
Add this line to the end of the file:
0 * * * * ssh email@example.com "/sbin/reboot"
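The usual way to edit a user's crontab interactively is `crontab -e`, and `crontab -l` lists what is currently installed. As an alternative, here is a minimal sketch that appends the line without opening an editor. The helper name add_cron_line is made up for illustration, and root@10.0.0.10 is an assumed login based on the address mentioned earlier.

```shell
#!/bin/sh
# Hypothetical helper: append one line to the current user's crontab,
# unless that exact line is already present.
add_cron_line() {
    line="$1"
    # crontab -l fails when no crontab exists yet; treat that as empty.
    current="$(crontab -l 2>/dev/null || true)"
    # -F: fixed string, -x: match the whole line, -q: quiet.
    if printf '%s\n' "$current" | grep -Fxq -- "$line"; then
        return 0
    fi
    {
        if [ -n "$current" ]; then printf '%s\n' "$current"; fi
        printf '%s\n' "$line"
    } | crontab -
}

# Example call (assumed login and address):
# add_cron_line '0 * * * * ssh root@10.0.0.10 "/sbin/reboot"'
```

Calling the helper twice with the same line leaves a single entry, so it is safe to rerun.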
That worked pretty well, except it wasn’t often enough. So I changed it to every thirty minutes:
0,30 * * * * ssh firstname.lastname@example.org "/sbin/reboot"
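One thing the plain cron line does not handle is a miner that is unreachable or that asks for a password, since ssh could then hang or wait at a prompt. Here is a minimal sketch of a wrapper that fails fast instead, assuming a root@10.0.0.10 login. The script name reboot-miner.sh, the --run flag, and the SSH override variable are hypothetical and exist only to make the sketch easy to try out.

```shell
#!/bin/sh
# reboot-miner.sh (hypothetical name)
# Assumed login; override with MINER=user@host in the environment.
MINER="${MINER:-root@10.0.0.10}"
# The ssh binary to use; overriding it (for example SSH=echo) gives a dry run.
SSH="${SSH:-ssh}"

reboot_miner() {
    # BatchMode=yes: never stop at a password prompt.
    # ConnectTimeout=10: give up after 10 seconds if the miner is down.
    "$SSH" -o BatchMode=yes -o ConnectTimeout=10 "$MINER" /sbin/reboot
}

# Only reboot when asked to, so the file can be sourced safely.
if [ "${1:-}" = "--run" ]; then
    reboot_miner
fi
```

The cron entry would then call the script, for example `0,30 * * * * /home/pi/reboot-miner.sh --run`, where the path is again an assumption. Running `SSH=echo ./reboot-miner.sh --run` prints the ssh arguments without touching the miner.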
Now the miner is rebooting every 30 minutes, and it is contributing significant hashing power again.
I’m going to keep running it this way for as long as I need to. Maybe one day the hash boards will go completely bad, and I’ll send them in and finally get them repaired. Or scrapped.
Also published on Medium.