Antminer S9 Monitoring and Alerting Application
One of the new Bitcoin miners I put into production last week had a problem. The fan stopped working after a little while, and the miner stopped hashing. The same thing happens when an Antminer overheats or has some other safety feature kick in that stops it from hashing. It’s still running, but it’s not doing any work for me.
I had gotten into the habit of checking the mining pool every day to see if any of the Antminers had gone offline. Sometimes, when there is a building power outage, I don’t find out about it until a couple of hours later, after the UPS feeding my firewall runs out of power and the system sends me a disconnect alert.
I needed a system that would monitor the actual mining activity of the Antminers and tell me when there was a problem. Doing a network ping is no good. The bitcoin miners are still online and responding to pings.
It’s possible to SSH into the Antminers (user: root, pass: admin), and as a side note, for security you probably want to change the root password of the boxes in your datacenter.
So I thought about installing an SNMP agent and using Simple Network Management Protocol to monitor them. But I didn’t really want to start messing with the operating system configuration.
The mining software on the Antminer is based on cgminer, an open source mining application. It includes a monitoring and management interface referred to as the cgminer RPC API.
There are some decent-looking monitoring systems available for miners out there, like Cryptoglance. But I wanted something simple that would just tell me when there was a problem. Going back to my days setting up a customer-facing IT support desk, I knew what needed to be done. I needed Zabbix.
Zabbix is an open source monitoring and alerting application. It’s enterprise grade, and can scale to monitor thousands of clients. Its primary ways of monitoring are:
- Zabbix Agent installed on client – usually for servers
- SNMP – Simple Network Management Protocol
- IPMI – Intelligent Platform Management Interface
Unfortunately, none of these would work for me. I didn’t want to install a Linux agent on the Antminers, or an SNMP agent. IPMI works through the dedicated management port on the back of servers, and was not a match.
But Zabbix is very extensible. One of the things it can do is run a script and use the output of that script as the monitored value.
Fortunately, a fellow named Thomas Sileo had played around with the cgminer RPC API a few years ago, and set up a short Python application to use JSON sockets to access the API.
I’m learning Python, thanks to Zed Shaw’s Learn Python the Hard Way, and was able to at least have a basic understanding of the application.
Since I’m doing Ethereum mining I have a few Ubuntu boxes running anyway, so I added Apache, MySQL, and PHP5 to one of them. I installed Zabbix, and got it running. It’s not really that hard to install. And it’s free!
It is more difficult to set it up properly, make it run reliably, and scale to thousands of devices. Once you start saving lots of data on thousands of devices, some special things need to be done to the databases to optimize the installation and prevent it from bogging down. Zabbix offers support and consulting for larger and more advanced installations. But for monitoring some boxes locally, it’s perfect.
Then I needed the script. I played around with the Python application for a little while. It pulls some good information from the Antminer S9 using the summary command, which is one of the API commands allowed in the default configuration of the Antminer.
The API commands can be configured in the /config/bmminer.conf file on the Antminer S9 if desired, but I didn’t need to. Here’s the relevant section from the bmminer.conf showing permissions:
"api-listen" : true,
"api-network" : true,
"api-groups" : "A:stats:pools:devs:summary:version",
"api-allow" : "A:0/0,W:*",
"bitmain-use-vil" : true,
"bitmain-freq" : "600",
"bitmain-voltage" : "0706",
"multi-version" : "1"
From the summary command of the API, I decided to use the average hashing speed (the GHS av field, in GH/s) as an indicator of whether the box is working properly. On my S9s, that hovers around 12,000. On the box with the fan not working, it’s zero. So that’s a start.
Zabbix will let you set up a host check using a script, but as far as I can tell it needs to be a Bash script. So I put the Python script in the Zabbix externalscripts directory, along with a Bash script that calls the Python script. The output of the script is just a number.
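The summary command only answers if the api-groups setting in bmminer.conf permits it. As a rough sketch of how that permission string breaks down, the snippet below parses it in Python 3. It assumes the cgminer convention of a group letter followed by colon-separated command names, with multiple groups separated by commas; the function name parse_api_groups is made up for this illustration, and only two keys from the config are reproduced.

```python
import json

# Trimmed copy of the bmminer.conf fragment, wrapped in braces so it parses as JSON.
CONFIG_FRAGMENT = """{
"api-listen" : true,
"api-groups" : "A:stats:pools:devs:summary:version"
}"""


def parse_api_groups(value):
    """Split an api-groups string into {group letter: [allowed commands]}.

    Hypothetical helper; assumes the 'G:cmd1:cmd2,H:cmd3' layout.
    """
    groups = {}
    for entry in value.split(","):
        name, _, commands = entry.partition(":")
        groups[name] = [c for c in commands.split(":") if c]
    return groups


config = json.loads(CONFIG_FRAGMENT)
groups = parse_api_groups(config["api-groups"])
print("summary" in groups["A"])  # prints True
```

If summary were missing from the list, the monitoring script would have nothing to read, so this is worth a glance before blaming the network.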
I had to add a couple things. Sometimes the API does not report anything back. So I have the script return a zero. And sometimes the Antminer is not reachable over the network. Instead of having the script puke and Zabbix take the check offline, I have the script report a zero.
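The fall-back-to-zero idea can be sketched as one small helper. This is an illustration in Python 3, not the script used in this post; the function name is made up, and it assumes the parsed summary reply has the shape {"SUMMARY": [{"GHS av": ...}]}.

```python
def ghs_average_or_zero(reply):
    """Return the 'GHS av' value from a parsed summary reply, or 0.0.

    Hypothetical helper: any missing, empty, or malformed reply counts as zero,
    so the monitoring check always receives a number.
    """
    try:
        return float(reply["SUMMARY"][0]["GHS av"])
    except (KeyError, IndexError, TypeError, ValueError):
        # No reply (None), missing keys, empty list, or an empty string value
        return 0.0
```

Reporting zero instead of failing means a dead miner and an unreachable miner look the same to Zabbix: both trip the same low hash rate trigger.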
It’s best to add the Item, Triggers, and Graphs in the Templates section, then apply the template to the Hosts. By adding the IP address of the miner in the Host configuration, you can use the {IPADDRESS} argument in the Item. The Bash script takes the IP address and uses it to call the Python script.
I added the script as a host Item in Zabbix.
Then I created a graph and a trigger.
After running for a little while, I have a nice main screen with a System Status.
I can get an overview of all my devices.
The Python script looks like this:
from sys import argv
import socket
import json
import ast

# Written for Python 2 (print statement, unicode()).
# Call this with the IP address of the Antminer as its only argument,
# like: python miner_GHS_av.py 10.64.12.160


class CgminerAPI(object):
    """ Cgminer RPC API wrapper. """

    def __init__(self, host, port):
        self.data = {}
        self.host = host
        self.port = port

    def command(self, command, arg=None):
        """ Initialize a socket connection,
        send a command (a json encoded dict) and
        receive the response (and decode it).
        """
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((self.host, self.port))
            payload = {"command": command}
            if arg is not None:
                # Parameter must be converted to basestring (no int)
                payload.update({'parameter': unicode(arg)})
            sock.send(json.dumps(payload))
            received = self._receive(sock)
        finally:
            sock.shutdown(socket.SHUT_RDWR)
            sock.close()
        # Drop the last character of the reply (a trailing terminator) before decoding
        return json.loads(received[:-1])

    def _receive(self, sock, size=4096):
        # Read until the miner closes the connection
        msg = ''
        while 1:
            chunk = sock.recv(size)
            if chunk:
                msg += chunk
            else:
                break
        return msg

    def __getattr__(self, attr):
        """ Allow us to make command calling methods.
        >>> cgminer = CgminerAPI('10.64.12.160', 4028)
        >>> cgminer.summary()
        """
        def out(arg=None):
            return self.command(attr, arg)
        return out


script, host_ip = argv
try:
    antminer1 = CgminerAPI(host_ip, 4028)
    result = antminer1.command('summary')
    # convert result to dictionary
    list_summary = result['SUMMARY']
    summary = str(list_summary[0])
    summary_dict = ast.literal_eval(summary.replace("u'", "'"))
    if summary_dict['GHS av'] == "":
        summary_dict['GHS av'] = "0.0"
    print summary_dict['GHS av']
except Exception:
    # No usable answer from the API, or the miner is unreachable: report zero
    print "0.0000"
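The script above targets Python 2. For readers on Python 3, a rough equivalent of the query might look like the following. It is a sketch, not a tested drop-in replacement: the function names are invented here, and it assumes the miner listens on port 4028, ends its JSON reply with a trailing null byte, and then closes the connection, which is what the original script's handling of the reply implies.

```python
import json
import socket


def parse_reply(raw):
    """Decode the raw bytes from the miner into a dict (hypothetical helper)."""
    # Assumption: the reply may end with a null byte, which is not valid JSON.
    text = raw.decode("utf-8", errors="replace").rstrip("\x00")
    return json.loads(text)


def query_miner(host, command="summary", port=4028, timeout=5.0):
    """Send one API command and return the parsed reply.

    Raises OSError if the miner is unreachable and ValueError if the
    reply is not valid JSON.
    """
    request = json.dumps({"command": command}).encode("utf-8")
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the miner closed the connection
                break
            chunks.append(chunk)
    return parse_reply(b"".join(chunks))
```

A call such as query_miner("10.64.12.160")["SUMMARY"][0]["GHS av"] would then give the average hash rate. To keep the behavior of the original script, wrap that call in a try/except and print zero on any failure.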
The Bash script looks like this:
#!/bin/sh
/usr/bin/python /etc/zabbix/externalscripts/miner_GHS_av.py $1
Now almost everything looks good. The one box that needs a new fan is showing a problem. I rebooted one of the Antminers, the trigger was activated, and the Zabbix dashboard showed the problem. When the Antminer came back online, the dashboard went green again. The graph of hashing power shows the problem also. It’s nice to have historical information.
The majority of the Zabbix monitoring system is working well. My current setup has some problems, though.
- I need to set up better alerts. If the switch is down, then obviously the Antminers behind it will be down also. I need to set up dependencies for this.
- Building power went out this morning. I wasn’t notified. The solution to this is to set up a Zabbix master server at Amazon Web Services, then put a small, low-powered Zabbix proxy at the location to be monitored, and connect the proxy to a UPS.
Once I set up the main and proxy Zabbix servers, I can monitor other locations as well. That’s what we did at my previous company. We were monitoring about 30 customers’ environments, and they had logins to the Zabbix master server so they could see the status of their own environments, but could not see anyone else’s.
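The dependency idea in the first bullet boils down to this: do not alert on a device if the thing it sits behind is already known to be down. Zabbix handles this itself through trigger dependencies. Purely as an illustration of the logic, here is a Python 3 sketch with a made-up function name and made-up host names.

```python
def alerts_to_send(down_hosts, parent_of):
    """Return the down hosts whose parent device is not also down.

    Hypothetical helper: parent_of maps a host to the device it sits behind.
    Hosts without an entry (such as the switch itself) always alert.
    """
    down = set(down_hosts)
    return sorted(host for host in down if parent_of.get(host) not in down)


# Hypothetical topology: two miners behind one switch.
parent_of = {"miner1": "switch1", "miner2": "switch1"}

print(alerts_to_send(["switch1", "miner1", "miner2"], parent_of))  # prints ['switch1']
print(alerts_to_send(["miner1"], parent_of))  # prints ['miner1']
```

With the switch down, only the switch alerts. With a single miner down, that miner alerts as usual.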
One of the things I noticed was that most of our customers did not have network monitoring set up and working on their systems, and the ones that did usually had it done as a service by another company, or had a network guru onsite that really understood it.
Maybe once I get the Zabbix AWS server set up I’ll see if anyone needs their stuff monitored…and is willing to pay for it in Bitcoin!
Posted at Block Operations
Author: Rolf Versluis
Robert
August 12, 2016 @ 12:11 am
How much for your software? I’m using some S7s and some S9s. I need to see the connection status, hashing rate, average hashing rate, pool URL, HW error %, PCB temperature, chip temperature, ASIC status (whether there are x’s or -’s), and fan speeds. 🙂 I need software for Windows that can also alert me when a threshold is reached.
Rolf
August 12, 2016 @ 8:53 am
Robert,
Thanks for asking. I should do a blog post with an update on where I am at now with this monitoring system.
The problem I had is that I would have to log into the mining pool every day to make sure all my devices were working properly. I didn’t like doing that. So I set up a system that would monitor the main features and alert me if there was a problem:
1. Mining pool checks. It makes sure the primary pool is Alive, checks that my username is still the username, and checks that the pool name is still the pool name I set up. It alerts me if any of this information changes.
2. Hash rate checks. If the Antminer S9 rate drops below 12,000 or the S7 below 3000, it lets me know.
3. Temperature checks. It tracks S7 temperature and graphs it. Still working on doing this for S9. I asked Bitmain to look into the API setting and fix it on the next software update.
Then I set things up so that if site power went down, I wouldn’t get hundreds of emails. Same thing with site internet.
I split the system so the main server is running in the cloud at Amazon Web Services, and a proxy server is running at each of my locations, gathering data and uploading it.
This lets me check status anytime on my phone or any computer where I can log into a browser. I get emails if there is a problem.
The system does not let me make any changes to any of the devices. It is a passive monitoring system only.
For changes, I either use the web interface or use a custom script that logs into each device and makes the changes I want.
To directly address your question, this is not software I can sell you.
If you are interested in having these capabilities for your system, what I can do is:
1. Assist you in setting up this type of system for your own environment. We could scope the project and price.
2. Provide a hosted cloud service with a monthly charge per device. Minimum charge $60/month.
If you are interested in either of these options, let me know. I need a couple of people with whom I can run a pilot of about 10-20 miners to refine the customer-facing aspects of the service.
Rolf
Blair McBride
May 14, 2018 @ 2:39 pm
I have created a Windows application that connects to all of your miners and reports temperatures, lost connectivity, hash rates and found blocks. You simply set an interval for the program to connect and report. You can set temperature alerts (3 levels) and select what other alerts you want. These will go to a primary and/or secondary email address. This can give you peace of mind that everything is working.
James Halstead
May 14, 2018 @ 4:08 pm
Would you be willing to share the code for that system?
Sergey
November 18, 2017 @ 4:00 pm
The Python script is not working
File “./miner_GHS_av.py”, line 39
return json.loads(received[fusion_builder_container hundred_percent=”yes” overflow=”visible”][fusion_builder_row][fusion_builder_column type=”1_1″ background_position=”left top” background_color=”” border_size=”” border_color=”” border_style=”solid” spacing=”yes” background_image=”” background_repeat=”no-repeat” padding=”” margin_top=”0px” margin_bottom=”0px” class=”” id=”” animation_type=”” animation_speed=”0.3″ animation_direction=”left” hide_on_mobile=”no” center_content=”no” min_height=”none”][:-1])
^
SyntaxError: invalid syntax
Ralph Richardson
December 16, 2017 @ 8:22 am
Hi Rolf,
Any update on the Antminer S9 monitoring solutions would be awesome. We are proceeding with your plans!
Ralph
Rolf
December 27, 2017 @ 6:30 pm
I have switched to using AwesomeMiner for monitoring and management. http://awesomeminer.com/
James Halstead
December 17, 2017 @ 5:56 pm
I see some of this is older. Is anyone out there available to work on this type of plan still? I have a few units and would be willing to pay for some setup consultation.
Rolf
December 27, 2017 @ 6:30 pm
I have switched to using AwesomeMiner for monitoring and management. http://awesomeminer.com/
alforro
March 7, 2018 @ 10:34 am
I am working with this
Dwight
January 1, 2018 @ 11:26 pm
It should just read “json.loads(received[:-1])” (which ignores the last character). All the other stuff is HTML markup that must have gotten mixed in.
Python 2.7.14 (default, Sep 23 2017, 22:06:14)
[GCC 7.2.0] on linux2
Type “help”, “copyright”, “credits” or “license” for more information.
>>> a = “cat”
>>> a[:-1]
‘ca’
>>>
rety
June 13, 2018 @ 3:38 am
I settled on Zabbix. It now monitors 141 S9s, 900 GPUs, electricity meters, and temperature and humidity sensors. Previously it monitored 1,200 S7s.
EVGENY
July 5, 2018 @ 9:04 am
Hi, Rolf! I am beginning to learn Zabbix, and read your article with great interest. But I don’t know Python. I tried to run your script manually on Ubuntu Server and received a syntax error on line 39: return json.loads(received[fusio… This question was asked earlier by Sergey, but I could not follow the answer given by Dwight (maybe I could not translate it properly into Russian 🙂). Maybe I get this error because of a difference in Python versions? I have Python 3.5. Could you help me? Thanks a lot, Evgeniy Polubentsev