I really like Zabbix. It’s open source, has been around for well over a decade, and does exactly what I need for monitoring my gear. It gives me a dashboard I can check anytime, and sends me an email when there is a problem that I consider important enough to be notified about.
The header picture is of my Block Operations lab. I can monitor my Zcash miners at my lab, the ones running in my office as space heaters, and all the systems at my mining production facility, all on one Zabbix server screen.
Here is a screenshot of what I monitor with Zabbix on my GPU miners running the Optiminer Zcash 1.6.0 application.
I have the main Zabbix application running on a $10/month VPS at Linode, so it’s always available to me. At my mining locations, I use a Raspberry Pi running Raspbian as a $60 Linux server running the Zabbix proxy application. The Zabbix proxy connects back to the Zabbix server VPS over any available internet connection, so even if my primary internet has failed over to the backup, the Zabbix proxy still connects and uploads data.
For Antminers, I had to create a script that queried the Antminer status with JSON. Any bash script that provides data back to the Zabbix server works, so I was happy with that.
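To give an idea of what such a query involves, here is a minimal Python sketch. It is not the script described above. It assumes the miner exposes a cgminer-style API on TCP port 4028 that answers a JSON command with a JSON reply ending in a NUL byte, and that the reply has a "SUMMARY" list with "GHS 5s", "GHS av", and "Hardware Errors" fields. The function names are made up for illustration, and field names can differ between firmware versions.

```python
import json
import socket


def query_miner(host, port=4028, command="summary", timeout=5.0):
    """Send one JSON command to the miner API and return the raw reply text.

    Port 4028 and the command format are assumptions based on the
    cgminer-style API; check what your firmware actually exposes.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"command": command}).encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the miner closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def parse_summary(raw):
    """Pull a few values out of a 'summary' reply (field names assumed)."""
    # Replies typically end with a NUL byte, which json.loads rejects.
    reply = json.loads(raw.rstrip("\x00").strip())
    summary = reply["SUMMARY"][0]
    return {
        "ghs_5s": float(summary["GHS 5s"]),  # may arrive as a string
        "ghs_av": float(summary["GHS av"]),
        "hw_errors": int(summary["Hardware Errors"]),
    }
```

A wrapper script would call `parse_summary(query_miner("192.0.2.10"))` (a placeholder address) and print one value, which is the sort of output Zabbix can collect from a script.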
For a Linux or Windows box, Zabbix is much more capable. By installing the Zabbix agent on the GPU miner, the agent provides lots of basic data back to the Zabbix server, like free disk space, CPU speed, etc., and will send an alert on reboot or other problems.
The agent can also perform active checks. The one I like for monitoring mining operations is log parsing. The Zabbix agent will parse a log file, looking for a regex match. On making the match, the specific information is sent to the proxy for inclusion in the GPU miner data.
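The matching itself is ordinary regex work. The Python sketch below shows the idea outside of Zabbix. The log line format is hypothetical (I am assuming lines like `[GPU0]  251.3 I/s 472.8 S/s (5s)`), so the pattern will need adjusting to whatever your miner really writes. `GPU_RATE` and `extract_rates` are illustrative names, not part of Zabbix or Optiminer.

```python
import re

# Hypothetical line format: "[GPU0]  251.3 I/s 472.8 S/s (5s) ..."
# The first number followed by " S/s" on a GPU line is captured as the rate.
GPU_RATE = re.compile(r"\[GPU(?P<gpu>\d+)\]\s+.*?(?P<rate>\d+(?:\.\d+)?) S/s")


def extract_rates(lines):
    """Return {gpu_index: most recent S/s value} from an iterable of log lines.

    Values are kept as strings, the way a log match comes back as text.
    """
    latest = {}
    for line in lines:
        match = GPU_RATE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last match wins.
            latest[int(match.group("gpu"))] = match.group("rate")
    return latest
```

Lines that do not match are ignored, so startup banners and share messages in the same log do no harm.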
Out of the box, Optiminer sends data to the screen. With a simple addition to the optiminer start command in the mining script, it also creates a log:
| tee /home/user/log/optiminer.log
I can then configure a Zabbix server template with items and triggers on that data. This is a screenshot of the Item page for GPU0 Hash Rate.
It’s string data, not integer, so I can’t create a trigger off a minimum value. And I don’t feel like converting string to integer, so I just set the trigger to alert if the log does not provide any fresh log entries within a specific amount of time. When the miner hangs, the log file doesn’t update, and I get an email alert about a problem.
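The "no fresh entries" idea boils down to comparing the age of the newest log data against a limit. Zabbix has a nodata() trigger function for this kind of check. The Python sketch below only illustrates the logic and is not how Zabbix implements it. The function names and the 300-second default are assumptions for the example.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(age_seconds, max_age_seconds=300):
    """True when the newest log data is older than the allowed limit.

    The 300-second default is only an example; pick a limit a bit longer
    than the slowest normal gap between log lines.
    """
    return age_seconds > max_age_seconds
```

A hung miner stops writing, the age keeps growing, and once it passes the limit the check flips to a problem state.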
I prefer an email alert so I can figure out what is wrong, then correct it. I don’t want my GPU miners constantly rebooting if at all possible. I set the problem level to High on this trigger, because I get emails about High and Disaster problems, although I actually don’t have any triggers set to Disaster.
Right now the GPU temperature is not being monitored by Zabbix. When I was using Claymore, that information was available in the log file, so I just parsed it out. Optiminer does not provide GPU temperatures, so I don’t have that information. It’s wintertime here, so I’m not worried about temperature readings just yet – I will definitely need them in the summer.
That’s OK, I can just write a script on my Linux box to use the lm-sensors application to log the GPU temperature to a file, then get and display the information the same way the GPU hash rate is displayed. That’s on the to-do list.
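As a starting point for that to-do item, here is a rough Python sketch that parses the raw output of `sensors -u`. It assumes the GPUs show up in lm-sensors as chips whose names start with `amdgpu`, which depends on the driver in use, and that temperatures appear as `tempN_input` lines. `parse_temps` and `read_sensors` are names invented for this example.

```python
import re
import subprocess

# In `sensors -u` output, readings look like "  temp1_input: 52.000".
INPUT_LINE = re.compile(r"^\s+temp\d+_input:\s+(-?\d+(?:\.\d+)?)\s*$")


def parse_temps(sensors_u_output, chip_prefix="amdgpu"):
    """Return {chip_name: [temperatures in Celsius]} for matching chips.

    chip_prefix="amdgpu" is an assumption; change it to match your cards.
    """
    temps = {}
    chip = None
    for line in sensors_u_output.splitlines():
        # Chip names are unindented lines without a colon.
        if line and not line[0].isspace() and ":" not in line:
            chip = line.strip()
            continue
        match = INPUT_LINE.match(line)
        if match and chip and chip.startswith(chip_prefix):
            temps.setdefault(chip, []).append(float(match.group(1)))
    return temps


def read_sensors():
    """Run `sensors -u` and return its text output (needs lm-sensors)."""
    result = subprocess.run(
        ["sensors", "-u"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Writing each reading to a file with a timestamp would let the same log parsing approach pick it up.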
Zabbix is full of all the features I need. I can set a maintenance period if I am doing work on machines so I don’t get flooded with emails. I can also set a dependency – if the network switch that my Zcash miner is connected to is unreachable by ICMP ping, then the trigger on the GPU miners won’t fire, because they are dependent on the switch being up and running.
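The way maintenance periods and dependencies suppress alerts can be summed up in a few lines of logic. This Python sketch is only an illustration of the decision, not Zabbix code. In Zabbix it is set up through maintenance periods and trigger dependencies, and `should_notify` is a made-up name.

```python
def should_notify(miner_problem, switch_reachable, in_maintenance):
    """Decide whether a miner problem should produce an email.

    - During a maintenance period nothing is sent.
    - If the upstream switch is down, the miner alert is suppressed,
      because the switch outage is the real problem to report.
    """
    if in_maintenance:
        return False
    if not switch_reachable:
        return False
    return miner_problem
```

With ten miners behind one switch, a switch failure produces one alert for the switch instead of ten for the miners.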
Of course I can access the Zabbix server from my phone also:
It does not matter which monitoring system you use for your mining operations, but it helps to have one that alerts you about problems.
I like monitoring my servers, Antminers, switches, and other systems from the same flexible system. For example, this is a monitor screen of some of my Antminer S9s. The orange is where the hash is below spec, so it’s time for a maintenance period.
If you have a monitoring system you like to use, or if you provide one, please let me know in the comments below. I am always eager to learn.