I work in an office where working late into the night, weekends included, is the norm. This post is not about my office or the work we do late into the night. It is about a problem statement posed to me by my subordinates. It goes like this.
The office, or rather the building rules, demand that we shut down (hard off) all electrical equipment once the office is finally closed for the day. This means our office servers also need to be shut down, along with the UPS (APC SMART 2200). This has nothing to do with our office trying to be green. It is about eliminating any fire risk.
So, what's the problem? Once we (management) leave the office, the duty staff have to shut down all the equipment, including the servers. Our two servers, running WindoZe 2008 R2 and WindoZe 2003, take about 15 minutes to shut down cleanly. Only after that can the UPS be switched off. Fifteen minutes may seem a short interval for an office that routinely works 15+ hours a day in a single shift, 7 days a week. But at 2330, when it is time to go home, every minute feels like an hour to the staff on duty. The problem was narrated to me over a short tea break. Since I believe in working smarter, not harder, I decided to save those 15 man-minutes every day.
So, coming to the point: this post describes a circuit/contraption I have devised. It switches off the UPS once it detects that both computers dependent on it have shut down for the day.
The state of each server is detected by polling it at a predetermined interval. However, persistent ICMP pings are generally blocked by firewalls, so I decided to use an ARP request instead. A host on the same LAN segment generally still answers ARP even when its firewall drops pings, because other machines could not reach it otherwise. So, in a nutshell, if a server responds to an ARP request, it is deemed to be alive.
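For readers unfamiliar with ARP, the request involved is an ordinary 42-byte Ethernet frame (before padding). The sketch below illustrates that frame layout per RFC 826 on a PC. It is not the EtherShield library's own code. `build_arp_request` and `is_arp_reply_from` are hypothetical names, and the MAC and IP addresses are the ones used in this post's sketch. A host counts as alive when a reply (opcode 2) arrives whose sender IP is the address we asked about.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Mac = std::array<uint8_t, 6>;
using Ip4 = std::array<uint8_t, 4>;

// Build an Ethernet II frame carrying an ARP "who-has" request (RFC 826):
// 14-byte Ethernet header + 28-byte ARP payload = 42 bytes before padding.
std::array<uint8_t, 42> build_arp_request(const Mac& my_mac, const Ip4& my_ip,
                                          const Ip4& target_ip) {
  std::array<uint8_t, 42> f{};                       // zero-initialised
  for (int i = 0; i < 6; ++i) {
    f[i] = 0xFF;                                     // destination: broadcast
    f[6 + i] = my_mac[i];                            // source: our MAC
  }
  f[12] = 0x08; f[13] = 0x06;                        // EtherType 0x0806 = ARP
  f[14] = 0x00; f[15] = 0x01;                        // HTYPE 1 = Ethernet
  f[16] = 0x08; f[17] = 0x00;                        // PTYPE 0x0800 = IPv4
  f[18] = 6;    f[19] = 4;                           // HLEN, PLEN
  f[20] = 0x00; f[21] = 0x01;                        // OPER 1 = request
  for (int i = 0; i < 6; ++i) f[22 + i] = my_mac[i];     // sender MAC
  for (int i = 0; i < 4; ++i) f[28 + i] = my_ip[i];      // sender IP
  // f[32..37], the target MAC, stays zero: that is what we are asking for
  for (int i = 0; i < 4; ++i) f[38 + i] = target_ip[i];  // target IP
  return f;
}

// True if `frame` is an ARP reply (OPER 2) sent by `target_ip`, i.e. the host is alive.
bool is_arp_reply_from(const uint8_t* frame, std::size_t len, const Ip4& target_ip) {
  assert(frame != nullptr);
  if (len < 42 || frame[12] != 0x08 || frame[13] != 0x06) return false;  // not ARP
  if (frame[20] != 0x00 || frame[21] != 0x02) return false;              // not a reply
  for (int i = 0; i < 4; ++i)
    if (frame[28 + i] != target_ip[i]) return false;                     // sender IP
  return true;
}
```

ARP does not cross routers. This check only works because the board and both servers sit on the same subnet (159.12.23.x).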
The Arduino board communicates with the UPS over a serial port at 2400 baud, 8N1, with no handshaking. The communication protocol used (APC's smart protocol) is discussed and elaborated in detail here. The microcontroller may detect that both servers have been down for a defined interval (the shutdown grace period). It then assumes they are off and issues a shutdown command to the UPS over the serial port.
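The "down for a defined interval" rule is essentially a debounce counter. Here is a minimal host-side sketch of it. `ShutdownDebouncer` is a hypothetical name, and the 5-minute grace and 30-second poll match the values in this post's code. Any live server resets the count. Only an unbroken run of silent polls longer than the grace period triggers the shutdown.

```cpp
#include <cassert>

// Hypothetical helper mirroring the grace-period logic: count consecutive polls in
// which neither server answered ARP and report when the grace period is exceeded.
struct ShutdownDebouncer {
  long required_polls = 0;  // grace period expressed as a number of polls
  long down_polls = 0;      // consecutive polls with both servers silent

  ShutdownDebouncer(unsigned long grace_minutes, unsigned long poll_interval_ms) {
    assert(poll_interval_ms > 0);
    required_polls = static_cast<long>(grace_minutes * 60000UL / poll_interval_ms);
  }

  // Feed one poll result; returns true when the UPS shutdown should be attempted.
  bool update(bool server1_up, bool server2_up) {
    if (server1_up || server2_up) {  // any server alive: restart the grace period
      down_polls = 0;
      return false;
    }
    if (down_polls <= required_polls) ++down_polls;  // capped, so it cannot overflow
    return down_polls > required_polls;
  }
};
```

With 5 minutes and 30-second polls the threshold is 10 polls, and the shutdown fires on the 11th consecutive silent poll. The effective grace is therefore a little over five minutes. On the board, each ARP timeout on a dead server stretches every poll further.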
However, during the daytime the network (read: the physical media) might itself be under maintenance. The network switch might also be down due to a power outage. In either case the servers would not respond to ARP requests. The UPS would then be shut down, causing an unclean server shutdown.
To get around this, the microcontroller needs to know the time of day and must shut down the UPS only after 2000 hrs. The solution is a simple RTC (real-time clock). And how does the RTC get its own time? NTP. One of my servers functions as an NTP server, so I decided to use it to set the RTC once when the board boots up. Once this was done, the code was modified accordingly.
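The arithmetic behind the RTC update and the time-of-day gate can be checked on a PC. The function names in this sketch are hypothetical. It converts an NTP timestamp (seconds since 1900) to Unix time by subtracting 2,208,988,800 s. It then adds the +5 h 30 min IST offset (19,800 s) used in the code and applies the 20:00 cutoff. Minutes since midnight are compared as a single number rather than testing hour and minute separately. A test such as `hour >= 15 && minute >= 30` would wrongly refuse a shutdown at, say, 16:10.

```cpp
#include <cassert>
#include <cstdint>

// NTP timestamps count seconds since 1900-01-01; Unix time counts from 1970-01-01.
constexpr uint32_t SECONDS_FROM_1900_TO_1970 = 2208988800UL;
constexpr uint32_t IST_OFFSET_SECONDS = 5 * 3600 + 30 * 60;  // +05:30 = 19800 s

// Convert an NTP seconds value to local (IST) minutes since midnight.
int ntp_to_ist_minutes_of_day(uint32_t ntp_seconds) {
  uint32_t local = ntp_seconds - SECONDS_FROM_1900_TO_1970 + IST_OFFSET_SECONDS;
  return static_cast<int>((local % 86400UL) / 60UL);
}

// Shutdown is permitted only at or after the cutoff hour (20:00 in this post).
bool shutdown_allowed(int minutes_of_day, int cutoff_hour = 20) {
  return minutes_of_day >= cutoff_hour * 60;
}
```

For example, the NTP value 2208988800 is the Unix epoch, which is 05:30 in IST (330 minutes), so a shutdown is not allowed then.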
Knowing the time of day is also needed to switch the UPS, and then the servers, back on. The UPS can be switched on via a serial command and the servers via a Wake-on-LAN (WOL) magic packet. I have not implemented this, since I have configured the servers' BIOS to power on when power is restored.
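Wake-on-LAN is not implemented here. For completeness, though: a magic packet is 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, 102 bytes in all. It is usually sent as a UDP broadcast, and port 9 is a common choice. Below is a minimal sketch of building the payload. The function name and the MAC address in the test are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Wake-on-LAN magic packet payload: 6 x 0xFF, then the target MAC repeated 16 times.
std::array<uint8_t, 102> build_magic_packet(const std::array<uint8_t, 6>& mac) {
  std::array<uint8_t, 102> p{};
  for (int i = 0; i < 6; ++i) p[i] = 0xFF;           // synchronisation stream
  for (int rep = 0; rep < 16; ++rep)
    for (int i = 0; i < 6; ++i) p[6 + rep * 6 + i] = mac[i];
  return p;
}
```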
I have also thrown in a DS18S20 temperature sensor to measure the ambient room temperature.
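The code in this post addresses the sensor by a ROM address with family code 0x10, which is the DS18S20 format. It reports temperature in 0.5 °C steps plus a COUNT_REMAIN byte for finer resolution. Below is a host-side sketch of the decoding, under the assumption that the 9 scratchpad bytes have already been read. `ds18s20_celsius` is a hypothetical name. It treats the reading as signed so that sub-zero temperatures decode correctly.

```cpp
#include <cassert>
#include <cstdint>

// Convert a DS18S20 scratchpad (9 bytes) to degrees Celsius.
// Bytes 0-1: temperature in 0.5 C steps (two's complement),
// byte 6: COUNT_REMAIN, byte 7: COUNT_PER_C (16 on the DS18S20).
float ds18s20_celsius(const uint8_t data[9]) {
  int32_t raw = (data[1] << 8) | data[0];
  if (raw & 0x8000) raw -= 0x10000;  // sign-extend the 16-bit reading
  raw *= 8;                          // 0.5 C units -> 1/16 C units
  if (data[7] == 0x10) {
    // Datasheet: T = TEMP_READ - 0.25 + (COUNT_PER_C - COUNT_REMAIN) / COUNT_PER_C,
    // with TEMP_READ dropping the 0.5 C bit; in 1/16 C units: +12 - COUNT_REMAIN.
    raw = (raw & ~0x0F) + 12 - data[6];
  }
  return raw / 16.0f;
}
```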
The system state, including UPS statistics, is available on a web page served by the Arduino board itself, which doubles as a web server.
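The status page is assembled into a fixed 1200-byte buffer. Each row should therefore be checked for fit before it is written, not after. The sketch below shows that idea on a PC with `snprintf`. It does not use the EtherShield API, and `append_row` is a hypothetical helper. If a row does not fit, the buffer is left unchanged, so the caller can fall back to an error page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Append "<tr><th>label</th><td>value</td></tr>" to a NUL-terminated page buffer
// of capacity `cap`. Returns false, leaving the page unchanged, if it would not fit.
bool append_row(char* page, std::size_t cap, const char* label, const char* value) {
  std::size_t used = std::strlen(page);
  assert(used < cap);
  int n = std::snprintf(page + used, cap - used, "<tr><th>%s</th><td>%s</td></tr>",
                        label, value);
  if (n < 0 || static_cast<std::size_t>(n) >= cap - used) {
    page[used] = '\0';  // undo the truncated partial write
    return false;
  }
  return true;
}
```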
Code (Sans NTP +RTC Part)
    #include <OneWire.h>
    #include "EtherShield.h"
    #include <Wire.h>
    #include <RTClib.h>
    #include <stdio.h>
    #include <string.h>

    //--------------- EtherShield settings ----------------------------------
    #define MYWWWPORT 80                    // web server port
    #define BUFFER_SIZE 1200
    static uint8_t buf[BUFFER_SIZE + 1];    // Ethernet packet buffer
    static uint8_t mymac[6]     = {0x54, 0x55, 0x58, 0x12, 0x34, 0x56};
    static uint8_t myip[4]      = {159, 12, 23, 148};
    static uint8_t Server1ip[4] = {159, 12, 23, 160};
    static uint8_t Server2ip[4] = {159, 12, 23, 150};
    static uint8_t ntpServer[4] = {159, 12, 23, 150};
    const unsigned long server_poll_interval = 30000UL;  // ARP-ping the servers every 30 s
    EtherShield es = EtherShield();

    //--------------- UPS manager settings ----------------------------------
    const unsigned long serialtimeout = 500UL;           // UPS reply timeout (ms)
    const unsigned long serial_poll_interval = 60000UL;  // query the UPS every minute
    const int shutdown_grace = 5;        // minutes both servers must be down before UPS shutdown
    const int SHUTDOWN_AFTER_HOUR = 20;  // never switch the UPS off before 20:00 (RTC time)
    int ups_shut_flag = 2;               // 1 -> UPS OFF, 0 -> UPS ON, 2 -> unknown
    long shutdown_interval = 0;          // grace period expressed as a number of polls
    int server1 = 0;                     // 1 if server 1 answered the last ARP request
    int server2 = 0;                     // 1 if server 2 answered the last ARP request
    bool ups_state = false;              // true if the UPS answered in smart mode

    // UPS replies such as "230.4\r\n" are stored without CR/LF, so 8 bytes suffice
    #define FIELD_LEN 8
    char line_voltage[FIELD_LEN], line_freq[FIELD_LEN], batt_voltage[FIELD_LEN];
    char load_voltage[FIELD_LEN], load_factor[FIELD_LEN], batt_level[FIELD_LEN];
    char estimated_runtime[FIELD_LEN], status_flag[FIELD_LEN], internal_temp[FIELD_LEN];
    char external_temp[FIELD_LEN], response[FIELD_LEN];
    char time_string[32];

    //--------------- RTC, NTP & DS18S20 temperature sensor -----------------
    OneWire ds(6);                                  // sensor connected to pin 6
    #define SECONDS_FROM_1900_TO_1970 2208988800UL
    #define IST_OFFSET_SECONDS 19800UL              // +5 h 30 min, Indian Standard Time
    RTC_DS1307 RTC;

    uint16_t print_webpage(uint8_t *buf);
    char *query_ups(char cmd, char *temp, size_t len);
    bool arp_resolve(uint8_t *remIP, unsigned long timeout);
    int arp_ping(uint8_t *remIP);
    bool set_mac(uint8_t *remIP);
    void shutdown_ups();
    float get_temp();
    bool get_ntp_time();
    void sendtcp(uint16_t dat_p);

    //--------------- web page ----------------------------------------------
    // Append a PROGMEM literal, a RAM string, or a whole table row to the TCP buffer.
    #define ADD_P(s) plen = es.ES_fill_tcp_data_p(buf, plen, PSTR(s))
    #define ADD_S(s) plen = es.ES_fill_tcp_data_len(buf, plen, (s), strlen(s))
    #define ADD_ROW(label, value) \
      do { ADD_P("<tr><th>" label "</th><td>"); ADD_S(value); ADD_P("</td></tr>"); } while (0)

    uint16_t http200ok(void) {
      return es.ES_fill_tcp_data_p(buf, 0, PSTR("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\nPragma: no-cache\r\n\r\n"));
    }

    // Prepare the status page in the TCP send buffer; returns its length.
    uint16_t print_webpage(uint8_t *buf) {
      DateTime now = RTC.now();
      snprintf(time_string, sizeof(time_string), "%02d:%02d:%02d %02d/%02d/%04d",
               now.hour(), now.minute(), now.second(), now.day(), now.month(), (int)now.year());
      uint16_t plen = http200ok();
      ADD_P("<center><p><h1>Welcome to Ishan-s UPS Shutdown Manager V0.1 </h1></p> <hr>");
      ADD_P("<table border=1 align=center>");
      ADD_P("<tr><th>Server 1 Status</th><th>Server2 Status</th><th>UPS Status</th></tr>");
      ADD_P("<tr><td align=center><strong>");
      if (server1 == 1) ADD_P("ON"); else ADD_P("OFF");
      ADD_P("</strong></td><td align=center><strong>");
      if (server2 == 1) ADD_P("ON"); else ADD_P("OFF");
      ADD_P("</strong></td><td align=center><strong>");
      if (ups_state) ADD_P("ON"); else ADD_P("OFF");
      ADD_P("</strong></td></tr></table>");
      ADD_P("<h2 align=center>UPS Status</h2><table align=center border=1>");
      ADD_ROW("Line Voltage (Volts)", line_voltage);
      ADD_ROW("Line Frequency (Hz)", line_freq);
      ADD_ROW("Battery Voltage (Volts)", batt_voltage);
      ADD_ROW("Output Voltage (Volts)", load_voltage);
      ADD_ROW("UPS Load (%)", load_factor);
      ADD_ROW("Battery Level (%)", batt_level);
      ADD_ROW("Internal Temp (deg C)", internal_temp);
      ADD_ROW("External Ambient Temp (deg C)", external_temp);
      ADD_ROW("Estimated Runtime (Minutes)", estimated_runtime);
      ADD_P("</table><hr>");
      ADD_S(time_string);
      if (plen >= BUFFER_SIZE) {
        plen = es.ES_fill_tcp_data_p(buf, 0, PSTR("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n"));
        ADD_P("<center><p><h1>Welcome to Ishan-s UPS Shutdown Manager V0.1 </h1></p> ");
        ADD_P("<p><h1>Error: buffer is too small...</h1></p> ");
      }
      return plen;
    }

    void setup() {
      pinMode(16, OUTPUT); digitalWrite(16, LOW);
      pinMode(17, OUTPUT); digitalWrite(17, HIGH);
      pinMode(5, OUTPUT);  digitalWrite(5, HIGH);
      pinMode(7, OUTPUT);  digitalWrite(7, LOW);
      Serial.begin(2400);                  // UPS link: 2400 baud, 8N1, no handshake
      Wire.begin();
      RTC.begin();
      es.ES_enc28j60SpiInit();             // initialise SPI interface
      es.ES_enc28j60Init(mymac);           // initialise the ENC28J60
      es.ES_init_ip_arp_udp_tcp(mymac, myip, MYWWWPORT);  // init the Ethernet/IP layer
      // number of consecutive "both servers down" polls that make up the grace period
      shutdown_interval = (shutdown_grace * 60000UL) / server_poll_interval;
      get_ntp_time();                      // set the RTC from the NTP server once at boot
    }

    void loop() {
      unsigned long starttime = 0, servertime = 0;
      int shutcount = 0;
      uint16_t dat_p;
      while (1) {
        // query the UPS once per serial_poll_interval until it has been switched off
        if ((millis() - starttime > serial_poll_interval) && (ups_shut_flag != 1)) {
          ups_state = false;
          query_ups('Y', response, FIELD_LEN);          // enter smart mode, UPS replies "SM"
          if (response[0] == 'S' && response[1] == 'M') {
            query_ups('L', line_voltage, FIELD_LEN);
            query_ups('F', line_freq, FIELD_LEN);
            query_ups('B', batt_voltage, FIELD_LEN);
            query_ups('O', load_voltage, FIELD_LEN);
            query_ups('P', load_factor, FIELD_LEN);
            query_ups('f', batt_level, FIELD_LEN);
            query_ups('j', estimated_runtime, FIELD_LEN);
            query_ups('Q', status_flag, FIELD_LEN);
            query_ups('C', internal_temp, FIELD_LEN);
            query_ups('R', response, FIELD_LEN);        // back to dumb mode
            ups_state = true;
          }
          dtostrf(get_temp(), 6, 2, external_temp);     // read ambient temperature
          starttime = millis();
        }
        // ARP-ping both servers once per server_poll_interval
        if (millis() - servertime > server_poll_interval) {
          server1 = arp_ping(Server1ip);
          server2 = arp_ping(Server2ip);
          if (server1 == 1 || server2 == 1) {           // a server is alive: do not switch off
            shutcount = 0;
            ups_shut_flag = 0;
          } else if (shutcount <= shutdown_interval) {
            shutcount++;                                // both down: count (capped)
          }
          if (shutcount > shutdown_interval && ups_shut_flag != 1) {
            shutdown_ups();                             // issue power-down command to the UPS
          }
          servertime = millis();
        }
        // read packet, handle ARP/ping and wait for a TCP packet
        dat_p = es.ES_packetloop_icmp_tcp(buf, es.ES_enc28j60PacketReceive(BUFFER_SIZE, buf));
        if (dat_p == 0) continue;                       // no HTTP request
        if (strncmp("GET ", (char *)&(buf[dat_p]), 4) != 0) {
          // HEAD, POST and other methods: plain 200 OK
          dat_p = http200ok();
          dat_p = es.ES_fill_tcp_data_p(buf, dat_p, PSTR("<h1>200 OK</h1>"));
          sendtcp(dat_p);
          continue;
        }
        // just one web page, in the root directory of the web server
        if (strncmp("/ ", (char *)&(buf[dat_p + 4]), 2) == 0) {
          dat_p = print_webpage(buf);
          sendtcp(dat_p);
        }
      }
    }

    void sendtcp(uint16_t dat_p) {
      es.ES_www_server_reply(buf, dat_p);
    }

    // Send a one-character UPS command and store the reply without CR/LF,
    // writing at most len-1 characters plus the terminating '\0'.
    char *query_ups(char cmd, char *temp, size_t len) {
      size_t i = 0;
      while (Serial.available() > 0) Serial.read();   // discard stale input
      Serial.print(cmd);
      unsigned long Stime = millis();
      delay(50);
      while (millis() - Stime < serialtimeout) {      // wait for the reply until timeout
        if (Serial.available() > 0) {
          char inChar = Serial.read();
          if (inChar == '\n') break;                  // replies end with CR LF
          if (inChar != '\r' && i < len - 1) temp[i++] = inChar;
        }
      }
      temp[i] = '\0';
      return temp;
    }

    // ARP lookup via the library's gateway-MAC mechanism: ES_client_set_gwip()
    // makes the packet loop send an ARP request for remIP. Returns true if a
    // reply arrives before the timeout.
    bool arp_resolve(uint8_t *remIP, unsigned long timeout) {
      unsigned long starttime = millis();
      es.ES_client_set_gwip(remIP);
      while (millis() - starttime < timeout) {
        uint16_t plen = es.ES_enc28j60PacketReceive(BUFFER_SIZE, buf);
        es.ES_packetloop_icmp_tcp(buf, plen);
        if (!es.ES_client_waiting_gw()) return true;  // ARP reply received
      }
      return false;
    }

    int arp_ping(uint8_t *remIP) { return arp_resolve(remIP, 5000UL) ? 1 : 0; }  // 5 s timeout
    bool set_mac(uint8_t *remIP) { return arp_resolve(remIP, 3000UL); }           // 3 s timeout

    // Switch the UPS off, but only after SHUTDOWN_AFTER_HOUR (RTC time), so a
    // daytime network outage cannot cut power to running servers.
    void shutdown_ups() {
      DateTime now = RTC.now();
      if (now.hour() < SHUTDOWN_AFTER_HOUR) return;
      query_ups('Y', response, FIELD_LEN);      // smart mode
      delay(500);
      query_ups('Z', response, FIELD_LEN);      // 'Z' twice, more than 1.5 s apart: turn off
      delay(1600);
      query_ups('Z', response, FIELD_LEN);
      query_ups('R', response, FIELD_LEN);      // back to dumb mode
      ups_shut_flag = 1;
      pinMode(2, OUTPUT); digitalWrite(2, HIGH);
      pinMode(8, OUTPUT); digitalWrite(8, LOW);
    }

    // Read the DS18S20 at a fixed ROM address; returns degrees Celsius.
    float get_temp() {
      byte data[9];
      byte addr[8] = {0x10, 0x28, 0xC, 0xBB, 0x0, 0x8, 0x0, 0xBE};
      ds.reset();
      ds.select(addr);
      ds.write(0x44, 1);                 // start conversion, keep the bus powered
      delay(1000);                       // conversion takes up to 750 ms
      ds.reset();
      ds.select(addr);
      ds.write(0xBE);                    // read scratchpad
      for (byte i = 0; i < 9; i++) data[i] = ds.read();
      int16_t raw = (int16_t)(((uint16_t)data[1] << 8) | data[0]);  // signed reading
      raw *= 8;                          // 0.5 C units -> 1/16 C units
      if (data[7] == 0x10) {             // COUNT_REMAIN gives full 12-bit resolution
        raw = (raw & ~0x0F) + 12 - data[6];
      }
      return (float)raw / 16.0;
    }

    // Set the RTC once from the NTP server; returns false on ARP or NTP timeout.
    bool get_ntp_time() {
      const unsigned long timeout = 3000UL;         // wait up to 3 s for the reply
      const uint8_t clientPort = 123;
      if (!set_mac(ntpServer)) return false;        // resolve the NTP server's MAC first
      es.ES_client_ntp_request(buf, ntpServer, clientPort);
      unsigned long starttime = millis();           // timeout counts from the request
      while (millis() - starttime < timeout) {
        // handle ARP/ping; a non-zero result means an unprocessed packet
        uint16_t dat_p = es.ES_packetloop_icmp_tcp(buf, es.ES_enc28j60PacketReceive(BUFFER_SIZE, buf));
        if (dat_p > 0) {
          uint32_t ntp_time = 0;
          if (es.ES_client_ntp_process_answer(buf, &ntp_time, clientPort) && ntp_time) {
            ntp_time -= SECONDS_FROM_1900_TO_1970;  // NTP epoch (1900) -> Unix epoch (1970)
            ntp_time += IST_OFFSET_SECONDS;         // UTC -> IST
            RTC.adjust(DateTime(ntp_time));         // DateTime(uint32_t) takes Unix time
            return true;
          }
        }
      }
      return false;
    }










Pingback: Automating the shutdown of APC UPS devices - Hack a Day
This is stupid. An APC network management card costs $60 or so off eBay, drops into your UPS management card slot right off the bat and has all this functionality and more, such as email alerts when power goes out, scheduled powerup and powerdown, etc.
I appreciate what you’ve done here but it’s still stupid when an off-the-shelf solution exists, works more effectively, and has far more functionality.
Hi Andrew,
I was expecting this comment, but then the APC network management card is $60… this stuff is $40. Money aside, for me it was more of a ‘project’ than a ‘task’.
Though the network management card is a no-nonsense solution, the knowledge gained and the bit of hard currency saved tilt the balance in favour of my contraption.
Thanks.
Sorry for the double post; a hiccup on my end caused a page reset.
Have you ever thought of connecting a car battery, or even paralleling multiple batteries, so you can get many hours or even days of runtime?
With the extra runtime you could then just unplug the UPS (conforming to the office rules) and still keep the server on.
If your server is a laptop, you can get one of those car laptop power supplies and connect two car batteries in parallel, so you can run the server through a very long weekend/holiday.
The above works if it is unattended power usage they are worried about.
If it is security they are worried about (unauthorized access to the server during off hours without the admin knowing about it or being able to do anything), they could:
1. Connect an alarm that alerts the admin of hacking attempts by phone or email.
2. Put all the server operations on a hosting provider, passing the liability to someone else.
I’m sure it’s about fire hazards rather than network attacks or wasting energy.
Just curious, have you ever had a server catch on fire?
By God’s grace… no.
But I have seen one major fire caused by a malfunctioning inverter and another by leaky batteries (both were branded units).
Just leave the UPS and/or servers on. We had a massive UPS and 12 servers in one room, on 24/7 for 6 years, and never had a fire.
However, as a maker/tinkerer I can appreciate why you went to the effort of creating the system, especially if it was on work time. Still, it is pretty overkill for your needs. Not to mention that servers sometimes refuse to shut down, which will of course mean that the UPS stays on overnight.
I think you did it the wrong way around to start with. Assuming the UPS gives enough time to shut down all the servers, just set a timer or use a manual switch on the UPS and let the UPS software on the servers shut them down. Or shut down the non-essential servers early using some sort of scheduling service.
Failing that, why didn’t you just plug a USB cable into each server and measure the 5 volts? When it goes low, it is safe to assume that the server is off. Then, when x servers are shut down, turn off the UPS.
Hey,
It seems this post is so good that it actually crashed your server.
Great detail. Do you have an email I can catch you on? Have a few questions to ask.
Thanks
Ryan
Thanks for stopping by, Ryan. You can ask your questions here, as the answers will benefit other site visitors too. I have also mailed you from my email ID.