Short version: Around 1am Tuesday morning we took our servers down to prepare for the new design and new features. During this time a hard drive problem was discovered on our lone database server. While we have multiple web servers, we only had one database server. Because it was the hard drive that needed replacing, we basically had to install and reconfigure everything all over again. However, no data was lost during this entire process, as we made a complete backup before taking the server offline. All in all, Ningin was down for about 36 hours - 4 hours spent diagnosing the problem, 18 hours waiting on a new server, 8 hours of setup time, 4 hours of originally scheduled upgrade and maintenance, and last but not least, 2 hours of intermittent sleep. Sorry to keep you waiting, but that’s how it was.
Long version: During the scheduled maintenance and upgrade, around 2am Tuesday morning, I discovered some hard disk problems with our database server. Soon thereafter, I called our hosting company’s tech support to tell them about the problems. In the past, they’ve always been quick to respond and very competent. In the early stages of this issue, that was not the case. Instead of minutes, each response took about an hour, and when the responses did come I was told that it was probably something on my end and that I should try to find the solution via Google. To be fair to them, it was something really weird and uncommon, but my gut was telling me the hard drive was nearing the end of its life. Even though we use enterprise-class, long-life hard drives that have only been online for about 2 years, I knew that anything was possible with sensitive electronics.
So after the tech gave up on my ticket, which was at about 5am EST, I spent another hour trying to verify that the drive was dying…and I did. At this point I was a little annoyed that I had to prove this fact myself. Anyway, the new information was sent back to the hosting company along with a request for a senior technician. The senior tech agreed that the hard drive was on its way out, and we scheduled a replacement for the earliest possible time, which was 5pm EST Tuesday. I wonder whether, had the problem been caught earlier, I could’ve made the 5am maintenance window instead of waiting 12 hours. The actual installation of the new server took another 6 hours, so we couldn’t even begin setting up until 11pm Tuesday night. By the way, instead of waiting around doing nothing, I spent those 18 hours researching potential new hosting providers and solutions.
Our sister site Girlybubble came online at about 5 or 6am (my memory was getting a bit hazy at that point). Ningin also came online at that time in a limited mode where you couldn’t submit, comment, or do anything else that required updates to the database. Another 6 hours and a round of fixes for bugs introduced by the new server configuration, and we’re back in the present!
I wouldn’t say all the bugs are gone, but they shouldn’t affect you too much. If you do find any and report them, we’ll send you something special.