The 3 Million Dollar Fiber Line 

As told to Sarah Holm

When the phone rang, I was with Network Administrator Jeremy Burkart, because we carpool to La Crosse together.

I started talking to people in IR, like Systems Architect George Neill and Network Administrator Trevor Laughnan. They said HQ was a complete loss and I should go to the CROPP Distribution Center in Cashton.

As we barreled down that rabbit trail known as County Highway D, we used my phone’s internet connection to get Jeremy’s laptop online. With that we could log into the VPN (Virtual Private Network) and shut servers down. We shut the servers down because we were afraid the fire would cause a loss of power and disrupt the cooling system in the data center. A hard stop like that would cause corruption within our servers. We were able to bring them down gracefully in a slow stop without corruption.

Organic Valley has two data centers in our La Farge headquarters. We had a live video feed of the server room, and we were getting alerts that there was water on the floor in the old data center. We could see all of our equipment and monitor everything up until we lost power in that area from the fire. We were not sure if, when, or how our new data center would be affected as the fire was still burning strong.

By 7 p.m. Tuesday, we were at the Cashton DC looking over our disaster recovery plan. The disaster recovery plan is mostly guidelines. We know what steps we have to take, but specific steps depend on the specific situation. We went through our plan and our different strategies. We have two SANs (Storage Area Networks), one in La Farge and one in Cashton. They replicate data back and forth so that if one dies, we can use the other. We were discussing what to do if the La Farge SAN died.

At some point, we sent Jeremy and Network and Virtualization Administrator Randall Juenemann down to headquarters to give us a report. They called me and said, “You’re not going to believe this. We’re in the new data center and everything is up and running.”

Randall at work

Thanks to Jeremy’s hard work of getting a new data center with cooling and generators a year ago, we still had everything. We had shut down our virtual servers but everything else was running.

That changed our strategy right there.

Jeremy and Randall pulled our Primary Domain Controller (which is kind of like the brain of our operations) and a SQL server from the new data center just in case the fire should get to that area. That would enable us to bring everything back up a lot quicker in such a situation. We decided to wait until the next day to get fiber into the new data center and to try to recover the network that way.

I was home at about 2 a.m. that Wednesday morning, then got up at 5:30 and drove to La Farge. I was escorted by a firefighter into the old data center. All the lights were flashing on the equipment and it was running but everything was soaked. I powered down a bunch of my switches and took out the main equipment that I was concerned about. I tried to dry it out but all the equipment that was in that old data center is worthless.

I moved on and verified that everything was running in the new data center. I had called at 1 a.m. the previous morning to get Steiger Construction and Vernon Tel to trench our fiber if we should need it. I wanted to meet them there, the sooner the better, to get it rocking.

The guys came out and marked our lines. We realized our fiber access was still there. So we ran a temporary 3 million dollar cable. We call it that with affection, because that’s how much money is lost daily if we don’t have the network up and going. We ran that big bad black cable all the way from the old data center at the end of the building to the new data center. I had lost a bunch of my core routing and networking equipment, so we had to jury-rig it all.

At 1:04 p.m. Wednesday, we had restored connectivity so that Cashton, La Farge, and Chaseburg could all talk to each other. At about 4 p.m., I felt good enough about everything to allow everyone to turn their systems back on. I went outside where everyone was sitting on the lawn waiting for me. I ran out and yelled, “Okay! I’m not the bottleneck anymore!” From there it was awesome, because I got to watch our Database Administrator, Janelle Fellegy, bark out orders and tell everyone what servers to get up. It was great to see everyone working in sync.

Then I sat down, had a drink of water, and ate something.

I relaxed for a while then. I could hear people do little hoots and hollers every now and then when something came up successfully.

The best was when somebody yelled out “146 orders!” The whole area just erupted in screams of joy because we didn’t even miss an order. It was just awesome.

I would keep working once I got home. My poor kids came up at one point and said, “You know, we haven’t seen you all week,” and I said, “Well, I’m home now!” and my daughter answered, “Yeah, but you’re working.” That really hit me in the heart.

Josh with his son Victor

Saturday morning people were wondering why I was still so stressed out. I would say, “Well that 3 million dollar line could get severed at any point and then we could lose everything again!”

It could be severed by people tripping on it or the building shifting. I was paranoid. Friday morning I was trying to get the fiber trenched, but it was delayed because of concerns over lightning. We worked outside all day in a tent in the rain. We finally got the fiber trenched under the parking lot Friday evening. I got home about 10 p.m., and Vernon Tel was going to come out the next morning to put the ends on the fiber so that we could actually connect to it.

So Saturday morning I was working from home. I was online working and suddenly everything dropped. I had a small heart attack. I was calling people and asking what happened. Trevor flipped the fiber over to the new line and there was only a minute’s drop of connectivity and then we were back up and going. It was just in the nick of time. We got that fiber set up right before the building shifted and cut our line.

Then I started looking at stuff and finding issues with the VPN. I was being told to take a break and chill out, and I was like, “Tell that to the hundred-some people on Monday who aren’t going to be able to access their files.” I worked most of the weekend but tried to take some breaks, as my wife was trying to make sure I kept eating and sleeping.

Monday I witnessed firsthand how badly the network was running. There were 190-some people on the VPN at that time and they couldn’t access the I-drive and their files effectively.

Around 5 p.m. Tuesday I did a quick maintenance outage and made a bunch of changes that helped. My crisis mode ended Tuesday night because that’s when I got people able to do their work effectively.

I actually got 6 hours of sleep on Tuesday and felt pretty good. Since the fire, I had only been getting 2-3 hours a night.

For the most part, business continues on and people can do what they need to do. It is amazing that the network was back up within 18 hours.

My favorite part of it all was working in the rain underneath that tent and seeing us power through everything as a team. For us, it was not a panicked crisis mode but rather a focused business-as-usual attitude. It was a neat experience being able to say that we did that. When I worked as a consultant, we used to call situations where a person wasn’t sure how to do something a “trial by fire.” This time it really was “trial by fire” in the literal sense. And so far, so good.