Pandalet

Post-mortem: server outage Sunday 7 August 2022


The server outage this morning was due to a hardware failure on one of the physical machines hosting our game servers.

 

The initial failure happened around 09:45 UTC, when a hard disk failed and the machine shut down.  This took out Pristine, Indy, Serenity, Desertion, Exodus, Deli, and Chaos, as well as Golden Valley and the webshop.  Because Golden Valley (GV) is our login server, nobody could log in, even to servers that were still up.  We notified our hosting provider (who were already swinging into action), and they pulled the machine and replaced the disk.  Replacing the hardware didn't take very long, but rebuilding the images took a little while, so we had to wait for the machine to give the green light before bringing anything back up.

 

We do not expect any significant data loss as a result of this outage, as we use a redundant array configuration.

 

The game servers were brought back up at 11:57 UTC, with all hamsters operating normally from that point.

 

Fun fact: Max handled sorting out the systems from our side, on his laptop, from his allotment.  No vegetables were harmed due to this outage.


@Burdok, I fed them some of my freshly picked courgettes to keep them going 😀


Can we replace the server hamsters with capybaras?


Well done to all involved in sorting out the problem.

 

Thank you also for a very rapid insight into what happened, above and beyond what I would expect on a lazy summer afternoon!


Staff have always been pretty quick to solve issues. Thanks for still being a great team.


Well said, kudos to those behind the scenes!!  We appreciate YOU!!


Is this also responsible for the lag spikes experienced within the last 30 days?

52 minutes ago, kaidley said:

Is this also responsible for the lag spikes experienced within the last 30 days?

 

It's remotely possible, but it's not likely.  Having said that, lag is something we do keep an eye on, and we're constantly looking to improve.


+1 for capybaras.  They might not have the wheel running any smoother, but they are so cute we can forgive them anything.


Tell him to try growing purple carrots and purple podded peas :P

Also Indian rainbow corn, if he's bored with the usual stuff...

2 hours ago, Atheline said:

+1 for capybaras. 

There's so much disagreement in the world, yet almost all of humanity seems collectively united on the fact that capybaras are cute and interesting.


I heard that if you upgrade to a fancy angora hamster, you can expect a 10% performance improvement. They also improve the social well-being of the other hamsters.


I appreciate the quick reaction and info. Thanks a lot to the staff! ❤️

Also:  +1 for capybara cuteness.


Great job, and it's really cool to actually know what caused the issue too.

 

Thanks Wurm Team! ❤️


Wasn't avoiding hardware failures the reason Wurm was moved from Hetzner to Linode?

With Hetzner, if a server failed, one island was down. Now, with Linode, a single hardware failure causes the outage of multiple islands at once, due to their virtualization and/or containerization strategy.


Thanks for the quality time I had with my daughter that day :P. Great work on getting everything back up!

On 8/7/2022 at 9:38 AM, maximusi said:

@Burdok, I fed them some of my freshly picked courgettes to keep them going 😀

Weird, I keep getting ads about hot ones being in my area

 

edit: Nvm, definitely not the same thing

On 8/7/2022 at 10:34 AM, Pandalet said:

 

It's remotely possible, but it's not likely.  Having said that, lag is something we do keep an eye on, and we're constantly looking to improve.

 

Lag happens when the hamsters take a bathroom break.
