Keenan

Developer

Everything posted by Keenan

  1. The lands of Wurm are large, larger now than ever before. I've traveled to many islands across the map, and each one seems to be unique in its own way. The Wurmpedia Team is asking all of you to tell us a little bit about your home in Wurm. Where do you live in these lands, and why? Is it the mountains or the wildlife? Do the people there keep you coming back, or do you prefer a more solitary life? If you could give a new player one or two sentences describing the best things about your home, what would they be?

     This is open to all servers, but for Chaos and Elevation I would ask that you look past the warfare waged in those lands and think about the majesty and beauty they hold. Try to remember that this isn't for recruitment purposes, but to help describe the landscape and qualities of the entire server. The scope of this project is to crowdsource a paragraph or two for each server, so be as descriptive as you want and don't worry about repeating what others have said.

     I had posted this before in the Wurmpedia section, but I hope this reaches more of you. Feel free to click on that link and see what has already been said. We will be using replies from both threads when we tailor the paragraphs. Thank you!
  2. Hi Everyone, As previously promised, I've taken the time to write a postmortem of the stability issues Independence has had, along with our future plans for server hosting.

     Independence began to lag considerably some time before February 7th. We had done a maintenance restart as scheduled and had hoped this would fix the issue. It did not. Later in the day we restarted only Independence in an attempt to fix the lag. During this restart I rebooted the server that Independence runs on and upgraded packages. This didn't resolve the lag either, and I was beginning to suspect it was hardware-related. By February 8th we had a fairly good idea that a drive in the RAID was failing, so we began to move Independence to spare hardware. That hardware was our old Bridges test server, for those interested. It wasn't new by any means, but it had fewer cycles on it and the drives were fresher. Independence lived there for about two weeks while we worked on restoring the previous hardware. In the end, Hetzner replaced the failed drive as well as the rest of the hardware, leaving just one of the older drives with all the data. I restored the RAID and we upgraded the operating system as well as all packages. I had done this on a test server already, but the intention was to make Independence the first server to get this treatment in quite some time. You may recall that we were planning a very long downtime in the future. This was to do the same to all other servers and get things back up to date. Well, more on that in a bit.

     In the end, Independence is still experiencing lag and we're quite aware of it. I believe this to be hardware-related once again, and we will keep monitoring it. The sad part is that all of these issues overshadowed the fine work Samool did to reduce lag across all servers. We can move Independence back to the spare hardware should it become needed, though I am trying to isolate the problem. If it becomes unplayable, we'll do the move.

     While Independence was taking up all my time, Xanadu wanted some attention as well. You may recall a few crashes. These were long-standing, known crashes that we had been unable to trace before. Thanks to the scrutiny and the diagnostic tools we've had running to single out lag hot spots, we were able to trace the crashes back and fix them. Finally. One of these issues actually took Celebration down back in January. All of this was completely unrelated to the issues Independence was facing, and yet it made our stability look pretty awful.

     Budda and I were working on solutions in the background, not just for our stability issues but for a number of other problems as well. Hetzner has not been the most reliable host for many here, with network slowdowns and hiccups, even router outages and emergency work that left us helpless as people were unable to play. Working with Rolf, we developed a plan to move from Hetzner onto a more stable infrastructure with Amazon Web Services. This move is in its early stages. I am in the process of writing the code for the infrastructure, and I am planning on standing up our test instances there first. If all goes well, I can begin writing the live server infrastructure and we can write up a future plan on the move and the required downtime to make it happen. This is what I meant by that extended downtime in the future: instead of patching up old hardware, we will be moving to new instances in a reliable cloud environment.

     For those concerned, we plan on using the Frankfurt, EU location, so the servers won't be "moving" all that far. I've had a lot of experience with AWS and I am very excited about what this means. While we maintain backups right now, all of this will become more secure and easier to manage. We can allocate more resources to a specific server if it becomes needed, or scale back and save money if a server becomes less populated. It means flexibility and stability for Wurm now and into the future, especially with the option to purchase reserved resources. I'm excited, and I ask everyone to have patience while we work through this transition. If anyone is curious, I can detail the infrastructure a bit once I've ironed out the details. Until then, happy Wurming!
  3. Devblog: Server Issues Postmortem & Future

    This thing has more evolutions than an Eevee.
  4. Devblog: Server Issues Postmortem & Future

    Update! It always happens like this: you think you're ready for something, then you have a think and suddenly realize you missed something very important. For me, it was the ability to establish DNS records for each instance. This led me to a solution for a problem I had inaccurately called impossible, however. I figured out what I was doing wrong when trying to assign multiple public IPs to a single network adapter. This basically means we can have a DNS record and IP address for each server, no matter how we're hosting it at a given time. It also means we'll be using a new domain name, and we'll have fewer DNS resolution issues. I'm currently troubleshooting some connectivity issues, but I hope to have this resolved today. Then it's back to getting the server starting up and working on the database updates needed to populate the IP addresses. The good news there is the update only ever has to happen if I need to delete and recreate the network stack. That stack is so simple that I should never have to do that. Either way, I'll have a command that does the update in the event it is needed.

    As for the game server, I was working on getting Spotify's dockerfile-maven plugin working. This way I can generate and push a versioned Docker image of the server on each build. The way I'm going about this should see a separate repository for Wurm Live and Wurm Test. I'm aiming to have everything separate down the line. Right now we use a single Maven repository, which makes things a little more complicated. We can use RELEASE and LATEST, but those are deprecated. I'd also prefer to be a little more specific about what version we're pushing to the live servers. Finally, I'd like this process to be something we can kick off from a web interface, such as Jenkins. These are the challenges I'm working to overcome. I'll give more updates as I make more progress. I know this is dragging on longer than I had hoped after some quick progress out of the gate.
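    To make that concrete, here's a rough sketch of the multiple-IPs-plus-DNS idea in troposphere. This isn't our framework code; the subnet ID, the domain, and the server pairings are placeholders for illustration.

    from troposphere import GetAtt, Ref, Template
    from troposphere.ec2 import (EIP, EIPAssociation, NetworkInterface,
                                 PrivateIpAddressSpecification)
    from troposphere.route53 import RecordSetType

    t = Template()

    # One network adapter carrying a primary and a secondary private IP.
    eni = t.add_resource(NetworkInterface(
        "GameServerEni",
        SubnetId="subnet-0123456789abcdef0",  # placeholder subnet
        PrivateIpAddresses=[
            PrivateIpAddressSpecification(Primary=True, PrivateIpAddress="192.168.56.10"),
            PrivateIpAddressSpecification(Primary=False, PrivateIpAddress="192.168.56.11"),
        ],
    ))

    # Each logical server gets its own Elastic IP mapped onto one of those
    # private IPs, plus a DNS record pointing at that public address.
    for name, private_ip in [("oracle", "192.168.56.10"), ("baphomet", "192.168.56.11")]:
        eip = t.add_resource(EIP(name.capitalize() + "Eip", Domain="vpc"))
        t.add_resource(EIPAssociation(
            name.capitalize() + "EipAssoc",
            AllocationId=GetAtt(eip, "AllocationId"),
            NetworkInterfaceId=Ref(eni),
            PrivateIpAddress=private_ip,
        ))
        t.add_resource(RecordSetType(
            name.capitalize() + "Dns",
            HostedZoneName="example-wurm-domain.com.",  # placeholder zone
            Name=name + ".example-wurm-domain.com.",
            Type="A",
            TTL="300",
            ResourceRecords=[Ref(eip)],
        ))

    print(t.to_json())

    The nice side effect is that a server keeps its DNS name even if we later rebuild the instance it lives on, since the Elastic IP just gets re-associated.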
  5. Devblog: Server Issues Postmortem & Future

    Testing is a sore spot with me! We need much more of it, but right now our best testing comes from a single person who is amazing at it. Still, we have issues that simply fail to show up until the code hits live. What's exciting about this is how I will be able to take a snapshot of a live server EBS volume and stand up a test server mirror in a few minutes.
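    As a rough illustration of what that looks like (placeholder IDs, and plain boto3 rather than our actual tooling), it's essentially just:

    import boto3

    ec2 = boto3.client("ec2", region_name="eu-central-1")

    # Snapshot the live server's data volume (the volume ID is a placeholder).
    snap = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="Live data volume - test mirror",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # Stand up a copy of that data as a fresh volume for a test instance to mount.
    test_volume = ec2.create_volume(
        SnapshotId=snap["SnapshotId"],
        AvailabilityZone="eu-central-1a",
        VolumeType="gp2",
    )
    print("Test mirror volume:", test_volume["VolumeId"])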
  6. Devblog: Server Issues Postmortem & Future

    If I recall, I think it *tries* to do this but there's still something blocking the main game loop on write. I'll have to poke at it again.
  7. Devblog: Server Issues Postmortem & Future

    So it's been a bit since I last posted. I've spent a lot of time trying to sort out how to get our build artifacts out of the Nexus repository we have without doing janky things. It's looking like my best bet is a Maven project to handle building the server's Docker image. I also borrowed some time from this project to move Independence again, as well as do my part for the WU beta.

    As for Sklo's last post, I'm well aware of the performance issues surrounding EBS volumes. I'm just not entirely sure how they'll affect us until we get some testing done on them. I'm doing all my testing in my own personal AWS playground so as not to commit us to anything just yet. If things start looking poor and I can't make it work without serious performance hits, then we'll adjust our strategy and find a new path forward. I will again say that Wurm's servers depend more on write time than read time, and most of that write time is split between map saves and the database. MySQL already handles buffering writes fairly well, so we could work around disk write latency by doing something similar with map saves, or even offloading them to a new thread. I think that was intended at some point, but if memory serves, it's still tied to the main server loop. Correct me if I'm wrong, but be nice about it! It's been over a year since I last looked at that code. More updates as I have them.
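    For the curious, the pattern I mean is just a producer/consumer hand-off. The server is Java and I haven't re-read that code, so this is only a sketch of the idea in Python, not what Wurm actually does:

    import queue
    import threading

    save_queue = queue.Queue()  # pending (path, data) map saves

    def map_save_worker():
        # The slow disk writes happen here, off the main loop.
        while True:
            path, data = save_queue.get()
            with open(path, "wb") as f:
                f.write(data)
            save_queue.task_done()

    threading.Thread(target=map_save_worker, daemon=True).start()

    def on_game_tick(map_bytes):
        # The main loop only enqueues; it never blocks on disk latency.
        save_queue.put(("top_layer.map", map_bytes))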
  8. WU 1.8.0.3 - SteamID Table Problem

    Well that was quick. Thanks to Cuddles up there for their compliment and the tip. I apparently missed adding the load calls when I shamelessly copied the IP History code for use with SteamIDs. Yes @Batta, this could affect pardons. We've got a few other bugs we're looking at and will aim for a beta version bump this weekend with this fix and any others we can crank out.
  9. WU 1.8.0.3 - SteamID Table Problem

    I did test the command when I implemented everything, so I'd need some data to work with to see why it's not working. Heck, I tested all of this. So I'm extremely interested in what's going on and will do a post when I get to the bottom of it.
  10. WU 1.8.0.3 - SteamID Table Problem

    I missed this when it was originally reported. I'll work to fix it before the beta goes live. Thanks for the compliment!
  11. Devblog: Server Issues Postmortem & Future

    It's been a bit, so time for an update! I'm still working out the details of getting our server code onto the instances in a sane way. I've also been swamped at work so far, but it is only Tuesday. With the downtime today, Independence is back on the spare hardware. It's definitely yet another drive issue that was causing the lag. At this point, Indy will simply live on the spare hardware until we move hosts. In the meantime I'll be bringing the situation up with Hetzner in case we need the hardware for another emergency. (Well, I'll ask Rolf to!) I will say that it's really cool to see how all this works. I've always been fascinated by just how much you can create with a few lines of script when doing this kind of thing. The fact that I've broken it apart enough that I can just drop a logical game server (i.e. Release) onto an instance and it sets up all resources is pretty nice. I wasn't entirely sure if I could abstract it out like that.

    So, the steps remaining (with TESTING happening between each, obviously!):

    1) Deploy the proper build of the Wurm server automatically based on cluster (i.e. latest snapshot for test, or latest release for live)
    2) Script EBS snapshots for backup purposes
    3) Update the database with the proper public and private IPs, since we allocate these when an instance starts (a rough sketch of this step follows below)
    > --- < This is where we can begin testing servers on AWS > --- <
    4) Handle Golden Valley - we'll either need to move the DNS hosting to AWS so I can update it with the current public IP for GV, or I'll have to find some other way to ensure the DNS points to the login server.
    5) Work with Taufiq on moving the shop to AWS
    6) Websites: Wiki, Forum, Www
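    For step 3, the sketch I have in mind looks roughly like this. The table and column names are placeholders rather than Wurm's real schema, and pymysql is just what I'm using for the example:

    import urllib.request

    import pymysql

    METADATA = "http://169.254.169.254/latest/meta-data/"

    def metadata(path):
        # The EC2 instance metadata service tells us the IPs we were just given.
        with urllib.request.urlopen(METADATA + path, timeout=2) as resp:
            return resp.read().decode()

    public_ip = metadata("public-ipv4")
    private_ip = metadata("local-ipv4")

    conn = pymysql.connect(host="localhost", user="wurm", password="***", database="wurmlogin")
    try:
        with conn.cursor() as cur:
            # Placeholder schema: whatever table the login server reads addresses from.
            cur.execute(
                "UPDATE servers SET public_ip = %s, private_ip = %s WHERE name = %s",
                (public_ip, private_ip, "Oracle"),
            )
        conn.commit()
    finally:
        conn.close()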
  12. Named recipes

    That's... what I just said...
  13. Named recipes

    Wonka hit it on the head. It was a project I was working on - to get all those affected so we can work on some sort of compensation. The good news is, if AWS pans out, it'll be easier for me to do it.
  14. Devblog: Server Issues Postmortem & Future

    I've crunched the performance numbers on the EBS volumes and of course they're lower than SSD hardware. We plan on doing a stress test to see what the impact of that is and what we can do to mitigate it. Wurm doesn't rely on reads as much as writes, and writes can be optimized in other ways. The only heavy read time on a Wurm server is during initial load. This is why Xanadu takes so long - it literally loads nearly everything into memory except offline players, their inventories, and their tamed animals. We will certainly be putting proper limits on the account.
  15. Devblog: Server Issues Postmortem & Future

    We tie balloons to them. This is how cloud works.
  16. Devblog: Server Issues Postmortem & Future

    I'm hoping to have test servers on AWS by the end of the week. I've done this so many times now, it's kind of second nature for me. We'll be doing a test with live server data as well. I'm hoping we'll be giving a firm date on the move by the end of two weeks, if not before.
  17. Devblog: Server Issues Postmortem & Future

    Time for another update! I've been working more on provisioning, which is now right down to standing up Percona MySQL for each game server paired off on an instance. There are a few hitches to work through, but soon I'll be working on getting the Wurm code deployed. The next step after that will be deploying test servers and having a go with that. For the technically inclined, the way I'm isolating multiple instances of MySQL and Wurm is through the use of Docker. I've not worked out the logistics of Wurm's Docker container yet, but the MySQL one is working perfectly. I've also broken the stacks out into the EBS volumes, the network, and the instances. This makes it easier to make a change to a specific stack without bothering the rest of them. That's all for now.
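    As a taste of the MySQL side (the names, ports, and paths here are made up, and I'm using the Docker SDK for Python rather than our provisioning code), isolating one Percona container per game server on a shared instance looks something like:

    import docker

    client = docker.from_env()

    # One Percona container per logical game server, each with its own data
    # directory on the instance and its own host port.
    for name, host_port in [("oracle", 3307), ("baphomet", 3308)]:
        client.containers.run(
            "percona:5.7",                        # assumed image tag
            name="mysql-" + name,
            detach=True,
            restart_policy={"Name": "always"},
            environment={"MYSQL_ROOT_PASSWORD": "change-me"},
            ports={"3306/tcp": host_port},
            volumes={"/data/" + name + "/mysql": {"bind": "/var/lib/mysql", "mode": "rw"}},
        )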
  18. Devblog: Server Issues Postmortem & Future

    While we are committed to this AWS move, I'm never against being convinced otherwise. I just didn't find his arguments convincing, as my own experience proves his comments wrong. I know that AWS has become one of those jargon words that people throw around to make something seem more important. It's kind of like "blockchain" and "NoSQL". However, each of those terms refers to a tool that serves a purpose in the right context. Sklo was arguing that Wurm's server needs are not the right context for AWS, and I stand firm in saying they are. As for his facts, I believe I previously mentioned that they appear outdated. I have to agree with wipeout above that they seem to come from a place of having read some articles rather than actually using AWS. I actually recall stating the same facts three years ago when my previous company originally wanted to move into AWS. Two years later, the facts were looking different and the move was made.
  19. Devblog: Server Issues Postmortem & Future

    I've got a lot of experience with AWS security. I'll be ensuring that everything is locked down as hard as possible. One thing I like about using CloudFormation is that you can set up IAM policies and rules which specify things like "only allow connections from this IP", where the IP isn't even known until it's assigned at deployment. Security really isn't hard. It's when people "cheat" and think it'll be "okay" that things fall down. Specifically regarding buckets - I would *never* make them public.
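    As a small example of what I mean by locking things to an address that doesn't exist until the stack allocates it (the resource names and VPC ID here are placeholders), a security group rule can be built from an Elastic IP at deploy time:

    from troposphere import Join, Ref, Template
    from troposphere.ec2 import EIP, SecurityGroup, SecurityGroupRule

    t = Template()

    login_eip = t.add_resource(EIP("LoginServerEip", Domain="vpc"))

    t.add_resource(SecurityGroup(
        "GameDbSecurityGroup",
        GroupDescription="MySQL reachable only from the login server",
        VpcId="vpc-0123456789abcdef0",  # placeholder VPC
        SecurityGroupIngress=[
            SecurityGroupRule(
                IpProtocol="tcp",
                FromPort=3306,
                ToPort=3306,
                # The CIDR is built from an IP that is only known once allocated.
                CidrIp=Join("", [Ref(login_eip), "/32"]),
            ),
        ],
    ))

    print(t.to_json())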
  20. Devblog: Server Issues Postmortem & Future

    !! MORE Nerd Talk !! So I've managed to get a server for our test servers up and running, using a framework I wrote:

    from troposphere import Template
    from clusters.test.Servers import Oracle, Druska, Baphomet
    from wurm.server.GameServerInstance import GameServerInstance
    from wurm.networking.Network import Network
    from wurm.networking.PublicIngressRules import PublicSshIngress, PublicHttpIngress, PublicHttpsIngress

    t = Template()

    net = Network(t, availability_zone='us-east-1a', title_prefix='TestCluster')
    net.add_vpc('192.168.0.0/16')
    net.add_subnet('192.168.56.0/24')
    net.add_internet_gateway()
    net.add_route_table()
    net.add_route("0.0.0.0/0")
    net.add_security_group(title='DefaultSecurityGroup', description='Default ports',
                           rules=[PublicSshIngress, PublicHttpIngress, PublicHttpsIngress])

    instance = GameServerInstance(t, net, availability_zone='us-east-1a')
    instance.image_id = 'ami-0f9e7e8867f55fd8e'  # Debian Stretch
    instance.instance_type = 't3.medium'
    instance.ssh_key_name = 'wantsmore-coffee-us-east-1'
    instance.game_servers = [Oracle, Baphomet, Druska]
    instance.add_instance()

    f = open("test-instance.json", "w")
    f.write(t.to_json())
    f.close()

    It may not look like much, but that's because I've abstracted much of it behind "GameServerInstance" and "Network". There's a bit more to do, such as moving security groups out of Network so that I can spin up the VPC (essentially the entire network) in its own stack. Then the instances themselves will spin up in their third and final stack. The first stack contains the volumes for each server, but that isn't shown here for brevity. This is all I've done in about 36 hours of plugging away.

    The next step is to get provisioning going - I want to be able to provision new servers automatically. Then I can transfer a snapshot of all test servers over to AWS and see what we've got. The provisioning is really the hard part, as I'll be doing some newer things, such as pulling the server artifacts from Maven instead of manually uploading them or building them before launch. I've been meaning to do this for literally years now, so what better time than now! Ideally, this will lead to shorter update downtime and less human error. And before the techno-critics rip this apart - it's not final! This is a rushed test script that generates a proper template that I've validated in AWS.

    Oh, and if anyone wants to look at the template file the last run generated, it's here: https://pastebin.com/8QTxnXTH Don't worry, there's nothing sensitive in it.
  21. Devblog: Server Issues Postmortem & Future

    In my real-world experience, there are more use cases. For example, highly available services (which Wurm is) and data redundancy and security (encrypted volumes and snapshots, both of which we will be using). One thing you're entirely failing to consider is the abstraction of hardware. Not having to maintain hardware is a huge benefit. While you are correct that cloud shines more with microservices or scaling resources, that's not the only use case. Anyway, this has been a fun tit-for-tat, but I'll be getting back to things now. To each their own, and I wish you well.
  22. New Year, New Map Dumps 2019!

    They got busy with other things. It's been three months now though... anything we generate may not line up with the original dumps. I'll see what we can do about a new set of dumps soon, complete with routes.
  23. Devblog: Server Issues Postmortem & Future

    I've worked hands-on with AWS for well over a year, with a good chunk of that time spent as part of a two-person team responsible for transitioning a company's entire infrastructure with minimal downtime. I've never experienced the things you've mentioned in AWS, but I have been blocked out or lagged out of the Wurm servers on a number of occasions thanks to Hetzner's lousy routing. Time will tell, as you say, but there are plenty of options within AWS. My current full-time job uses AWS exclusively for the project I'm on, which requires high availability and performance. Plenty of companies trust AWS with their livelihood, so I'm not entirely sure how it can be as bad as you say. Perhaps you've not had much experience with them?

    The deployment of cloud resources will not be slapped together and hacked into a Jenkins build. That's a terrible way to do it. It sounds like you're proposing AWS CLI calls from a central server to spin up resources? No. If you mean the deployment of the code, that's not even what I was discussing in the previous post, so you'd likely be mistaken. Code is already built and deployed from Jenkins to test servers. I don't trust live deployments to Jenkins, but they'll be much simpler as a result of the work I'm doing here.

    As for "hire a server administrator who has a good bunch of knowledge"... I'll send you my resume if you want. I can't tell if that was an intentional slight or not. Oh, and... I'll never, ever run Wurm servers on Windows. All that wasted overhead? Gah, it makes me hurt just thinking about it.
  24. Devblog: Server Issues Postmortem & Future

    !!Nerd Warning!! The following content may not be suitable for all audiences. So I can share a little of what my first step here is. I'm a very big fan of infrastructure as code. The idea of being able to define whole server farms in code and commit that to a repository... well, it rivals coffee. That's why I'm using CloudFormation and a Python library called troposphere. I've played around with some other solutions in the past, but I've had the most experience with CloudFormation and recently troposphere and Python in general. To those not in the know trying to follow along: CloudFormation is an AWS-provided way of spinning up cloud resources (machines, networks, etc) by providing templates. Troposphere is a way to generate those templates from Python code. This is allowing me to abstract out things like the actual game servers. The reason for this is that it may actually be more cost-effective to have one or more servers share one larger machine instance rather than spin up two smaller ones. It also allows us to more easily move "game servers" around as needed. Since we'll be trying to fine-tune what our exact needs are, the ability for me to just "pick up" Xanadu and plop it on a beefier machine with a few edits and a button push is quite appealing to me. In the last day or so I've gotten my code and layers to the point where I've successfully spun up a small test cluster. This is exciting! I've got some more work to do on this, but I hope to be working on the provisioning scripts before the weekend is out. I still have some decisions to make on how I'll handle that, but the main goal there is going to be stability and ease of use. If we ever want to do something like Challenge again, or any other limited-time specialty live or test server, I'd rather it be something as easy as a button push to do it. Tl;Dr: I did something cool with clouds and parseltongue.