Keenan

Developer
  • Content count

    1920
  • Days Won

    25

Keenan last won the day on March 24

Keenan had the most liked content!

Community Reputation

2610 Rare

About Keenan

  • Rank
    Villager
  • Birthday 07/02/1981

Profile Information

  • Gender
    Male
  • Location
    (final String _locale)

Accounts

  • Chaos
    Keenan
  • Exodus
    Xorith

Recent Profile Visitors

5151 profile views
  1. Poll to gauge Player perception of the GM team

    I think it's fair to say that most of the time the GM team is spot on and damn good at their job. I'll be the first to admit that the GMs have far more limited tools than we developers would like them to have, and they do their best within the scope of those tools. You don't need to go further than a WU server to know this is the truth, and it would take a full-time developer a considerable amount of time to make the situation better. It's true that there are some tools we've held back from WU, but the majority are there. We have done some things, such as a log for deleted items. Data storage is a constant concern though (for now), so we can't simply log every action everyone takes and thus have the ability to completely undo every action. Just some developer insight.

    I've been honored to work alongside the GMs on a number of cases over the years due to my position as Game Server Administrator. My vote was "no opinion" as I'm staff and wouldn't want to add a vote that'd be considered biased.

    Edit: And just to be clear, my comment about the tools is meant to highlight their skill in dealing with day-to-day situations.
  2. Can't log in on laptop

    Brash_Endeavors is awesome.
  3. Can't log in on laptop

    Looks like a deployment change intended for test made it to live. Those who followed the instructions may have to do so again, but everyone else should see the issue resolved. Sorry about that.
  4. Devblog: The Rest of 2019

    Dysentery for everyone!
  5. Devblog: Server Issues Postmortem & Future

    This is one of my pet projects that I do hope to complete. Since I have to do it all manually right now, I'll more than likely automate the system at some point. I'm not sure if it'll be a live map, as we do support events like mazes and such, but something that lets us put out a more scheduled dump of the maps would be amazing.
  6. Have Sindusk take over Epic

    Addressed:
  7. Devblog: Server Issues Postmortem & Future

    Hi all. It's been a while since my last update here. I'm going to start by taking this quote from another thread:

    Let me break this apart into the main bits:

    No cloud optimization for Wurm

    This is true in general. Wurm cannot scale in the sense that we can't spin up, say, two or three instances to help Xanadu cope with lag. Yet scaling is not the only thing the cloud has to offer. I'm mainly looking for the stability that comes with hosting on AWS from a network perspective, as well as the ability to build in even more safeguards against data loss. It also means faster recovery in the event of a server outage. Hetzner doesn't give us a whole lot of options in that regard, and the network has been abysmal for quite some time.

    The Costs

    The costs are a huge burden on me, as a matter of fact. One thing I did when talking to Rolf about this was show that, if done right, we can meet or beat the current hosting costs. This is primarily why it has been taking me so long - I'm trying to do more heavy lifting with code than with infrastructure. It would be easier for me to shove things into the more expensive offerings, but it would be bad for Wurm as a whole to incur such a high cost in comparison. We are also looking into three-year pricing to save even more money and to ensure that Wurm will be around for quite some time.

    The Silence

    It hasn't quite been three months, but I can explain the silence. First off, part of it has been me taking a leave. I got things to a point where I could start standing up a test cluster, and we've had a successful connection test with my infrastructure in place. Since that time my focus has been on Wurm Unlimited as well as some personal things that came up. I hate to use this as a shield, but keep in mind that I work on Wurm in addition to a full-time job; I needed some time off from everything so I could come back to this fresh. My day job has had me working deeply with AWS as well, and I've actually learned some new tricks that might help with Wurm's infrastructure. I need to spend a little time with that and see if it'll be a better way than how I've been doing it.

    Going Forward

    My time over the next two weeks will be scarce, but starting in mid-May I plan on diving back into this in all my off-time again. My current roadmap looks like this:

    • Server auto-deployment code. (We currently do a manual deploy to Test for the server.)
    • Data backup and restore from S3. (This will allow me to clone a server during downtime - see the sketch after this post.)
    • Using the above, stand up clones of all three test servers in the new Wurm AWS account.
    • Deploy a special test client that connects to them.

    After this, there will be a period of observation and hopefully some stress testing from all of you. I'll work with Retrograde and Budda on some kind of event with rewards, but the main problem with that is it may require several attempts. I'll mainly need to push the server to its limits and see where any bottlenecks are. I welcome questions, comments, and criticism equally.
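    Since "backup and restore from S3" is on the roadmap above, here is a minimal sketch of what that step could look like, assuming boto3 and a pre-made tarball of the server's data directory. The bucket and key names are made up for illustration, not the real infrastructure.

    ```python
    # Minimal sketch of the "data backup and restore from S3" roadmap item,
    # assuming the server's data directory has already been archived to a
    # tarball. The bucket and key layout are made up for illustration.
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-wurm-backups"  # hypothetical bucket name

    def backup(server: str, archive_path: str) -> None:
        # Push a snapshot of the server's data up to S3.
        s3.upload_file(archive_path, BUCKET, f"{server}/latest.tar.gz")

    def restore(server: str, dest_path: str) -> None:
        # Pull the snapshot back down, e.g. to seed a cloned test
        # server during downtime.
        s3.download_file(BUCKET, f"{server}/latest.tar.gz", dest_path)
    ```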
  8. Devblog: Server Issues Postmortem & Future

    Will do!

    Update, Pt 2

    So I couldn't put this down today at all. It's been about 13 hours total, with an hour break for dinner. One of those days. And yet as I type this, I have all three test servers running in a sandbox and talking to each other. Samool was kind enough to give a test connection and it worked. This doesn't mean it's ready for you folks yet! I still need to move them to their final home.

    I managed to get the database updates for IPs and ports automated, and I've got a path forward on auto-restarting the server for updates and recovery. A simple daemon will suffice. No, not demon. Hamsters are enough trouble. A daemon is basically something that runs in the background. In this case, the daemon will watch to make sure all Wurm docker instances are operational. Since the docker instance goes away upon termination, if a server crashes the daemon will know. With a proper configuration, it'll know which server went down and can start it back up. At the same time, I can tell it to pull down the latest image - and I plan on using one repository for test and one for live, so that there's never a chance of test's code getting onto live accidentally. Live will be push-button, whereas test will be a continuous deployment pipeline up until the act of shutting down the servers. That'll still be manual on both sides.

    I've decided that keeping static IPs is actually for the best as well, yet I've written everything with the possibility of using ports instead. What that means is that if more than one game server shares a host for cost efficiency, they either need separate IPs or separate ports. I prefer the IP method, but ports are an option as well. The reason I'm okay with this is because of the EBS volume that stores all server data. This is something that can't be attached during an update either, so basically if the instances need to be replaced, then I'll have to delete the stack and recreate it anyway. The stack will likely take 15-20 minutes to create, so honestly it's not a huge amount of downtime. If I'm doing something that requires it, then we'll just do an "extended 1-hour downtime".

    Finally, now that I'm this far, we can soon start testing for I/O and I can start building the live cluster profile. I'd still like to devise a way to auto-import the live data, but if the choice is to spend hours doing it manually once or spend days getting an auto-import right? We'll do the hours. I don't want this held up on some fancy thing that I'll probably use once.

    Once this is all done, I'll be turning my gaze to the shop, GV, and our build infrastructure. After that will be forums, WO and WU web, and finally Wurmpedia. For that last one, I would like to take the time to address some requests that @sEeDliNgS has made, so it may take some time. I'd also like a sandbox for her to play in, since who doesn't like sandboxes?
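    To illustrate the watchdog daemon described above, here is a rough sketch using the Docker SDK for Python. The container and repository names are hypothetical stand-ins, and the real daemon may well work differently; this just shows the "container gone means server down, so pull and restart" idea.

    ```python
    # Rough sketch of the watchdog daemon, using the Docker SDK for Python.
    # Container and repository names are hypothetical stand-ins.
    import time

    import docker

    client = docker.from_env()

    # Which containers to watch, and which image to restart them from.
    WATCHED = {
        "wurm-test1": "example-registry/wurm-test:latest",  # made-up names
    }

    def watch(poll_seconds: int = 30) -> None:
        while True:
            running = {c.name for c in client.containers.list()}
            for name, image in WATCHED.items():
                if name not in running:
                    # The container terminated, so the server crashed or
                    # stopped. Clean up the dead container, pull the latest
                    # image from the test repository, and start it back up.
                    try:
                        client.containers.get(name).remove(force=True)
                    except docker.errors.NotFound:
                        pass
                    client.images.pull(image)
                    client.containers.run(image, name=name, detach=True)
            time.sleep(poll_seconds)

    if __name__ == "__main__":
        watch()
    ```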
  9. Independence Went Ka-Boom

    Working on it
  10. Devblog: Server Issues Postmortem & Future

    I feel I should add a little about the I/O solutions here. We're fully willing to make changes to the Wurm server to compensate for I/O issues. I'm less worried about the database and more worried about map saves, as I mentioned. If I find that map saves are a problem and we can't work around it with code, then another option is to use either a provisioned drive or an instance with an attached NVMe. The latter is literally an SSD attached to the host, and the I/O speeds are on par with direct hardware. The main issue with an attached NVMe is that it's ephemeral, so I'd have to copy the map to it before start and schedule regular copies back to the EBS volume to ensure it's constantly backed up. Another issue is that the cost per instance goes up with that option, and I am being cost-aware here. We're willing to pay for the benefits, but the more frugal I am the better. Obviously!

    The provisioned drive also costs more; basically, it allows for faster I/O speeds at a price. That will be a bit more complicated, as I'd have to do some benchmarks to see where our IOPS need to be. You essentially say "I need this much I/O per second" and pay for it. If you don't use it, you still pay for it.
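    As a sketch of the NVMe scheme described above - stage the map onto the fast ephemeral drive before start, then copy it back to EBS on a schedule - under assumed mount paths that are purely illustrative:

    ```python
    # Sketch of the attached-NVMe scheme: stage the map onto the fast but
    # ephemeral drive before start, then copy it back to the durable EBS
    # volume on a schedule. Mount points are purely illustrative.
    import shutil
    import time

    NVME_MAP = "/mnt/nvme/map"  # fast ephemeral working copy (lost on stop)
    EBS_MAP = "/mnt/ebs/map"    # durable copy that survives the instance

    def stage_map() -> None:
        # Before server start: copy the map onto the fast drive.
        shutil.copytree(EBS_MAP, NVME_MAP, dirs_exist_ok=True)

    def backup_loop(interval_seconds: int = 3600) -> None:
        # Regular copies back to EBS mean a lost instance costs at most
        # one interval's worth of map changes.
        while True:
            time.sleep(interval_seconds)
            shutil.copytree(NVME_MAP, EBS_MAP, dirs_exist_ok=True)
    ```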
  11. Devblog: Server Issues Postmortem & Future

    That's the intention!

    Status Update

    Every time I touch this, there seems to be more work to do. That's okay though; I'm plugging along.

    Docker!

    Wurm's server build configuration pushes a docker image of the server to a repository. This means that every build will update the image's "latest" tag and give us a fallback point should we need it. Think of these as snapshots of the running build. This, combined with volume snapshots, means that if things go horribly wrong in an update, we can very easily set things back to where they were before the update happened. No one likes the word "rollback", but at least it'd be less painful than it currently is if we need it. Lately we've dealt with problems by just fixing what went wrong after the fact. The poor GM team has had the burden of that, but I'd much rather lose 15 minutes of progress for the few who have connected than spend two weeks trying to catch everyone affected and fix their issues.

    More Docker!

    I've successfully run our Oracle test server in a docker instance. It's using a docker instance for MySQL as well as an EBS volume for the map and logs. This is precisely what I've been after, but there's still some manual configuration that I need to script so that this happens automagically when a "stack" is started. I want to do as little manually as possible, as human error seeps in. Plus I'm lazy, okay? Gosh. No, seriously - it's about human error.

    Downsides

    So far there are some downsides that I need to mitigate. For one, I want the servers to auto-recover. What happened to Indy yesterday really shouldn't happen in this new environment, so I need to solve that problem. At the very least, I want to make it so there's a number of people who can press a button to recover a server. We obviously want direct access to be restricted and not needed for basic things. I was hoping to use an Auto Scaling Group for this, but I had forgotten how restrictive those are when it comes to configuration, and I'd much prefer the network configuration I have over a mechanism that may not even work well for us. The idea of it killing a server because of a failed health check makes me worry, so that idea is dead.

    Another downside is that I'm using static private IPs. It was a way to make things work, but I really want them to be dynamic. The reason is so I can do stack updates instead of delete-and-recreate; the latter takes considerably more time. I want to minimize downtime for things like OS updates and such.

    Finally, there's the point Sklo has brought up a number of times: we need to slam the I/O and see what's going to happen. Samool has suggested that we basically work with thousands of items at a time. Given that item updates are one of the most costly things in Wurm, I think that might work for the database. Yet we'll also need to test map updates, so perhaps we can find a way to get a good number of you on this server once it's up and start digging holes! I'm not sure what we can do to reward such testers, but I'll bring it up with Retrograde. I know I'd prefer a good hundred people or so, alts or otherwise - just enough to give a good live-server-ish test.

    Going Forward

    The plan now is to finish converting the manual configuration to automatic and then move the test servers over fully. I also need to get the logs into CloudWatch so we can set up proper alarms when things go wrong and give high-level staff members access to look through them when needed. I'd also like to get some monitoring going in CloudWatch, so we can tell when an instance is over-burdened and may need to be bumped up. There's also the moving of our build server, which I've not even started yet - though that can wait until after everything else is moved. Finally, there are the special cases around Golden Valley - including the shop. That's all for now.
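    For the CloudWatch monitoring mentioned under "Going Forward", an "instance is over-burdened" alarm could look roughly like the boto3 sketch below. The instance ID and SNS topic ARN are placeholders, not real Wurm resources.

    ```python
    # A boto3 sketch of an "instance is over-burdened" alarm; the instance
    # ID and SNS topic ARN are placeholders, not real Wurm resources.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="wurm-test-cpu-high",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        Statistic="Average",
        Period=300,            # five-minute samples...
        EvaluationPeriods=3,   # ...sustained for fifteen minutes
        Threshold=80.0,        # percent CPU
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],
    )
    ```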
  12. Independence Went Ka-Boom

    We're prepared to make changes to the server to increase performance in the cloud if needed, which will end up in the WU code as well. So in a way, this will be a win for WU too.
  13. Independence Went Ka-Boom

    Not intentionally rolled back, but there may have been some data that wasn't flushed to disk. The entire system locked up and went unresponsive. Put in support tickets and we'll handle things on a case-by-case basis. Or I should say @Enki and his team will. O:) But it sounds like you harvested, and then it wasn't harvested? Wouldn't that mean more items? The tiles wouldn't reset - they'd simply not have a state saved around the time Indy went down. So anything from before that wasn't a result of the downtime. It was about 1am EDT when it went down.
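    For anyone curious what "wasn't flushed to disk" means in practice, here is a generic illustration - not Wurm's actual persistence code - of why a hard lockup can drop recent state even though the game already showed the action as done:

    ```python
    # Generic illustration of "not flushed to disk" (not Wurm's actual
    # persistence code): writes sit in memory buffers until flushed and
    # synced, so a hard lockup can lose recent state.
    import os

    data = b"example tile state"  # stand-in for real map data

    with open("tile_state.bin", "wb") as f:
        f.write(data)          # lands in an in-memory buffer first
        f.flush()              # hand the buffer to the OS
        os.fsync(f.fileno())   # force the OS to write it out to disk
    # Without the flush/fsync, the bytes recording a completed action
    # may never make it to disk before the system locks up.
    ```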
  14. Devblog: Server Issues Postmortem & Future

    This thing has more evolutions than an Eevee.
  15. Devblog: Server Issues Postmortem & Future

    Update!

    It always happens like this: you think you're ready for something, and then you have a think. Suddenly you realize you missed something very important. For me, I missed the ability to establish DNS records for each instance. This led me to a solution for a problem I had inaccurately said was impossible, however: I figured out what I was doing wrong when trying to assign multiple public IPs to a single network adapter. This basically means we can have a DNS record and IP address for each server, no matter how we're hosting it at a given time. It also means we'll be using a new domain name, and we'll have fewer DNS resolution issues.

    I'm currently troubleshooting some connectivity issues, but I hope to have this resolved today. Then it's back to getting the server starting up and working on the database updates needed to populate the IP addresses. The good news there is that the update only ever has to happen if I need to delete and recreate the network stack. That stack is so simple, though, that I should never have to do that. Either way, I'll have a command that does the update in the event it is needed.

    As for the game server, I was working on getting Spotify's dockerfile-maven plugin working. This way I can generate and push a versioned docker image of the server on build. The way I'm going about this should see a separate repository for Wurm Live and Wurm Test. I'm aiming to have everything separate down the line. Right now we use a single maven repository, which makes things a little more complicated. We can use RELEASE and LATEST, but those are deprecated, and I'd also prefer to be a little more specific about which version we're pushing to the live servers. Finally, I'd like this process to be something we can kick off from a web interface - such as Jenkins.

    These are the challenges I'm working to overcome. I'll give more updates as I make more progress. I know this is dragging on longer than I had hoped, especially since I made some quick progress out of the gate.
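    For the curious, assigning multiple public IPs to a single network adapter on AWS works roughly like the boto3 sketch below. The interface ID and addresses are placeholders, and this is my reading of the API rather than the exact commands in use.

    ```python
    # How assigning multiple public IPs to one network adapter works on
    # AWS, as a boto3 sketch. The interface ID and private IP are
    # placeholders; this is an illustration, not the exact setup.
    import boto3

    ec2 = boto3.client("ec2")
    ENI = "eni-0123456789abcdef0"  # the instance's network interface

    # Step 1: add a secondary private IP to the adapter.
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId=ENI,
        SecondaryPrivateIpAddressCount=1,
    )

    # Step 2: allocate an Elastic IP and associate it with that secondary
    # private IP, giving the server its own public address (and therefore
    # its own DNS record) regardless of which host it lives on.
    alloc = ec2.allocate_address(Domain="vpc")
    ec2.associate_address(
        AllocationId=alloc["AllocationId"],
        NetworkInterfaceId=ENI,
        PrivateIpAddress="10.0.0.12",  # the secondary IP from step 1
    )
    ```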