Keenan

Performance Art


Hello all,

I wanted to take a moment and talk about what we’ve been doing to address the lag and performance issues on some servers, such as Cadence.

One way to find out what might be causing lag on a server is to profile it. Until now we have mainly done this with a tool that needed someone watching at the right moment to get useful information. In short, we had to catch the lag while it was happening and hope to see the smoking gun. This helped with some issues we’ve already patched, but it wasn’t enough to fix the lag. That profiling is also Java-specific, which left many potential performance issues hidden from us.
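Part of "catching the lag while it is happening" can be automated. The following is a minimal sketch, assuming a single server tick loop that can be instrumented; `LagWatchdog` and its `tickStarted`/`tickFinished` hooks are hypothetical names, not Wurm's code. A background thread notices when a tick runs past a threshold and records every Java thread's stack at that moment. Like the old tool, it only sees Java threads.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog: the server's tick loop calls tickStarted()/tickFinished(),
// and a background thread captures all Java thread stacks once per slow tick.
final class LagWatchdog {
    private final long thresholdNanos;
    private volatile Long tickStart = null;       // null while no tick is running
    private volatile boolean dumpedThisTick = false;
    private final ScheduledExecutorService checker =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "lag-watchdog");
                t.setDaemon(true);                // never keep the JVM alive
                return t;
            });

    LagWatchdog(long thresholdMillis) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }

    void start() {
        // Poll every 10 ms, which is cheap compared with a server tick.
        checker.scheduleAtFixedRate(this::check, 10, 10, TimeUnit.MILLISECONDS);
    }

    void stop() {
        checker.shutdownNow();
    }

    void tickStarted() {
        dumpedThisTick = false;
        tickStart = System.nanoTime();
    }

    void tickFinished() {
        tickStart = null;
    }

    private void check() {
        Long start = tickStart;
        if (start != null && !dumpedThisTick
                && System.nanoTime() - start > thresholdNanos) {
            dumpedThisTick = true;
            System.err.println(dumpStacks());     // in practice: write to a log file
        }
    }

    // Snapshot of every live thread's stack at this instant.
    static String dumpStacks() {
        StringBuilder sb = new StringBuilder("Slow tick, thread dump:\n");
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append('"').append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A tick loop would call `tickStarted()` before and `tickFinished()` after each tick, with the threshold set somewhat above the normal tick length so only genuine stalls produce a dump.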

Recently we switched to a profiling service that runs on each host and lets us zero in on performance over a specific period and across multiple servers. It profiles the entire host, and the data can be filtered down to a specific Wurm server when needed. This has shed light on factors we couldn’t see or hadn’t noticed before.
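Filtering host-wide profiling data down to one server and one time window is essentially a filter and group-by over tagged samples. The sketch below assumes a simple, hypothetical sample format (host, process, timestamp, top stack frame); real profiling services have their own formats and query interfaces, and this is not the service the team uses.

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sample format: one CPU sample from a host, tagged with the
// process it came from (e.g. one Wurm server) and its innermost stack frame.
record Sample(String host, String process, Instant time, String topFrame) {}

final class ProfileQuery {
    // Keep samples for one process inside [from, to), then count how often each
    // frame was on top of the stack: the most frequent frames are where CPU went.
    static Map<String, Long> hotFrames(List<Sample> samples, String process,
                                       Instant from, Instant to) {
        return samples.stream()
                .filter(s -> s.process().equals(process))
                .filter(s -> !s.time().isBefore(from) && s.time().isBefore(to))
                .collect(Collectors.groupingBy(Sample::topFrame, Collectors.counting()));
    }
}
```

Because every sample carries its host and process, the same data can answer both "what was this server doing during the rift?" and "was every host doing the same thing at that time?".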

We’ve made several changes to reduce lag, and one of them is upgrading the servers to Java 17. We’ve also identified several expensive calls whose results could be cached or that could be made less often.
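Caching an expensive call usually means remembering its result for a while and reusing it. Here is a minimal sketch of a time-to-live cache, assuming the cached value may safely be up to `ttlNanos` stale; `TtlCache` and the loader function are illustrative names, not the server's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Illustrative time-to-live cache: results of an expensive call are reused
// until they are ttlNanos old, then recomputed on the next request.
final class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAtNanos) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;      // the expensive call being cached
    private final long ttlNanos;
    private final LongSupplier clock;         // injectable so tests can fake time

    TtlCache(Function<K, V> loader, long ttlNanos) {
        this(loader, ttlNanos, System::nanoTime);
    }

    TtlCache(Function<K, V> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        // compute() is atomic per key, so two callers never load the same key at
        // once. Fresh entries are kept; missing or expired ones are reloaded.
        Entry<V> e = entries.compute(key, (k, old) ->
                (old != null && now - old.expiresAtNanos() < 0)
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlNanos));
        return e.value();
    }
}
```

`ConcurrentHashMap.compute` holds the lock for that key while the loader runs, so the loader should not call back into the same cache. Expired entries are only replaced when requested again, so a cache with an unbounded set of keys would also need eviction.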

To show why the Java version matters, I’ll go into something technical about how Java handles memory. Java manages memory for us by determining when something is no longer needed and removing it from memory. This is called “garbage collection.” Older versions of Java would pause the application for a short time to perform this task. Newer versions have a much more optimized garbage collection system, and we plan to use it on all servers with Tuesday’s reset. When profiling, we found a significant jump in CPU usage related to garbage collection.
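One way to put a number on garbage collection cost from inside the JVM is the standard `java.lang.management` API. This sketch sums the collection time reported by each collector's MXBean and divides it by JVM uptime. Treat it as a rough indicator: with concurrent collectors some beans count concurrent cycle time rather than only pauses, and it says nothing about which collector the Wurm servers will run.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Rough share of JVM uptime spent in garbage collection, from the standard
// per-collector MXBeans. The collector is chosen with a launch flag, e.g.
//   java -XX:+UseG1GC ...   (default on server-class machines since JDK 9)
//   java -XX:+UseZGC ...    (production-ready since JDK 15)
// and -Xlog:gc prints one log line per collection for finer detail.
final class GcOverhead {
    static double gcTimeFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 if this collector doesn't report it
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %.2f%% of uptime%n", 100 * gcTimeFraction());
    }
}
```

Sampling this before and after a collector change, under similar load, gives a simple before/after comparison to go with the profiler's view.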

Unfortunately, our ability to test the results of these changes is limited. We cannot perform load tests on our test servers, so the actual performance improvements will have to be measured on the live servers. It also means bugs could emerge from this upgrade; if they do, please report them on the forums.

You may also be wondering about the client’s Java version. For the client, we are changing the launcher to support multiple versions of Java. This work is going slowly, but it is a high priority. We, along with some community members, have found that newer versions of Java perform better and crash less often. We would also like to update the version of the graphics library Wurm uses, which would bring several fixes and improvements as well.
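A launcher that supports several Java versions has to choose among the runtimes installed on a machine. A sketch, assuming the launcher has already found each runtime's home directory and version string (for example from the `release` file in the runtime's home); `RuntimePicker` and the supported range are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical launcher helper: choose the newest installed runtime whose
// feature (major) release falls inside the range the client supports.
final class RuntimePicker {
    record InstalledRuntime(String home, Runtime.Version version) {}

    static Optional<InstalledRuntime> pick(List<InstalledRuntime> found,
                                           int minFeature, int maxFeature) {
        return found.stream()
                .filter(r -> r.version().feature() >= minFeature
                          && r.version().feature() <= maxFeature)
                .max(Comparator.comparing(InstalledRuntime::version));
    }
}
```

`Runtime.Version.parse` only accepts the newer version-string format (e.g. `17.0.3`), so Java 8's `1.8.0_292`-style strings would need to be normalized before they could be compared this way.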

Finally, we are still working on the second part of the Exploration update. The development team has been doing a great job balancing the workload put before them. We are committed to improving the Wurm experience, and we thank you for your patience as we try to tackle All The Things.

Until next time, happy Wurming!

Keenan


This is great to hear, good job to all the dev team for finding the issue and tracking it down!

As a side note that may be at least a little relevant: I have noticed that Cadence tends to have the most people, and new players especially seem to gravitate to it because it is labeled as the "newest server". Could that label be removed now? It would help spread newer players across the other servers a bit, and probably even help new-player retention, as I have seen many newer players mention how hard it is to find a nice spot to settle there.


I know I will be cursed and sworn at for this, and forgive my poor knowledge of Java in particular and of how the server-side code works, but... could limiting the number of alts per IP, to 3 max for example, help with server overload? I realize a lot of people run several instances of the client, sometimes 10 or more, which increases the number of toons in game at the same time. Not sure if this would help, but it's worth a shot. To be clear, I don't want to ruin anyone's game experience, I'm just wondering. I'm not a Java guru!! :)

Thanks for your attention


I was waiting for a song and dance routine ...😉

But it sounds good to get the lag fixed on all servers.


For Cadence, there really seem to be two issues going on. 1) A slow increase in lag throughout the week that is usually at its worst on Monday evening, before server maintenance on Tuesday morning (US time). This seems much better recently. 2) Lag around major events with lots of players. Six weeks ago it was getting really bad right before a rift or public slaying; more recently it seems fine right up until the boss dies, then within a minute or two the lag sets in and catseyes, waypoints and even bridges do not load until too late. If you need to "watch" it happening, one of these events (even a dev-hosted one) might make for a great stress test.

I kind of assumed the first might be a memory leak slowly accumulating through the week, then cleared by server maintenance so the cycle starts again.

6 minutes ago, Enniskillen said:

I know I will be cursed and sworn at for this, and forgive my poor knowledge of Java in particular and of how the server-side code works, but... could limiting the number of alts per IP, to 3 max for example, help with server overload? I realize a lot of people run several instances of the client, sometimes 10 or more, which increases the number of toons in game at the same time. Not sure if this would help, but it's worth a shot. To be clear, I don't want to ruin anyone's game experience, I'm just wondering. I'm not a Java guru!! :)

Thanks for your attention

This is an MMO; it's built to handle many players. The number of alts people run is already limited by what their own computers can handle, and Wurm is already more unstable than a table with 2 legs once you go above 9 or so. Limiting alts per IP would do nothing.


Sleep bonus +5 hrs for the love of us all through these trying times?? 😊😍🥰😀

Precedent says yassss :) 

8 minutes ago, Wurmhole said:

[image]

I dunno if this causes lag (wink)

I tried to make it every 3 hours :P Wurm had other ideas.

13 minutes ago, CthrekGoru said:

For Cadence, there really seem to be two issues going on. 1) A slow increase in lag throughout the week that is usually at its worst on Monday evening, before server maintenance on Tuesday morning (US time). This seems much better recently. 2) Lag around major events with lots of players. Six weeks ago it was getting really bad right before a rift or public slaying; more recently it seems fine right up until the boss dies, then within a minute or two the lag sets in and catseyes, waypoints and even bridges do not load until too late. If you need to "watch" it happening, one of these events (even a dev-hosted one) might make for a great stress test.

I kind of assumed the first might be a memory leak slowly accumulating through the week, then cleared by server maintenance so the cycle starts again.

The first one is something we're aware of, and the reason we do weekly restarts. While we do try to fix the bugs we find, sometimes just having maintenance is the answer.
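A slow leak of the kind described in the question tends to show up as a rising baseline: the heap still in use right after each garbage collection creeps upward over the days between restarts. A sketch of tracking that with the standard memory-pool MXBeans plus a least-squares slope; `LeakTrend`, the sampling schedule and any alert threshold are assumptions, not part of the team's tooling.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class LeakTrend {
    // Heap still in use right after the most recent collection of each heap pool.
    // Sampled, say, hourly, this baseline keeps climbing if something leaks.
    static long heapUsedAfterLastGc() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage u = pool.getCollectionUsage();   // null if unsupported
                if (u != null) {
                    total += u.getUsed();
                }
            }
        }
        return total;
    }

    // Ordinary least-squares slope of y against x (e.g. bytes per hour).
    static double slope(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        int n = x.length;
        if (n < 2) {
            return 0.0;
        }
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (x[i] - mx) * (y[i] - my);
            den += (x[i] - mx) * (x[i] - mx);
        }
        return den == 0 ? 0.0 : num / den;
    }
}
```

A slope that stays clearly positive across a whole week, while player counts are roughly steady, points at a leak rather than at ordinary load.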


As for events, I wanted to say a few things about them and what we can do going forward to help address the lag. There's basically lag we know of, which has to do with server crossings, and then there's lag we're still trying to discover. For what we're trying to discover, this new profiling tool will help us zero in on the culprits. We can now cross-reference rifts, or even just lag events on the server, with actual profiling data not only from the lagging server but from all servers. Comparing them helps us identify outliers, which in turn helps us spot issues. We lacked this sort of visibility before.
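Comparing one server against all the others is an outlier-detection problem. A minimal sketch using a robust z-score based on the median and the median absolute deviation (MAD), which one extreme server cannot drag around the way it drags a mean. The metric, the example values and the 3.5 cut-off (a common rule of thumb) are illustrative assumptions, not the team's method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ServerOutliers {
    static double median(double[] values) {
        double[] v = values.clone();
        Arrays.sort(v);
        int mid = v.length / 2;
        return v.length % 2 == 1 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
    }

    // Servers whose metric has a robust z-score |x - median| / (1.4826 * MAD)
    // above the threshold. 1.4826 scales the MAD to match a standard deviation
    // for normally distributed data.
    static List<String> outliers(Map<String, Double> metricByServer, double threshold) {
        List<String> result = new ArrayList<>();
        if (metricByServer.isEmpty()) {
            return result;
        }
        double[] vals = metricByServer.values().stream().mapToDouble(Double::doubleValue).toArray();
        double med = median(vals);
        double mad = median(Arrays.stream(vals).map(v -> Math.abs(v - med)).toArray());
        for (Map.Entry<String, Double> e : metricByServer.entrySet()) {
            double dev = Math.abs(e.getValue() - med);
            double score = mad == 0 ? (dev == 0 ? 0 : Double.POSITIVE_INFINITY)
                                    : dev / (1.4826 * mad);
            if (score > threshold) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

For example, with made-up GC CPU shares of 0.10, 0.11, 0.09 and 0.105 on four servers and 0.40 on Cadence during a rift, only Cadence is flagged at a threshold of 3.5.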

2 hours ago, CthrekGoru said:

For Cadence, there really seem to be two issues going on. 1) A slow increase in lag throughout the week that is usually at its worst on Monday evening, before server maintenance on Tuesday morning (US time). This seems much better recently. 2) Lag around major events with lots of players. Six weeks ago it was getting really bad right before a rift or public slaying; more recently it seems fine right up until the boss dies, then within a minute or two the lag sets in and catseyes, waypoints and even bridges do not load until too late. If you need to "watch" it happening, one of these events (even a dev-hosted one) might make for a great stress test.

I kind of assumed the first might be a memory leak slowly accumulating through the week, then cleared by server maintenance so the cycle starts again.

As for the big slaying events, I would like to add my two cents, as I participated in the two most recent dragon kills that were made public.

Before the hunt, 300 characters gradually migrate from various locations to gather at the event over the course of an hour or two. Some come well in advance, but probably 2/3 of the characters show up in the last hour. Then the hunt happens, with everyone in the same local. After the hunt, however, everyone leaves that local at virtually the same time, and everyone sees the world go through massive rendering lag, chat lag and menu lag. We had chat delayed by up to 2 minutes. I had alts in my knarr that were 2 minutes behind on their rendering; they appeared to be an entire map grid behind my captain alt.

As characters gradually dispersed, the lag decreased, rendering sped up and the alts rejoined the same timeline as the captain. Chat returned to normal.

As unlikely as this may seem for a request, I'd suggest any public dragon hunts on NFI be coordinated with Wurm devs, and just maybe they can allocate additional cores/ram temporarily for the event?  I assume wurm is running on a virtual server architecture of some kind, which could easily manage it.  You won't need as many cores on the two servers that don't have the hunt, right?

53 minutes ago, Wurmhole said:

As unlikely as this may seem for a request, I'd suggest any public dragon hunts on NFI be coordinated with Wurm devs, and just maybe they can allocate additional cores/ram temporarily for the event?  I assume wurm is running on a virtual server architecture of some kind, which could easily manage it.  You won't need as many cores on the two servers that don't have the hunt, right?

We don't have this level of tuning on the servers. We'd have to take them down to make changes like this and coordinate the changes with our hosting company.

I have a few ideas for how to fix the underlying problem, but it isn't a simple fix.


From the user side, the lag visible in the client during public unique fights gets really bad the moment the opponent is released from the pen and targeted. I have no idea whether it is server-side or client-side lag at that moment...

1 hour ago, Jaz said:

From the user side, the lag visible in the client during public unique fights gets really bad the moment the opponent is released from the pen and targeted. I have no idea whether it is server-side or client-side lag at that moment...

And noticeably worse the second its death hits chat.

2 hours ago, Jaz said:

From the user side, the lag visible in the client during public unique fights gets really bad the moment the opponent is released from the pen and targeted. I have no idea whether it is server-side or client-side lag at that moment...

The combat code might be using too many sockets and threads to track players' actions, distance, what you do, what the enemy does, HP checks, skill checks, etc. Imagine all of that at once for 200-300 people. Since most information about things in local gets dumped into the combat/event log (or the shorter vicinity log), we might end up "doing a DDoS" as all the players force a lot of requests for checks and syncs.
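The worry about 200-300 people in one place is easiest to see with a back-of-the-envelope count. Assume, hypothetically, that every update produced in a crowded local area is sent to every other player in it. The post speculates along these lines, but the actual server design isn't described in this thread. Under that assumption the message count grows with the square of the crowd size.

```java
// Back-of-the-envelope fan-out, assuming (hypothetically) that every update
// produced in a crowded area is sent to every other player in that area.
final class FanOut {
    static long messagesPerTick(long players, long updatesPerPlayer) {
        // Each of the n players' updates goes to the other n - 1 players.
        return players * (players - 1) * updatesPerPlayer;
    }

    public static void main(String[] args) {
        for (long n : new long[] {10, 100, 300}) {
            System.out.println(n + " players -> " + messagesPerTick(n, 1) + " messages per tick");
        }
    }
}
```

With one update per player per tick, 10 players need 90 messages, 100 players need 9,900 and 300 players need 89,700. Tripling the crowd from 100 to 300 multiplies the traffic roughly nine-fold.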

That's one thing. Another is basic travel around the map: it again brings "life" to local areas and triggers syncs, AI and other checks (whatever those could be). I've heard that only player presence makes animal AI "work", so when a player is in local, creatures move and feel hunger; I can only guess whether vegetation works in a similar way. Either way, when many players move around the map at the same time, the server lags: doors and gates become "walls" that need some time to work, the server often reminds players to move slower, chat lags, etc.
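The "AI only runs near players" idea from the post can be sketched as a gate checked before each creature's tick. This assumes a hypothetical square activity radius measured in tiles; `ActivityGate` is an illustrative name and not how Wurm is confirmed to work. It shows why players spread across the map wake up far more creatures than the same players standing together.

```java
import java.util.List;

// Hypothetical activity gate: a creature runs its AI tick only when at least
// one player is within `radius` tiles (a square area, Chebyshev distance).
final class ActivityGate {
    record Tile(int x, int y) {}

    static boolean shouldTick(Tile creature, List<Tile> players, int radius) {
        for (Tile p : players) {
            int dx = Math.abs(p.x() - creature.x());
            int dy = Math.abs(p.y() - creature.y());
            if (Math.max(dx, dy) <= radius) {
                return true;
            }
        }
        return false;
    }

    // How many creatures would be awake for a given set of player positions.
    static long activeCreatures(List<Tile> creatures, List<Tile> players, int radius) {
        return creatures.stream().filter(c -> shouldTick(c, players, radius)).count();
    }
}
```

With creatures at tiles 0, 100 and 200 along a row and a radius of 20, two players standing together wake one creature, while three players spread out wake all three. The linear scan over players is only for clarity; a real server would look up nearby players in a spatial grid.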

12 hours ago, Enniskillen said:

I know I will be cursed and sworn at for this, and forgive my poor knowledge of Java in particular and of how the server-side code works, but... could limiting the number of alts per IP, to 3 max for example, help with server overload? I realize a lot of people run several instances of the client, sometimes 10 or more, which increases the number of toons in game at the same time. Not sure if this would help, but it's worth a shot. To be clear, I don't want to ruin anyone's game experience, I'm just wondering. I'm not a Java guru!! :)

Thanks for your attention


The problem with limiting by IP address is people like my wife and me, who would be using the same IP address because we both play from home (and I'm aware of other couples who play Wurm).

There are times when we both need two toons in world. My wife's alt, for example, is a Fo priestess used when breeding aggressive creatures with her main, and I may be doing deed upkeep that needs, or at least is easier with, two toons in world.

So, are you tuning G1GC or are you moving to ZGC or Shenandoah?

I've had some confusing experiences with ZGC: the way it allocates RAM caused the OS and my reporting tools to overestimate its RAM use by a factor of 3.
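The management API can list the active collectors and the heap figures, so what the JVM itself reports can be compared with what the OS shows. The collector is chosen with launch flags such as `-XX:+UseG1GC`, `-XX:+UseZGC` or `-XX:+UseShenandoahGC` (the last only in JDK builds that include Shenandoah). The 3x figure is likely explained by ZGC on Linux mapping the same heap memory at several virtual addresses, which RSS-based tools can count more than once. The JVM's own heap numbers should not be inflated that way. The thread doesn't say which collector the servers will use; this is only a diagnostic sketch.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.stream.Collectors;

// Lists the active collectors and the heap as the JVM itself accounts for it.
// Choose a collector at launch, e.g.:
//   java -XX:+UseG1GC ...          java -XX:+UseZGC ...
//   java -XX:+UseShenandoahGC ...  (only in JDK builds that include Shenandoah)
final class GcInfo {
    static List<String> collectorNames() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("Collectors: " + collectorNames());
        // Compare these figures with the process's RSS as reported by the OS.
        System.out.printf("Heap used %d MiB, committed %d MiB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20);
    }
}
```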
