In this week's patch notes there was a small line that a lot of people would have glossed over, that being “Fixed a number of movement issues.” There’s a long story behind this one, with a few weeks total of working on it and something completely insane that I had to fix, so I wanted to write a bit about it - if you’ll forgive my rambling. Note: Before we start this I should point out that in the context of this post, "desync" is referring to a creature or player appearing at one location from your view, but from their view they're in a different spot. This isn't related to the "Your position on the server is not updated." message that you can sometimes get when the server itself is lagging.
Hopefully by now some people have noticed slightly smoother movement in general (ignoring today's horses being a bit jittery on bridges), and a lot less (ideally none) of the movement desync that the game has had trouble with for a long time. These desync issues were magnified by the Fishing update and became a lot more apparent over the last few months. You might have encountered this on your boat by disembarking after a long trip and being a tile or two away from where you thought you were, or by leading a creature a long distance and having it slowly get farther and farther away from you.
The cause of the desync becoming more apparent was a change I made months ago for fish. Each fish in the game is technically a creature the same as any other, and the server controls it as such. When you cast your line the server decides when your fish will spawn, then tells it to move towards your line, where it stops for a bit. Early on we ran into issues where the fish wouldn’t always stop right on your line, but sometimes off to one side or the other. To get into the reason for this issue and how I fixed it we have to dive a bit into how the movement updates in Wurm work.
Movement is calculated on the client based on your input, and the client then sends your new position to the server. The server checks that data against what you’re allowed and expected to do with that movement, and if everything is okay it saves that as your new position and updates everything that needs to know where you are. This includes any other players that can see you, all the creatures within range of you, anything you’re riding or dragging or leading, and a few other things. Historically these updates have been done using position differences: the difference is passed along and calculations are done based on that instead of the exact new position. This is where the problem starts.
To keep the data packets sent to all clients as small as possible, all movement difference values were packaged into byte values (a quarter of the size of sending the full position value), meaning a number in the range of -127 to +127. This means that when you’re moving from [1587.33, 903.19] to [1587.39, 903.81] the X and Y differences have to be converted into that -127 to 127 range, whole numbers only. To accomplish this the differences were multiplied by 10, which had the downside of ignoring any movement difference below 10cm (0.1), since that would get rounded down to 0 when converted into a whole-number byte. In that example the X difference of 0.06 becomes 0.6 and is rounded down to 0, while the Y difference of 0.62 becomes 6.2 and is sent as 6. Here’s where we get back to the fish problem. Because the final movement of the fish to get to the hook was sometimes less than 10cm, the update packet sent to the client was rounded down to 0cm and ignored. After some testing showed that the actual movement values being sent to the client never got beyond the -7 to +7 range, I changed that x10 multiplier into a x100 multiplier. This gave us an extra digit of precision in movement, while keeping the movement difference values inside that -127 to +127 range.
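To make the arithmetic concrete, here is a minimal Python sketch of that kind of byte packing. It is not the game's actual code: the function name `encode_delta`, the truncate-toward-zero behaviour and the sample distances are all assumptions made for illustration.

```python
def encode_delta(delta_m: float, scale: int) -> int:
    """Scale a position difference (in metres) and truncate it to a whole number.

    Hypothetical helper: scale=10 stands in for the old multiplier,
    scale=100 for the new one.
    """
    steps = int(delta_m * scale)  # int() drops the fractional part
    if not -127 <= steps <= 127:
        raise ValueError("difference does not fit in the -127..127 range")
    return steps


# An assumed final approach of 4.5cm, like a fish closing in on the hook.
print(encode_delta(0.045, 10))   # lost entirely at x10
print(encode_delta(0.045, 100))  # survives at x100, minus the half centimetre

# A larger step of 62.5cm fits in the byte range with either multiplier.
print(encode_delta(0.625, 10))
print(encode_delta(0.625, 100))
```

With the x100 multiplier the largest difference that still fits is 1.27m per update, which is why it mattered that the observed x10 values stayed within -7 to +7.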
Although this fixed the fishy issue that we were having at the time, it brought to light some new desync issues and made some existing ones worse. At that time the biggest issue was stamina drain problems caused by this change - I had thought it was because I had missed some x10 -> x100 change somewhere, but after going through the related stamina drain code it didn’t make sense. I gave up at some point after that, tweaked the draining code to be as close as possible to how it behaved on live before the change, and chalked it up to something weird that I might find later on. The last few weeks were that ‘later on’, where I found out that my initial thought of “this doesn’t make sense” was correct.
You may have noticed that I glossed over what happened in the old code when those difference values were rounded down to 0cm because they didn’t hit that 10cm (or 1cm after the x100 change) threshold. This is the source of the desync that the game has had in varying degrees since day one. When these values were rounded down to 0cm (or remainders from a whole number were rounded down), the movement would still be applied on the server, but the “update everything else” code would see it as no movement and not run. This means there could be some movement ticks where nothing was sent to players that were in range, or where stamina would not drain at all if it was a player’s movement. But then how has everything still mostly worked all this time if these movement differences weren’t updating properly?
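A rough way to picture how that adds up is the small simulation below. It is only a sketch under assumed numbers: `simulate`, the step sizes and the tick count are made up, and the real update chain does far more than move one coordinate.

```python
def simulate(step_m: float, ticks: int, scale: int = 100):
    """Move along X every tick, sending observers only truncated differences.

    Hypothetical model: the server always applies the full movement, while
    observers are only updated when the truncated difference is non-zero.
    """
    server_x = 0.0
    observer_x = 0.0
    updates_sent = 0
    for _ in range(ticks):
        new_x = server_x + step_m
        delta = int((new_x - server_x) * scale)  # whole-number difference
        server_x = new_x                         # movement is applied regardless
        if delta != 0:
            # Only this branch "updates everything else"; stamina drain
            # would sit here too, so it is skipped along with the update.
            observer_x += delta / scale
            updates_sent += 1
    return server_x, observer_x, updates_sent


# Very slow movement (4mm per tick): every update is skipped.
print(simulate(0.004, 1000))

# 2.5cm per tick: updates go out, but half a centimetre is lost each time.
print(simulate(0.025, 1000))
```

In the first case the server has moved about 4m while observers have seen nothing; in the second the two views end up about 5m apart after 1000 ticks.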
Now we get to the meat of the issue. The code that transformed the new position values into a difference value was prone to a rounding error that every now and then would cause a movement difference that is usually under 1cm to get rounded up to 10cm and trigger the chain of updates to everything, including stamina drain. When the difference code was working on 10cm intervals, the rounding error just happened to occur about as often as the “lost movement” from rounding down would have added up to 10cm, so it kind of balanced itself out. Changing it to 1cm intervals meant less data was lost to rounding down, but it also meant that when the rounding-up error did occur it only triggered enough movement for 1cm. It lost the ‘happy accident of balance’ that it previously had, which led to desync problems being slightly more pronounced over the last couple of months.
During my trek over the last few weeks into figuring out the desync issues, Samool and I came across that piece of code and recognised that it could cause some precision errors, so I changed it to something that wouldn’t. This led me to a couple of days of pulling my hair out over stamina no longer draining as expected, before realising the rounding error was the reason it ever worked properly in the first place, and that the code down the chain relied on it to work 99% of the time for smaller movements. Without this rounding error, slow movement would very rarely be above that 1cm threshold, meaning none of the drain code or update code was ever called, since the movement was rounded down to 0.
Fixing this new issue involved changing the entire chain of code to stop using the byte value differences and instead use the actual position data, calculating exact difference values when needed. That eliminates any need for rounding down (or random errors causing rounding up). This also led to a couple of days of rebalancing stamina drain from movement to be as close to live as possible, considering the old system relied on a random rounding error to work at all. I’m truly surprised that it worked as well as it did, being based on an error for smaller movements like that.
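In spirit, the new approach looks something like the sketch below. Again this is an illustration rather than the real implementation: the `Mover` class, `apply_move` and the `DRAIN_PER_METRE` rate are hypothetical names and numbers.

```python
import math
from dataclasses import dataclass

DRAIN_PER_METRE = 0.5  # hypothetical stamina cost, not the game's real rate


@dataclass
class Mover:
    x: float
    y: float
    stamina: float = 100.0


def apply_move(mover: Mover, new_x: float, new_y: float) -> tuple:
    """Apply a move and return the exact position to send to observers."""
    # The exact distance is calculated only where it is needed (stamina),
    # so even a move of a few millimetres drains a proportional amount.
    distance = math.hypot(new_x - mover.x, new_y - mover.y)
    mover.x, mover.y = new_x, new_y
    mover.stamina -= distance * DRAIN_PER_METRE
    return (new_x, new_y)  # absolute position, no byte-sized difference


player = Mover(0.0, 0.0)
seen_by_others = (player.x, player.y)
for _ in range(1000):
    seen_by_others = apply_move(player, player.x + 0.004, player.y)

print(seen_by_others == (player.x, player.y))  # both views agree
print(round(player.stamina, 3))                # drained for roughly 4m
```

Because observers receive the position itself, there is nothing left to accumulate: one dropped or tiny update can no longer shift everything that follows.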
The upside of all of this is that movement updates sent to players now use the exact position instead of difference values, meaning desync should be a thing of the past. The place where you see other creatures and players is the exact same position where the server sees that creature or player - and stamina draining should be a lot more precise for smaller movements instead of relying on a rounding error.
I hope I never have to touch movement code again.