Originally Posted by Delusion83
How about an explanation on what actually really happened and what is done to prevent it from happening again, also in relation to the offline mode!?
Offline mode is orthogonal to this issue. We haven't changed much about Offline mode lately, and this issue hasn't changed anything about Offline mode -- though it did stress its functionality because users wanted it to work to get around problems with their connection. Thing is, offline mode might not have worked to help this issue because of the nature of the problem.
Offline mode is meant to work when the client is offline. In this case, the client wasn't exactly offline because it actually had connected to Steam. When a connection happens, the Steam servers give the client a set of configuration data. That data describes to the client what titles are available, what titles are no longer available, and so on. Some of that information is region-specific; games may or may not be available in certain regions.
That information is also cached on the client.
The misconfiguration was that a game was marked as available in the default Steam subscription -- the one with the Steam client itself and all of the free titles -- but was also marked as not being available in DE. When the client saw this, it decided the configuration information was invalid and logged off the servers. That validation and logoff procedure took a variable amount of time, giving the illusion that the problem was really (or at least possibly) a network-related issue. We might have been predisposed to thinking there was a network-related problem because there has been a history of routing problems between some areas in Europe and Valve; in fact, we were actually tracking such an issue with an ISP at the time this issue surfaced. After investigation, we found that the bogus configuration tripped a safety mechanism designed to defend the client against that incorrect information.
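To make the contradiction concrete, here's a rough sketch of the kind of check that would trip on it. This is not the actual Steam client code; the names (ClientConfig, find_conflicts, handle_config) and the data layout are made up purely for illustration, and the real configuration format is a lot more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ClientConfig:
    # Hypothetical layout: app IDs granted by the default subscription.
    default_subscription_apps: Set[int]
    # Hypothetical layout: app ID -> region codes where the app is NOT available.
    blocked_regions: Dict[int, Set[str]] = field(default_factory=dict)


def find_conflicts(config: ClientConfig, region: str) -> List[int]:
    """Return apps that are granted by default but blocked in this region."""
    return sorted(
        app_id
        for app_id in config.default_subscription_apps
        if region in config.blocked_regions.get(app_id, set())
    )


def handle_config(config: ClientConfig, region: str) -> str:
    """Treat contradictory data as invalid and log off; otherwise stay connected."""
    if find_conflicts(config, region):
        return "logoff"
    return "stay_connected"
```

With a config where one app is in the default subscription and also blocked in DE, handle_config returns "logoff" for a client in DE and "stay_connected" for a client anywhere else, which matches the region-specific symptoms described above.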
The configuration data that Steam ships around is pretty large and surprisingly complicated. We've got checks to make sure we don't publish bad configuration info for most common cases, but some obscure cases -- like this one -- can get past our checks and end up in the wild.
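As an illustration only, a publish-side check for this particular case could look something like the sketch below. It assumes the same made-up data layout (a set of default-subscription app IDs plus a map of app ID to blocked regions) and a hypothetical function name, lint_before_publish; it is not our actual tooling.

```python
from typing import Dict, List, Set


def lint_before_publish(
    default_apps: Set[int], blocked_regions: Dict[int, Set[str]]
) -> List[str]:
    """List every default-subscription app that is also blocked in some region.

    Unlike a client, which only knows its own region, a publish-time check
    can walk every region at once and refuse to ship if anything turns up.
    """
    problems = []
    for app_id in sorted(default_apps):
        for region in sorted(blocked_regions.get(app_id, set())):
            problems.append(
                f"app {app_id} is in the default subscription but blocked in {region}"
            )
    return problems
```

An empty list would mean this one case is clean; any entries would be a reason to hold the publish.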
Since many users had the cached information, they weren't affected immediately. When they received the new copy of the information with the incorrect configuration, they went through the automatic logout path and noticed problems. The symptoms were misleading in that only some users in certain regions were affected; until we could reproduce the problem ourselves and observe the effect of the misconfiguration, we were unable to find the cause of the problem and take action.
It's impossible to fix an issue which isn't understood; we have to take the time to diagnose it. We try to communicate as well as we can during that phase, and that communication could improve -- but the limitation is that we don't know many details ourselves. If we make a post based on information that isn't completely solid, it looks terrible. We might think we'll have it fixed in five minutes. Saying so, only to find the issue is much more complicated and will take substantially longer to fix, ends up being much worse for everyone in the long run. As a result, we try to be quick to acknowledge problems, and we did that here. We can't provide details until we, ourselves, have them.
I hope that provides the insight into the issue that you've asked for.