
5 talking points around the British Airways IT failure

An IT system failure left 300,000 passengers stranded around the world in what will be remembered as a catastrophic event for British Airways. But what really happened?


Source: Reuters

Why might we have reason to doubt the “power surge” claims?

According to Álex Cruz, BA’s chairman and chief executive, the catastrophic IT failure that left hundreds of thousands of passengers stranded over the UK Bank Holiday Weekend was due to a “power surge.”

It is alleged that the surge was “so strong that it rendered the back-up system ineffective”.

Several experts have since publicly questioned this claim. “Multiple data centre designers have told the Guardian that a power surge should not be able to bring down a data centre, let alone a data centre and its back-up,” the British newspaper reported on Tuesday 30 May.

“It’s either bad design or there’s more to the story than just a power surge,” said James Wilman, chief executive of the data centre consultancy Future-tech. “You have something specifically that you build in to a data centre called surge protection, which is there to protect against exactly this incident. You also have an uninterruptible power supply, a UPS, and part of its job is to condition the power” – i.e. smooth out the peaks and troughs in current.

“Between those and a quality earthing system, you should be protected from power surges,” Wilman said.

According to The Times, another leading British newspaper, SSE and UK Power Networks, the electricity companies that supply the area where the airline's data centre is located, have both denied the possibility of a power surge.

Why was the impact so large?

An unnamed corporate IT expert, speaking to the BBC, further suggested that a power failure shouldn’t even have caused a “flicker of the lights” in the data centre, given the presence of the UPS, the uninterruptible power supply.

Essentially, the scale of the impact and the number of people affected were largely due to the time taken to reboot the system.

Why did BA’s reboot take so long?

Once the power was lost, the airline’s crisis management plan should have kicked in. But, as many media outlets have suggested, the overly complex IT system is largely outsourced to India, and many of the experts who initially helped to develop the network left when the jobs were moved.


Many believe that the slow reboot was down to a combination of poor crisis management planning, an under-trained and under-staffed IT support team, and an incomplete understanding at the time of the complex logistics of 21st-century air travel.

Is the airline industry’s infrastructure so outdated?

James Wilman also suggests that airlines’ IT systems are fundamentally outdated and, notably, that British communications infrastructure is too old.

“We were leading the communications curve back 20 years ago, and the problem is that that now means that much of our infrastructure is hanging off a 25-year-old backbone. Some data centres are reaching the end of their life. And how do you refurbish that when you can’t turn it off?

“If you saw the amount of old infrastructure that this country is hanging off of, you wouldn’t sleep at night,” he said.

The cyber threat and conclusions to come?

It would be unfair to BA to speculate too much, as the investigation is only just under way and information is only beginning to emerge.

Many fear that cyber threats may have been behind the failure, while others argue that it was just one big mistake. Either way, it has happened, and serious questions now need to be asked, as the implications of the failure could be hugely significant.

International Airport Review will be covering the story as more information comes through. 
