Service Disruption - SYD2 Services
Incident Report for Servers Australia
Postmortem

Summary
At 12:38 PM on 29th of July 2021, network engineers were alerted to interface and routing protocol up/down events between the Syncom SYD2 and Equinix SY1 facilities. During the investigation, we identified that one of the stack members in our Syncom SYD2 core switching stack had dropped from the topology, causing kernel issues to occur. The stack member rejoined the topology after an automatic reload and services began restoring. Because of the many kernel error messages generated by this event, engineers will schedule an emergency firmware upgrade to the latest available firmware for this environment.

Timeline of Events
12:38 PM - Engineers receive alerts of a flapping interconnect between the Syncom SYD2 & Equinix SY1 facilities
12:42 PM - Same interface flaps for a second time
12:44 PM - Same interface flaps for a third time and kernel error messages flood logs
12:45 PM - Engineers drain the flapping link to move production traffic to a redundant link
12:45 PM - Same interface flaps for a fourth time
12:50 PM - Connectivity lost to Core Switching Stack in Syncom SYD2
12:52 PM - On-site engineer investigates further
12:53 PM - Status notice raised on our status page, informing clients
12:55 PM - Services come back to a reachable state
12:58 PM - On-site engineer confirms that one stack member appeared to have dropped and was coming back online
01:05 PM - All services confirmed to have been restored to an operational state
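
As a rough illustration of the timeline arithmetic, the Python sketch below computes the elapsed minutes between key events. The times are copied from the timeline above; the event labels and the `minutes_between` helper are hypothetical names introduced only for this example, and the sketch assumes all events fall on the same day.

```python
from datetime import datetime

# Times copied from the timeline above (all on 29 July 2021).
# The dictionary keys are illustrative labels, not official event names.
EVENTS = {
    "first_alert": "12:38 PM",
    "connectivity_lost": "12:50 PM",
    "services_reachable": "12:55 PM",
    "all_restored": "01:05 PM",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes between two same-day 12-hour clock times."""
    fmt = "%I:%M %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    # First alert to full restoration.
    print(minutes_between(EVENTS["first_alert"], EVENTS["all_restored"]))  # 27
    # Loss of connectivity to the stack until services were reachable again.
    print(minutes_between(EVENTS["connectivity_lost"], EVENTS["services_reachable"]))  # 5
    # Loss of connectivity to the stack until all services were confirmed restored.
    print(minutes_between(EVENTS["connectivity_lost"], EVENTS["all_restored"]))  # 15
```

On these figures, roughly 27 minutes passed between the first alert and confirmed restoration, of which about 5 minutes were a loss of connectivity to the core switching stack.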

Further Action
Engineers will schedule an emergency maintenance window between 1 AM & 5 AM on 30th of July 2021 to upgrade our Core Switching Stack in the Syncom SYD2 facility to the latest recommended firmware version.

Posted Jul 29, 2021 - 15:45 AEST

Resolved
This incident has been resolved.

A post-mortem will be posted shortly, along with a notice of emergency maintenance for tomorrow morning.
Posted Jul 29, 2021 - 15:44 AEST
Monitoring
Network engineers have reviewed further data in relation to the identified networking device and are now organising a brief emergency maintenance window overnight to apply a firmware update.

The time window and further information in relation to this maintenance will be provided in a separate maintenance notification and on our status page.
Posted Jul 29, 2021 - 13:41 AEST
Identified
Network engineers believe they have identified a networking device within the SYD2 network that has suffered an unexpected power event.

Impacted services are expected to be restored in the next 5-10 minutes as the device returns to normal operations. A full investigation into the cause of this device power event is underway. Further updates will be added to this notice as engineers review the device further.
Posted Jul 29, 2021 - 13:02 AEST
Investigating
Engineers are aware of and are investigating a disruption impacting services within our SYD2 facility.

Further investigations are underway to identify the cause and resolve any disruptions being experienced. An update will be provided as soon as possible.
Posted Jul 29, 2021 - 12:53 AEST
This incident affected: Data Centres (Syncom SYD2).