Summary
At 12:38 PM on the 29th of July 2021, network engineers were alerted to interface and routing protocol up/down events between the Syncom SYD2 and Equinix SY1 facilities. During the investigation, we identified that one of the stack members in our Syncom SYD2 core switching stack had dropped from the topology, causing kernel issues to occur. The stack member re-joined the topology after an automatic reload and services began restoring. Because of the large number of kernel error messages logged during this event, engineers will schedule an emergency upgrade to the latest available firmware for this environment.
Timeline of Events
12:38 PM - Engineers receive alerts of a flapping interconnect between the Syncom SYD2 & Equinix SY1 facilities
12:42 PM - Same interface flaps for a second time
12:44 PM - Same interface flaps for a third time and kernel error messages flood the logs
12:45 PM - Engineers drain the flapping link to move production traffic to a redundant link
12:45 PM - Same interface flaps for a fourth time
12:50 PM - Connectivity lost to the Core Switching Stack in Syncom SYD2
12:52 PM - On-site engineer investigates further
12:53 PM - Status notice raised on our status page, informing clients
12:55 PM - Services come back to a reachable state
12:58 PM - On-site engineer confirms that one stack member appeared to have dropped and was coming back online
01:05 PM - All services confirmed restored to an operational state
Further Action
Engineers will schedule an emergency maintenance window between 1 AM & 5 AM on the 30th of July 2021 to upgrade our Core Switching Stack in the Syncom SYD2 facility to the latest recommended firmware version.