Server Outage
Incident Report for City of Pleasanton Systems
Resolved
Although some endpoint computers may still show lingering anomalies or outages, we now consider this incident resolved. The IT department will release a full post-mortem report on this incident. It will include an executive summary along with details on the timeline, our response, impact analysis, and lessons learned, among other topics.
Posted Jul 23, 2024 - 11:43 PDT
Update
Monday Morning Blues - screen of death. See what I did there? We have remediated most of the critical systems that urgently needed to be restored, and we are now working through the remaining group of knocked-out-of-commission computers.

Please be patient; the team is trying to get through these quickly. Meanwhile, we have learned that some users have been able to recover from the BSOD by rebooting their systems as many as 15 times (that is not a typo).

Thanks again for your continued support.
Posted Jul 22, 2024 - 08:48 PDT
Update
Making great progress. A great team of people has been helping with this effort, from our external partners here in town, ENT Networks, to deputized IT folks in Public Works and other areas of this great organization.

- Servers: We believe we have restored service to the most critical and urgent servers that needed recovery.

- Workstations: We continue to work on restoring service to computers that have been stubborn about coming back online. Although the remedy has often been case by case, the encouraging news is that we have had 100% success so far. If we have not visited your system yet, please continue to be patient.

If you have an urgent need that we should factor into our prioritization, please don't be shy about taking another bite of the apple and letting us know via the Service Desk.

In Progress: The team is now hitting remote sites, such as LPFD stations, DBAC, and Gingerbread.
Posted Jul 19, 2024 - 13:28 PDT
Update
So... How's your day? Me? Oh, I'm just looking to see who will take my shares of CrowdStrike stock. Here is where we are as we assess this morning's damage, including collateral damage.

We still have about 16 servers that need their services restored. The "fix" does not work for some of them, and those require a full restore from our backup system.

We have multiple endpoint systems that require service. The CrowdStrike "fix" has a low success rate and, in some cases, does not apply.

Several of our external partners are in the same position as us: some we integrate with, some we use as Software-as-a-Service providers, and some we rely on for the resources they provide.

So, thank you so much for your patience and understanding, and for not being offended if it seems you are not the priority. You are! But it takes a bit to get from the pool to your desk.
Posted Jul 19, 2024 - 09:45 PDT
Update
We continue to monitor the situation.
Posted Jul 19, 2024 - 09:38 PDT
Monitoring
Many servers have had their services restored, and many more are awaiting the team's attention. We continue to work through them.

The more significant challenge is that many of you will be greeted by a BSOD (blue screen of death) when you start your computer. CrowdStrike has released an update that should bring your PC or laptop back online after a reboot.

However, this has been hit-or-miss for us. Here are the workaround steps for individual hosts:

1. Reboot the Computer:
- Restart your computer to allow it to download the necessary files.

2. If the Computer Crashes Again:
- Start in Safe Mode or Recovery Mode:
---- Restart your computer and press the appropriate key to enter Safe Mode or the Windows Recovery Environment (usually F8, F11, or a similar key).

- Navigate to the Specific Folder:
---- Open File Explorer.
---- Go to: C:\Windows\System32\drivers\CrowdStrike.

- Delete the Problematic File:
---- Look for a file that starts with “C-00000291” and ends with “.sys”.
---- Delete this file.

3. Restart Normally:
- Restart your computer as usual.
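For IT staff who would rather script the file-matching part of step 2, the logic can be sketched in Python. This is a minimal, hypothetical helper, not a CrowdStrike or City tool: the names `find_problem_files` and `remove_problem_files` are illustrative, and it assumes the affected drive is reachable from an environment where Python can actually run (for example, a disk attached to another machine, with the directory path adjusted to match). It defaults to a dry run that only lists the files matching "C-00000291*.sys".

```python
from pathlib import Path

# Folder and file pattern taken from the workaround steps above.
# Assumption: adjust CROWDSTRIKE_DIR if the disk is mounted elsewhere.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_problem_files(directory: Path) -> list[Path]:
    """Return files in `directory` whose names start with C-00000291 and end in .sys."""
    if not directory.is_dir():
        # Nothing to do if the folder is missing or not mounted.
        return []
    return sorted(p for p in directory.glob(PATTERN) if p.is_file())


def remove_problem_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """Hypothetical helper: list the matching files and delete them only if dry_run is False."""
    matches = find_problem_files(directory)
    for path in matches:
        if dry_run:
            print(f"Would delete: {path}")
        else:
            path.unlink()
            print(f"Deleted: {path}")
    return matches


if __name__ == "__main__":
    # Default is a dry run; rerun with dry_run=False only after checking the list.
    remove_problem_files(CROWDSTRIKE_DIR, dry_run=True)
```

The dry-run default is a deliberate choice: the pattern only touches files in that one folder, but reviewing the list before deleting anything under System32 is cheap insurance. The manual steps above remain the documented workaround.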

If you continue to have boot issues, please contact the Service Desk through any of these channels:

• Web: https://help.cityofpleasantonca.gov (access requires your city network login account)
• eMail: itrequest@cityofpleasantonca.gov
• Phone: 925.931.5083. If no one answers, leave a message; doing so will automatically generate a ticket with your voicemail attached.
• SMS Text: 925.931.8350
Posted Jul 19, 2024 - 02:40 PDT
Identified
We want to inform you about a global issue currently affecting Microsoft Windows systems running CrowdStrike endpoint protection, which we also use. This issue is causing widespread disruptions and has taken many of our Windows desktops and servers offline.

Current Impact:
Authentication and Network Services: Some of the servers that authenticate logins, assign IP addresses to endpoints, and provide DNS resolution of hostnames and URLs are affected. This disruption may produce symptoms that look like network problems.

Critical Applications: Servers hosting critical software applications and services are also offline, impacting their availability and performance.

Our Response:
Disaster Recovery Effort: We are treating this situation with the urgency of a Disaster Recovery effort to ensure business continuity. We are focusing on restoring our most critical systems, starting with servers and extending to desktops.

Monitoring and Mitigation: We are closely monitoring the situation and actively seeking remedies to address the root cause of the disruption.

Next Steps:
Workarounds and Potential Remedies: We are identifying workarounds and potential solutions, although neither Microsoft nor CrowdStrike has officially released any yet.

Ongoing Updates: We will continue to provide updates via this System Status Page. We encourage you to subscribe for the latest updates, which may not be forwarded to the "All Users" email distribution list.

Support:
If you have any immediate concerns or require assistance, please contact our support team via the Service Desk.

We apologize for any inconvenience this may cause and appreciate your understanding as we work through this global incident.

Thank you for your patience and cooperation.
Posted Jul 19, 2024 - 01:05 PDT
Update
This incident involves network outages. Staff is heading onsite.
Posted Jul 18, 2024 - 23:25 PDT
Investigating
Our monitoring system is alerting us to server outages. We are investigating now.
Posted Jul 18, 2024 - 23:01 PDT
This incident affected: Local and Wide Area Network and IT Servers.