
A Breach in Azure, a True Story (and what to do)

It was an unusual scenario for us: the client brought us in to do the migration to Azure, but kept dragging their feet. We staged all the systems, including IaaS, AVD, and Defender, and were on ice waiting for the cutover date. Meanwhile, we knew the legacy on-prem environment was problematic: Citrix for remote access, Duo 2FA, too many accounts with rights, the usual suspects. Since we weren't their MSP and there was a CTO, we pointed these things out politely, but when nothing was done, simply moved on.

A few days ago, we got the call we were dreading: a sophisticated malicious actor had breached them. Naturally, we dropped what we were doing and jumped in. Their on-premises environment was beyond salvage. The hackers knew what they were doing: they encrypted the backups and the SAN, and even figured out there was an Azure environment across the VPN, jumping across with a second set of compromised credentials.

In Azure, however, their plan hit a snag. Defender was blocking the compromised account, in spite of its rights, from accessing VMs and AVD session hosts. They were able to disable soft delete on the backups, but within a few hours of working with Microsoft, the product team was able to recover the data from just before the deletion. Three days later, the client is back to fully operational status, despite having every single system on the production side destroyed. Here is what you should learn:

  1. Azure vs. On-Prem: This warrants its own article, but I'll only say that the two are not the same. From built-in alerting, to the ability to recover data, to how far Defender has evolved, if you've been putting off your migration, you're gambling.
  2. Have a Strategic Plan: Having an Incident Response Plan that is both detailed and tested is key. Who will notify the business? Is there a forensics firm on retainer? Do you have cyber insurance?

What if you're reading this because you're under attack and need help? No problem; here is what we typically recommend:

  1. Down all points of ingress / egress: Disabling VPN, Citrix, and AVD is nice, but not enough. There may be 'legitimate' call-home software installed, such as cloudflared or AnyDesk, which won't get picked up by IPS / IDS. As inconvenient as it sounds (and yes, it may mean sleeping in the office), unplug your internet.
    1. Note: If you decide to disable routing instead, make sure your firewalls don't have external management enabled, that no new VPN tunnels have appeared, and that all existing admin accounts are deleted, with only one new one created in their place.
  2. If your Active Directory has been breached, and it's safe to assume that it has, start by recovering one Domain Controller. From there, perform an authoritative SYSVOL restore, seize the FSMO roles, and run whatever other checks you need to confirm it's healthy (there's a health-check sketch after this list). Then promote a few more.
  3. Make a list of all admin accounts. Create one new account and add it to Enterprise Admins. Now remove everyone from Domain Admins, Enterprise Admins, Administrators, and Schema Admins aside from the new account, and disable those accounts too (a sketch for auditing these groups follows the list).
    1. Disable as many accounts as possible. Accounts associated with shared mailboxes don't need to be enabled. Accounts that haven't logged in for two weeks should also be disabled.
    2. Compare licensed users to a comprehensive list of employees from HR. Make sure there are no 'extra' users with licenses and access (see the roster-comparison sketch below).
    3. Pull a report of all accounts with local admin rights, both server and workstation (a WinRM-based sketch follows the list). This is where many cut corners and fail. Still running backup agents with credentials? Give the account backup rights, not admin rights. Check whether it needs interactive logon; if not, don't grant it. You get the point.
    4. When you start giving privileged accounts to the response team, the default answer should be 'no'. As in 'no, IR doesn't need Domain Admin and Global Admin; we will install the collectors and confirm they're getting the forensics data.' The CIO doesn't need rights either, no matter how hands-on he used to be ten years ago.
    5. If you haven't already, rename and disable the default Administrator accounts, both in the domain and locally on all IaaS VMs. This can be done via GPO.
  4. Make sure all devices are enrolled in MDM. If you have Intune, make sure all devices must run Company Portal before being admitted. Make sure your Conditional Access policies have no exclusions. In fact, for the next few weeks, remove trusted locations and force everyone without an active session to 2FA (a Graph-based sketch follows the list).
    1. Now drop all active sessions so all devices have to re-authenticate.
  5. If you can, deploy Defender for Endpoint and Defender for Identity. It's a vastly underrated family of tools that can prevent some of the most sophisticated attacks, such as golden ticket.
  6. If you’re not running Azure Virtual Desktop, rebuild all endpoints from image.
    1. If you are, just scan the endpoints for keyloggers and other malware.
  7. Reset the Kerberos krbtgt account password in Active Directory (twice) and reboot all systems (sketch below).
  8. By now your Incident Response team should have installed their collectors and gathered logs. Assuming you've recovered and / or rebuilt the systems that were breached, they should be able to give a clean bill of health within 24 hours or so.
  9. Start allowing users back into the pool, but make sure someone has enabled Network Traffic Analysis on all points of egress and is actively watching it.
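
To make a few of these steps concrete, here are some rough sketches. They are illustrations under stated assumptions, not drop-in tooling; every hostname, credential, and file name is a placeholder for your environment.

For step 2, a minimal Python sketch that wraps the standard health checks on the recovered Domain Controller. It assumes it runs on the DC itself with the AD DS tools installed; the commands are stock Microsoft utilities, and you should still read their output yourself.

```python
# Run the stock AD health checks after recovering the first Domain Controller.
# Assumes this executes on the recovered DC with the AD DS / RSAT tools installed.
import subprocess

CHECKS = [
    ["dcdiag", "/v"],               # overall DC health
    ["repadmin", "/replsummary"],   # replication status once additional DCs exist
    ["netdom", "query", "fsmo"],    # confirm the seized FSMO roles landed where expected
]

for cmd in CHECKS:
    print(f"\n=== {' '.join(cmd)} ===")
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout)
    if result.returncode != 0:
        print(f"!! {cmd[0]} exited with {result.returncode}: {result.stderr.strip()}")
```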
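
For steps 3 and 3.1, a sketch using the ldap3 library (pip install ldap3) to list who is still in the privileged groups and which accounts have been idle for roughly two weeks. The server name, base DN, credentials, and break-glass account name are assumptions; lastLogonTimestamp replicates lazily, so treat the stale list as approximate, and check the built-in Administrators group (under CN=Builtin) and nested groups separately.

```python
# Audit privileged group membership and stale accounts with ldap3.
from datetime import datetime, timedelta, timezone
from ldap3 import Server, Connection, ALL, NTLM, SUBTREE

BASE_DN = "DC=corp,DC=example,DC=com"      # placeholder domain DN
NEW_ADMIN = "ir-breakglass"                # the single account that keeps rights
GROUPS = ["Domain Admins", "Enterprise Admins", "Schema Admins"]

server = Server("dc01.corp.example.com", get_info=ALL)
conn = Connection(server, user="CORP\\ir-breakglass", password="<prompt for this>",
                  authentication=NTLM, auto_bind=True)

# 1. Everyone still in a privileged group, aside from the new break-glass account.
for group in GROUPS:
    flt = f"(&(objectClass=user)(memberOf=CN={group},CN=Users,{BASE_DN}))"
    conn.search(BASE_DN, flt, SUBTREE, attributes=["sAMAccountName"])
    extras = [e.sAMAccountName.value for e in conn.entries
              if e.sAMAccountName.value != NEW_ADMIN]
    print(group, "->", extras)   # review, then remove from the group and disable

# 2. Accounts with no logon in the last 14 days (lastLogonTimestamp is a FILETIME:
#    100-nanosecond intervals since 1601-01-01).
cutoff = datetime.now(timezone.utc) - timedelta(days=14)
filetime = int((cutoff.timestamp() + 11644473600) * 10_000_000)
flt = f"(&(objectClass=user)(objectCategory=person)(lastLogonTimestamp<={filetime}))"
conn.search(BASE_DN, flt, SUBTREE, attributes=["sAMAccountName"])
print("stale:", [e.sAMAccountName.value for e in conn.entries])
```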
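
For step 3.2, the comparison is just a set difference over two exports. File and column names below are assumptions; adjust them to whatever HR and your licensing portal actually produce.

```python
# Diff a licensed-user export against the HR roster.
import csv

def emails(path, column):
    with open(path, newline="") as f:
        return {row[column].strip().lower()
                for row in csv.DictReader(f) if row[column].strip()}

licensed = emails("licensed_users.csv", "UserPrincipalName")
hr = emails("hr_roster.csv", "Email")

print("Licensed but not in HR (investigate and disable):", sorted(licensed - hr))
print("In HR but not licensed (verify, usually fine):", sorted(hr - licensed))
```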
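
For step 3.3, one way to pull local Administrators membership at scale is over WinRM. This sketch uses the pywinrm package (pip install pywinrm) and assumes WinRM is reachable and the endpoints have the Get-LocalGroupMember cmdlet (Windows PowerShell 5.1+); your inventory system is a better source for the host list than a hard-coded one.

```python
# Collect local Administrators membership from a list of hosts over WinRM.
import winrm

HOSTS = ["srv-app01", "srv-sql01", "wks-finance07"]   # placeholder inventory

for host in HOSTS:
    session = winrm.Session(host, auth=("CORP\\ir-breakglass", "<password>"), transport="ntlm")
    result = session.run_ps(
        "Get-LocalGroupMember -Group Administrators | Select-Object -ExpandProperty Name"
    )
    print(f"--- {host} ---")
    if result.status_code == 0:
        print(result.std_out.decode().strip())
    else:
        print("error:", result.std_err.decode().strip())
```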
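
For steps 4 and 4.1, both the exclusion review and the session revocation can be driven through Microsoft Graph. The sketch assumes you already have an access token with the right permissions (for example Policy.Read.All for reading Conditional Access policies, plus a permission that allows revokeSignInSessions); pagination and error handling are omitted.

```python
# Review Conditional Access exclusions and revoke a user's sessions via Microsoft Graph.
import requests

GRAPH = "https://graph.microsoft.com/v1.0"
headers = {"Authorization": "Bearer <access token>"}   # placeholder token

# 1. Flag policies that still carry user or location exclusions.
policies = requests.get(f"{GRAPH}/identity/conditionalAccess/policies", headers=headers).json()
for p in policies.get("value", []):
    cond = p.get("conditions", {})
    excluded_users = (cond.get("users") or {}).get("excludeUsers", [])
    excluded_locations = (cond.get("locations") or {}).get("excludeLocations", [])
    if excluded_users or excluded_locations:
        print(p["displayName"], "-> exclusions:", excluded_users, excluded_locations)

# 2. Drop all active sessions for a user so they must re-authenticate (loop over your user list).
user = "jdoe@corp.example.com"
resp = requests.post(f"{GRAPH}/users/{user}/revokeSignInSessions", headers=headers)
print(user, resp.status_code)   # a 2xx response means the revoke was accepted
```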
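
For step 7, "reset Kerberos" means resetting the krbtgt account password twice; Microsoft's published New-KrbtgtKeys.ps1 script is the usual, safer way to do it. The sketch below only illustrates the underlying idea by shelling out to the AD PowerShell module: wait for replication to complete everywhere between the two passes, and expect some ticket disruption, which is why the reboot follows.

```python
# Illustrative only: reset the krbtgt password by shelling out to the AD PowerShell module.
# Prefer Microsoft's New-KrbtgtKeys.ps1 script for the real thing.
import secrets
import subprocess

def reset_krbtgt():
    new_pw = secrets.token_urlsafe(32)
    cmd = (
        "Set-ADAccountPassword -Identity krbtgt -Reset "
        f"-NewPassword (ConvertTo-SecureString '{new_pw}' -AsPlainText -Force)"
    )
    subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=True)

reset_krbtgt()            # first pass
# Confirm replication everywhere (repadmin /replsummary), then run the second pass:
# reset_krbtgt()
```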

Naturally, there's no single process that works for everyone, and these are just the highlights of our most recent experience.

Of course, you can always call us for help.