Measuring Dwell Time & Security Operations

Dwell time is one of the most powerful metrics to measure an organization’s cybersecurity effectiveness against today’s threat landscape. Security teams use it to assess the entire operational process of the security program, from architecture to engineering, through operations and incident response. In turn, key decision makers and stakeholders can look at this metric to gauge how well their team or service provider prevents, detects and neutralizes threats.

This metric can be a real game changer for companies and their strategic decisions about cybersecurity initiatives, whether they have an established, in-house security team or rely on a managed service provider.

Metrics that matter

When operating in an environment where time, budget and expertise are extremely limited, decisions can’t be made based on gut feelings or hypothetical ideas. But unfortunately, a lot of businesses are making security decisions based on hunches, knee-jerk reactions (e.g. as an immediate response to a highly publicized cyber incident) or catchy marketing campaigns. It’s one of many reasons why companies continue to get breached even after accumulating piles of underutilized, million-dollar security assets. Luckily, this is about to change.

There are now a number of metrics to take the guesswork out of security operations and help decision makers determine degrees of vulnerability and risk. Of course, due to time, financial and human resource constraints, it’s impractical to measure everything, but below are a few key metrics to pay attention to:

Measures that provide a snapshot of vulnerability and risk

Points of Risk per Device (PORD)	A metric measuring the attack surface area across devices, hosts, servers, etc., as well as patching efficiency. Business risk should be applied to the systems being measured for an accurate risk picture.
Protection Ratio	A ratio obtained by dividing the number of incidents by the number of systems protected.
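As a rough illustration of the Protection Ratio described above, here is a minimal Python sketch. The function name and the quarterly figures are hypothetical and exist only to show the arithmetic; they are not taken from any real environment.

```python
def protection_ratio(incident_count: int, systems_protected: int) -> float:
    """Return incidents per protected system (lower is better)."""
    if systems_protected <= 0:
        raise ValueError("systems_protected must be positive")
    return incident_count / systems_protected


# Hypothetical figures: (number of incidents, number of systems protected).
quarters = {"Q1": (12, 400), "Q2": (9, 450)}

for name, (incidents, systems) in quarters.items():
    ratio = protection_ratio(incidents, systems)
    print(f"{name}: {ratio:.3f} incidents per protected system")
```

Tracking the ratio rather than the raw incident count keeps the number comparable as the protected estate grows or shrinks.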

While metrics for vulnerability and risk help identify areas that may require additional security, there are also metrics for measuring the tactical effectiveness of security operations. These metrics can be particularly useful in gauging the success of your organization’s security initiatives.

Metrics that measure tactical effectiveness of operations

Rate of closed incidents where the root cause was not determined	This indicates whether your team has the capability and expertise to determine the cause of an incident, which in turn indicates whether it can prevent the same incident from recurring.
Incident False Positive Rates	The ratio of the number of false positive incidents to actual incidents.
Changes in the number of infections from commodity threats	Though this metric can be difficult to obtain, it tells you how your security is performing against lower-level (a.k.a. commodity) threats.
Number of compromises in public-facing web servers	This gauges an organization’s success in protecting its most exposed applications. If the trend shows an increase in the number of compromises, then it’s time to review your patching program.
Ratio of user-identified incidents to SOC-identified incidents	Ideally, the SOC should identify incidents before users do. Use this to measure the effectiveness of detection.
Change in recurring incidents	If a particular type of incident appears to be increasing in frequency, this might signal a security control issue.
Change in the number of incidents that spread across multiple hosts	An increase would indicate poor network segmentation as well as an inability to prevent malware propagation or privilege escalation.
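Several of the ratios above can be computed from a simple list of closed incident records. The sketch below is only an illustration: the `Incident` fields, the `tactical_metrics` helper and the sample records are all hypothetical, and it assumes every record represents a closed incident.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    false_positive: bool    # closed as a false positive
    reported_by: str        # "user" or "soc" (assumed labels)
    root_cause_known: bool  # root cause determined at closure


def tactical_metrics(incidents):
    """Compute three of the tactical ratios from closed incident records."""
    actual = [i for i in incidents if not i.false_positive]
    false_pos = len(incidents) - len(actual)
    user = sum(1 for i in actual if i.reported_by == "user")
    soc = sum(1 for i in actual if i.reported_by == "soc")
    unknown = sum(1 for i in actual if not i.root_cause_known)
    return {
        # False positive incidents relative to actual incidents.
        "false_positive_ratio": false_pos / len(actual) if actual else None,
        # User-identified relative to SOC-identified incidents.
        "user_to_soc_ratio": user / soc if soc else None,
        # Share of actual incidents closed without a root cause.
        "undetermined_root_cause_rate": unknown / len(actual) if actual else None,
    }


# Hypothetical sample data, for illustration only.
sample = [
    Incident(False, "soc", True),
    Incident(False, "soc", False),
    Incident(False, "user", True),
    Incident(False, "soc", True),
    Incident(True, "soc", True),
    Incident(True, "user", True),
]

for name, value in tactical_metrics(sample).items():
    print(name, value)
```

Each ratio returns `None` when its denominator is zero, so an empty reporting period is not mistaken for a perfect score.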

As previously mentioned, it wouldn’t be practical to monitor every single metric. In fact, there are some metrics that companies have been using but are really false indicators of performance. Don’t let them mislead you.

Metrics that can be false indicators of performance

Changes in the overall number of incidents

Without context, an increase or decrease in this statistic could mean the program is either failing or succeeding. Context is needed to explain why the statistic is up or down and how that reflects on a team’s efforts.

Changes in the overall number of detections

A decrease in detections doesn’t automatically mean fewer incidents. It could mean the threat actor has changed strategies and is now successfully evading the sensors.

Investment in security technology

This statistic is more apt for an overall benchmark comparison of what organizations are spending on their security.

There is no direct correlation between investment in technology and the performance of that technology or your team. In fact, security technology providers continue to fail overall at measuring and communicating the direct reduction in risk, or any other value, delivered by the technology you buy.

Like the more reliable metrics above, dwell time can enable security professionals and executives to gauge the effectiveness of their security operations and, in turn, inform their decision making. Customers can also use it as a barometer for how well a security provider delivers on its services. Unfortunately, very few security providers currently seem to use dwell time to demonstrate their capabilities.

How security providers really treat dwell time

In order to take the pulse of dwell time’s adoption in the security services industry, we performed an anecdotal review of eight managed security service providers (MSSPs) and security-as-a-service (SaaS) providers combined. We wanted to know if:

These providers were starting to talk about dwell time, mean time-to-detection, mean time-to-response or other similar metrics on their website or marketing collateral.
These providers were providing any actual metrics demonstrating how they performed in terms of these metrics.

The results showed that most providers (7 out of 8) are already familiar with dwell time and do in fact acknowledge the importance of reducing it. However, only one vendor actually indicated designing their operations with dwell time in mind. Worse, not a single one published any specific dwell time statistics on either their website or marketing collateral.

One thing to note, though: there is a stigma around dwell time in the security industry because it highlights an ugly truth. Regardless of your best efforts or thoughtful design, you can still be compromised. Instead of shying away from that truth, show the world that you understand the reality of the threat landscape and that your organization is capable of dealing with attackers within your environment.

The lack of published figures is unfortunate, because dwell time can demonstrate the real value of a provider’s protection far better than metrics like the number of ‘events processed’ or ‘alerts generated for the customer.’ If I were a customer looking to outsource my security, third-party providers that can demonstrate shorter dwell times would be top candidates for the job.

Restructuring your security program to incorporate dwell time

So how does one incorporate dwell time into their security program? Since this is something that has yet to gain widespread adoption in the cybersecurity services industry, it would be wise to draw ideas from an environment where the integration of dwell time has already been tested and proven. In that regard, we can get some ideas from strategies and design elements of Armor’s Spartan threat prevention and response platform. Here are some of them:

Embrace the comprehensive definition of dwell time

Some people think dwell time is synonymous with mean time to detect. As detailed in the eBook, “5 Days to Actions and Objective – Dwell Time as a Critical Security Success Metric,” that interpretation is too narrow and omits several critical phases of the Cyber Kill Chain.

Fully integrate response capabilities with security operations

Most organizations have very limited incident response (IR) capabilities and often outsource them. As we’ve previously pointed out in the blog post, “Re-Evaluating Dwell Time and Incident Response,” when organizations treat IR separately, they inevitably introduce an additional gap or delay that extends the threat actor’s window of opportunity. An IR capability that’s fully integrated into your security operations will enable you to close that gap.

Integrate continuous threat hunting

Traditional detection tools tend to miss unknown threats that have adopted advanced evasion techniques. You have a better chance of finding those threats (and, in turn, reducing dwell time) if you integrate continuous threat hunting into your operations.

Actively measure dwell time

It’s not enough to measure dwell time once. To track your progress and see whether your security efforts have in fact resulted in shorter dwell times, you need to measure it actively and continuously. If it has remained stagnant or, worse, is increasing, then your strategies need to be re-evaluated.
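To make the idea of tracking dwell time over time more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that dwell time is measured from the moment of initial compromise to the moment the threat is contained; the function names, the quarter labels and the timestamps are hypothetical.

```python
from datetime import datetime
from statistics import mean, median


def dwell_days(compromised_at: datetime, contained_at: datetime) -> float:
    """Days between initial compromise and containment (assumed definition)."""
    if contained_at < compromised_at:
        raise ValueError("containment cannot precede compromise")
    return (contained_at - compromised_at).total_seconds() / 86400


def dwell_by_period(records):
    """Group dwell times by reporting period and summarize each group."""
    buckets = {}
    for period, start, end in records:
        buckets.setdefault(period, []).append(dwell_days(start, end))
    return {p: {"mean": mean(v), "median": median(v)} for p, v in buckets.items()}


# Hypothetical incidents: (period, compromised_at, contained_at).
records = [
    ("2018-Q1", datetime(2018, 1, 3), datetime(2018, 1, 13)),
    ("2018-Q1", datetime(2018, 2, 10), datetime(2018, 2, 16)),
    ("2018-Q2", datetime(2018, 4, 2), datetime(2018, 4, 6)),
    ("2018-Q2", datetime(2018, 5, 20), datetime(2018, 5, 22)),
]

summary = dwell_by_period(records)
for period in sorted(summary):
    print(period, summary[period])
```

Comparing the per-period figures shows whether dwell time is shrinking, stagnant or growing. Reporting the median alongside the mean keeps a single long-running incident from hiding the overall trend.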

Adopt a highly proactive approach to security

Don’t be content with just responding to incidents or, worse, waiting for your MSSP to carry out IR for you. Before you know it, threat actors will have already completed the Kill Chain. Adopt a proactive approach in your security program to anticipate potential attacks and set up countermeasures that repel incoming threats or detect and eliminate them early in the Kill Chain.

Apply extensive automation and orchestration

Many security operations processes can be automated and orchestrated. Automation and orchestration, which eliminate or cut down manual processes, can speed up incident response and remediation.

Here at Armor, we don’t just treat dwell time as a metric. Rather, we’ve embraced it as a proactive security philosophy and culture that drives unified change across all security operations to achieve a common objective – minimize the opportunity a threat actor has to cause harm to your organization. We encourage you to do the same.

July 3, 2018