RSA Expo 2024 will be held at Moscone Center, San Francisco, from May 6-9, featuring Booth ESE-16.
Comments Off on Scaling Security Operations using Data Orchestration
Posted By

Databahn Team


Scaling Security Operations using Data Orchestration

Lately, there has been a surge in discussions through numerous articles and blogs emphasizing the importance of disentangling the processes of data collection and ingestion from the conventional SIEM (Security Information and Event Management) systems. Leading detection engineering teams within the industry are already adapting to this transformation. They are moving away from the conventional approach of considering security data ingestion, analytics (detection), and storage as a single, monolithic task.

Instead, they have opted to separate the facets of data collection and ingestion from the SIEM, granting them the freedom to expand their detection and threat hunting capabilities within the platforms of their choice. This approach not only enhances flexibility to bring the best of breed technologies, but also proves to be costeffective, as it empowers them to bring in the most pertinent and relevant data for their security operations.

Staying ahead of threats requires innovative solutions. One such advancement is the emergence of nextgeneration data-focused orchestration platforms.

So, what is Security Data Orchestration?

Security data orchestration is a process or technology that involves the collection, normalization, and organization of data related to cybersecurity and information security. It aims to streamline the handling of security data from various sources, making it more accessible in destinations where the data is actionable for security professionals.

Why is Security Data Orchestration becoming a big deal now?

Not too long ago, security teams adhered to a philosophy of sending every bit of data everywhere. During that era, the allure of extensive on-premise infrastructure was irresistible, and organizations justified the sustained costs over time. However, in the subsequent years, a paradigm shift occurred as the entire industry began to shift its gaze towards the cloud.

This transformative shift meant that all the entities downstream from data sources—such as SIEM (Security Information and Event Management) systems, UEBA (User and Entity Behavior Analytics), and Data Warehouses—all made their migration to the cloud. This marked the inception of a new era defined by subscription and licensing models that held data as a paramount factor in their quest to maximize profit margins.

In the contemporary landscape, most downstream products, without exception, revolve around the notion of data as a pivotal element. It’s all about the data you ingest, the data you process, the data you store, and, not to be overlooked, the data you search in your quest for security and insights.

This paradigm shift has left many security teams grappling to extract the full value they deserve from these downstream systems. They frequently find themselves constrained by the limitations of their SIEMs, struggling to accommodate additional valuable data. Moreover, they often face challenges related to storage capacity and data retention, hindering their ability to run complex hunting scenarios or retrospectively delve deeper into their data for enhanced visibility and insights.

It’s quite amusing, but also concerning, to note the significant volume of redundant data that accumulates when companies simply opt for vendor default audit configurations. Take a moment to examine your data for outbound traffic to Office 365 applications, corporate intranets, or routine process executions like Teams.exe or Zoom.exe.

Sample data redundancy illustration with logs collected by these product types in your SIEM Upon inspection, you’ll likely discover that within your SIEM, at least three distinct sources are capturing identical information within their respective logs. This level of data redundancy often flies under the radar, and it’s a noteworthy issue that warrants attention. And quite simply, this hinders the value that your teams expect to see from the investments made in your SIEM and data warehouse.

Conversely, many security teams amass extensive datasets, but only a fraction of this data finds utility in the realms of threat detection, hunting, and investigations. Here’s a snapshot of Active Directory (AD) events, categorized by their event IDs and the daily volume within SIEMs across four distinct organizations.

It is evident that, despite AD audit logs being a staple in SIEM implementations, no two organizations exhibit identical log profiles or event volume trends.

Adhering solely to vendor default audit configurations often leads to several noteworthy issues:
  1. Overwhelming Log Collection: In certain cases, such as Org 3, organizations end up amassing an astronomical number of logs from event IDs like EID 4658 or 4690, despite their detection teams rarely leveraging these logs for meaningful analysis.
  2. Redundant Event Collection: Org 4, for example, inadvertently collects redundant events, such as EID 5156, which are also gathered by their firewalls and endpoint systems. This redundancy complicates data management and adds little value.
  3. Blind spots: Standard vendor configurations may result in the omission of critical events, thereby creating security blind spots. These unmonitored areas leave organizations vulnerable to potential threats

On the other hand, it’s vital to recognize that in today’s multifaceted landscape, no single platform can serve as the definitive, all-encompassing detection system. Although there are numerous purpose-built detection systems painstakingly crafted for specific log types, customers often find themselves grappling with the harsh reality that they can’t readily incorporate a multitude of best-of-breed platforms.

The formidable challenges emerge from the intricate intricacies of data acquisition, system management, and the prevalent issue of the ingestion layer being tightly coupled with their SIEMs. Frequently, data cascades into various systems from the SIEM, further compounding the complexity of the situation. The overwhelming burden, both in terms of cost and operational intricacies, can make the pursuit of best-of-breed solutions an impractical endeavor for many organizations.

Today’s SOC teams do not have the strength or capacity to look at each source that is logging to weed out these redundancies or address blind spots or take only the right and relevant data to expensive downstream systems like the SIEM or analytics platforms or even manage multiple data pipelines for multiple platforms.

This underscores the growing necessity for Security Data Orchestration, with an even more vital emphasis on Context-Aware Security Data Orchestration. The rationale is clear: we want the Security Engineering team to focus on security, not get bogged down in data operations.

So, how do you go about Security Data Orchestration?

In its simplest form, envision this layer as a sandwich, positioned neatly between your data sources and their respective destinations.

The foundational principles of a Security Data Orchestration platform are –

Centralize your log collection:-  Gather all your security-related logs and data from various sources through a centralized collection layer. This consolidation simplifies data management and analysis, making it easier for downstream platforms to consume the data effectively.

Decouple data ingestion:- Separate the processes of data collection and data ingestion from the downstream systems like SIEMs. This decoupling provides flexibility and scalability, allowing you to fine-tune data ingestion without disrupting your entire security infrastructure.

Filter to send only what is relevant to your downstream system:- Implement intelligent data orchestration to filter and direct only the most pertinent and actionable data to your downstream systems. This not only streamlines cost management but also optimizes the performance of your downstream systems with remarkable efficiency.


At, our mission is clear: to forge the path towards the next generation Data Orchestration platform. We’re dedicated to empowering our customers to seize control of their data, but without the burden of relying on communities or embarking on the arduous journey of constructing complex Kafka clusters and writing intricate code to track data changes.

We are purpose built for Security, our platform captures telemetry once, improves its quality and usability, and then distribute it to multiple destinations – streamlining cybersecurity operations and data analytics.

DataBahn seamlessly ingests data from multiple feeds, aggregates, compresses, reduces, and intelligently routes it. With advanced capabilities, it standardizes, enriches, correlates, and normalizes the data before transferring a comprehensive time-series dataset to your data lake, SIEM, UEBA, AI/ML, or any downstream platform.

DataBahn offers continuous ML and AI powered insights and recommendations on the data collected to unlock maximum visibility and ROI. Our platform natively comes with

  • Out-of-the-box connectors and integrations:- DataBahn offers effortless integration and plug-and-play connectivity with a wide array of products and devices, allowing SOCs to swiftly adapt to new data sources.
  • Threat Research Enabled Filtering Rules:- Pre-configured filtering rules, underpinned by comprehensive threat research, guarantee a minimum volume reduction of 35%, enhancing data relevance for analysis.
  • Enrichment support against Multiple Contexts:- DataBahn enriches data against various contexts including Threat Intelligence, User, Asset, and Geo-location, providing a contextualized view of the data for precise threat identification.
  • Format Conversion and Schema Monitoring:- The platform supports seamless conversion into popular data formats like CIM, OCSF, CEF, and others, facilitating faster downstream onboarding. It intelligently monitors log schema changes for proactive adaptability.
  • Schema Drift Detection:- Detect changes to log schema intelligently for proactive adaptability.
  • Sensitive data detection:- Identify, isolate and mask sensitive data ensuring data security and compliance.
  • Continuous Support for New Event Types:- DataBahn provides continuous support for new and unparsed event types, ensuring consistent data processing and adaptability to evolving data sources.

Data orchestration revolutionizes the traditional cybersecurity data architecture by efficiently collecting, normalizing, and enriching data from diverse sources, ensuring that only relevant and purposeful data reaches detection and hunting platforms. Data Orchestration is the next big evolution in Cyber Security, that gives Security teams both the control and flexibility at the same time, being agile and cost-effective.