Cybersecurity Data Fabric: What on earth is security data fabric?

Understand what a Security Data Fabric is, and why an enterprise security team needs one to achieve better security while reducing SIEM and storage costs

March 12, 2024

What on earth is security data fabric, and why do we suddenly need one?

Every time I am at a security conference, a new buzzword is all over most vendors’ signage, one year it was UEBA (User Entity and Behavioral Analytics), next EDR (Endpoint Detection and Response), then XDR (Extended Detection and Response), then it was (ASM) Attack Surface Management. Some of these are truly new and valuable capabilities, some of these are rebranding of an existing capability. Some vendors have something to do with the new capability (i.e., buzzword), and some are just hoping to ride the wave of the hype. This year, we will probably hear a lot on GenAI and cybersecurity, and on the security data fabric. Let me tackle the latter in this article, with another article to follow soon on GenAI and Cybersecurity.

Problem Statement:

Many organizations are dealing with an explosion of security logs directed to the SIEM and other security monitoring systems, Terabytes of data every day!

  • How to better manage the growing cost of the security log data collection?
  • Do you know if all of this data clogging your SIEM storage has high security value?
  • Are you collecting the most relevant security data?

To illustrate, here is an example of windows security events and a view on what elements have high security value compared to the total volume typically collected:

  • Do you have genuine visibility into potential security log data duplication and underlying inconsistencies? Is your system able to identify missing security logs and security log schema draft fast enough for your SOC to avoid missing something relevant?
  • As SIEM and security analytics capabilities evolve, how do to best decouple security log integration from SIEM and other threat detection platforms to allow not only easier migration to lasted technology but provide cost-effective and seamless access of this security data for threat hunting and other user groups?
  • Major Next Gen SIEMs operate on a consumption-based model expecting end users to break down queries by data source and/or narrowed time range; which increases the total # of queries executed and increases your cost significantly!! Major Next-Gen SIEMs operate on a consumption-based model expecting end users to break down queries by data source and/or narrowed time range; which increases the total # of queries executed and increases your cost significantly!!

As security practitioners, we either accepted these issues as the cost of running our SOC, handled some of these issues manually, or hoped that either the cloud and/or SIEM vendors would one day have a better approach to deal with these issues, to no avail. This is why you need a security data fabric.

What is a Security Data Fabric (SDF)?

A data fabric is a solution that connects, integrates, and governs data across different systems and applications. It uses artificial intelligence and metadata automation to create flexible and reusable data pipelines and services. For clarity, a data fabric is simply a set of capabilities that allows you a lot more control of your data end to end, on how this data is ingested and where to forward it and stores it, in service of your business end goals, compared to just collecting and hoarding a heap of data in an expensive data lake, and hoping one day some use will come of it. The security data fabric is meant to tightly couple these principles with deep security expertise and the use of artificial intelligence to allow mastery of your security data and optimize your security monitoring investments and enable enhanced threat detection.

They key outcome of a security data fabric is to allow security teams to focus on their core function (i.e., threat detection) instead of spending countless hours tinkering with data engineering tasks, which means automation, seamless integration and minimal overhead on ongoing operations.

Components of a Security Data Fabric (SDF):

Smart Collection:

This is meant to decouple the collection of the security data logs from the SIEM/UEBA vendor you are using. This allows the ability to send the relevant security data to the SIEM/UEBA, sending a copy to a security data lake to create additional AI-enabled threat detection use cases (i.e., AI workbench) or to perform threat hunting, and send compliance-related logs to cold storage.

    Why important?         
  1. Minimize vendor lock-in and allow your system to leverage this data in various environments and formats, without needing to pay multiple times to use your own security data outside of the SIEM - particularly for requirements such as threat hunting and the creation of advanced threat-detection use cases using AI.
  1. Eliminate data loss with traditional SIEM log forwarders, syslog relay servers.
  1. Eliminate custom code/scripts for data collection.
  1. Reduced data transfer between cloud environments, especially in the case of having a hybrid cloud environment.

Security Data Orchestration:

This is where the security expertise in the security data fabric becomes VERY important. The security data orchestration includes the following elements:

  • Normalize, Parse, and Transform: Apply AI and security expertise for seamless normalization, parsing, and transforming of security data into the format you need for ingestion into your SIEM/UEBA tool, such as OCSF, CEF, CIM, or to a security data lake, or other data storage solutions.
  • Data Forking: Again, applying AI and security expertise to identify which security logs have the right fields and attributes that have threat detection value and should be sent to the SIEM, and which other logs should be sent straight to cold storage for compliance purposes, as an example.
  • Data Lineage and Data Observability: These are well-established capabilities in data management tools. We are applying it here to security data, so we no longer need to wonder if the threat detection rule is not firing because the log source is dead/MIA or because there are no hits. Existing collectors do not always give you visibility for individual log sources (at the level of the Individual device and log attribute/telemetry). This capability solves this challenge.
  • Data Quality: Ability to monitor and alert on schema drift and track the consistency, completeness, reliability, and relevance of the security data collected, stored, and used
  • Data Enrichment: This is where you start getting exciting value. The security data fabric uses its visibility to all your security data with insights using advanced AI such as:

    • Correlate with threat intel showing new CVEs or IoCs impacting your assets, here is how it looks in the MITRE Att&ck kill chain and provides a historical view of the potential presence of these indicators in your environment.
    • Recommendations on new threat detection use cases to apply based on your threat profile.
   Why important?
  1. Automation: At face value, existing tools promise some of these capabilities, but they usually need a massive amount of manual effort and deep security expertise to implement. This allows the SOC team to focus on their core function (i.e., threat detection) instead of spending countless hours tinkering with data engineering tasks.
  2. Volume Reduction: This is the most obvious value of using a security data fabric. You can reduce 30-50% of the data volume being sent to your SIEM by using a security-intelligent data fabric, as it will only forward data that has security value to your SIEM and send the rest to cheaper data storage. Yes, you read this correctly, 30-50% volume reduction! Imagine the cost savings and how much new useful security data you can start sending to your SIEM for enhanced threat detection.
  3. Enhanced Threat Detection: An SDF will enable the threat-hunting team to run queries more effectively and cheaply by giving them the ability to access a separate data lake, you get full control of your security data, and ongoing enrichments in how to improve your threat detection capabilities. Isn’t this what a security solution is about at the end of the day?
Uncover hidden visitor insights to improve their website journey
Share

See related articles

Why is DataBahn building agents? Why now?  

Agents are not new. But the problem they were created to solve has evolved. What’s changed is not just the technology landscape, but the role of telemetry in powering modern detection, response, AI analytics, and compliance. 

Most endpoint agents were designed for a narrow task: collect logs, ship them somewhere, and stay out of the way. But today’s security pipelines demand more. They need selective, low-latency, structured data that feeds not just a SIEM, but an entire ecosystem, from detection engines and data lakes to streaming analytics and AI models. 

Our mission has always been to eliminate data waste and simplify how enterprises move, manage, and monitor security data. That’s why we built the Smart Agent: a lightweight, programmable collection layer that brings policy, precision, and platform awareness to endpoint telemetry – without the sprawl, bloat, and hidden costs of traditional agents. 

A Revolutionary Approach to Endpoint Telemetry

Traditional agents are often built as isolated tools – one for log forwarding, another for EDR, a third for metrics. This results in resource contention, redundant data, and operational sprawl. 

DataBahn's Smart Agent takes a fundamentally different approach. It’s built as a platform-native component, not a point solution. That means collect once from the endpoint, normalize once, and route anywhere, breaking the cycle of duplication.  

Here’s what sets it apart: 

- Modular, Policy-Driven Control: Enterprise teams can now define exactly what to collect, how to filter or enrich it, and where to send it – with full version control, change monitoring, and audit trails.  

- Performance Without Sprawl: Replace 3–5 overlapping agents per endpoint with a single lightweight Smart Edge agent that serves security, observability, and compliance workflows simultaneously.  

- Built for High-Value Telemetry: Our agents are optimized to selectively capture only high-signal events, reducing compute strain and downstream ingestion costs.  

- AI-Ready, Future-Proof Architecture: These agents are telemetry-aware and natively integrated into our AI pipeline. Whether it’s streaming inference, schema awareness, or tagging sensitive data for compliance – they’re ready for the next generation of intelligent data pipelines.  

This isn’t just about replacing old agents. It’s about rethinking the endpoint as the first intelligent node in your data pipeline. 

Solving Real Enterprise Problems 

We’ve spent years embedded in complex environments – from highly regulated banks to fast-moving cloud-native tech firms. And across the board, one pattern kept surfacing: traditional approaches to endpoint telemetry don’t scale. 

  • Agent Sprawl is Draining Resources: Too many agents, too much overhead. Each one comes with its own update cycles, configuration headaches, and attack surface. Our agents consolidate that complexity – offering centralized control, real-time health monitoring, and zero-downtime updates. 
  • Agentless Left Security Teams in the Dark: APIs and control planes can’t capture runtime behavior, memory state, or user actions in real time. Our agents plug that gap – giving enterprises low-latency, high-fidelity data from endpoints, VMs, containers, and edge devices. 
  • Latency, Duplication, and Blind Spots: Polling intervals and subscription models delay detection. Meanwhile, multiple agents flood SIEMs with duplicate telemetry. DataBahn's agents are event-driven, deduplicated, and volume-aware – reducing noise and improving signal quality. 
  • A Platform Approach to Edge Data: DataBahn’s agents are not just better versions of old tools – they represent a strategic shift: a unified data layer from endpoint to cloud, where telemetry is no longer hardcoded to tools, vendors, or formats. 

What that enables: 

  • Multiple Deployment Models: Direct-to-destination, hybrid agentless, or agent-per-asset based on asset value. 
  • Seamless integration with our Smart Edge: Making it easy to extend telemetry pipelines, apply real-time transformations, and deliver enriched data to multiple destinations – without code. 
  • Compliance-Ready Logging: Built-in support for log integrity, masking, and tagging to meet industry standards like PCI, HIPAA, and GDPR. 

The End of the Agent vs. Agentless Debate  

The conversation around data collection has been stuck in a binary: agent or agentless. But in real-world environments, that framing doesn’t hold.  

What enterprises need isn’t one or the other but the ability to deploy the right mechanism based on asset type, risk, latency sensitivity, and the downstream use case.  

The future isn’t agent or agentless – it’s context-aware, modular, and unified. Data collection that adapts to where it’s running, integrates cleanly into existing pipelines, and remains extensible for what comes next, whether that’s AI-driven security operations, privacy-focused compliance, or cross-cloud observability.  

That’s the shift we’re enabling with the DataBahn Smart Agent. Not just a product – but a programmable foundation for secure, scalable, and future-ready telemetry.

In their article about how banks can extract value from a new generation of AI technology, notable strategy and management consulting firm McKinsey identified AI-enabled data pipelines as an essential part of the ‘Core Technology and Data Layer’. They found this infrastructure to be necessary for AI transformation, as an important intermediary step in the evolution banks and financial institutions will have to make for them to see tangible results from their investments in AI.

The technology stack for the AI-powered banking of the future relies greatly on an increased focus on managing enterprise data better. McKinsey’s Financial Services Practice forecasts that with these tools, banks will have the capacity to harness AI and “… become more intelligent, efficient, and better able to achieve stronger financial performance.

What McKinsey says

The promise of AI in banking

The authors point to increased adoption of AI across industries and organizations, but the depth of the adoption remains low and experimental. They express their vision of an AI-first bank, which -

  1. Reimagines the customer experience through personalization and streamlined, frictionless use across devices, for bank-owned platforms and partner ecosystems
  2. Leverages AI for decision-making, by building the architecture to generate real-time insights and translating them into output which addresses precise customer needs. (They could be talking about Reef)
  3. Modernizes core technology with automation and streamlined architecture to enable continuous, secure data exchange (and now, Cruz)

They recommend that banks and financial service enterprises set a bold vision for AI-powered transformation, and root the transformation in business value.

AI stack powered by multiagent systems

The true potential of AI will require banks of the future to tread beyond just AI models, the authors claim. With embedding AI into four capability layers as the goal, they identify ‘data and core tech’ as one of those four critical components. They have augmented an earlier AI capability stack, specifically adding data preprocessing, vector databases, and data post-processing to create an ‘enterprise data’ part of the ‘core technology and data layer’. They indicate that this layer would build a data-driven foundation for multiple AI agents to deliver customer engagement and enable AI-powered decision-making across various facets of a bank’s functioning.

Our perspective

Data quality is the single greatest predictor of LLM effectiveness today, and our current generation of AI tools are fundamentally wired to convert large volumes of data into patterns, insights, and intelligence. We believe the true value of enterprise AI lies in depth, where Agentic AI modules can speak and interact with each other while automating repetitive tasks and completing specific and niche workstreams and workflows. This is only possible when the AI modules have access to purposeful, meaningful, and contextual data to rely on.

We are already working with multiple banks and financial services institutions to enable data processing (pre and post), and our Cruz and Reef products are deployed in many financial institutions to become the backbone of their transformation into AI-first organizations.

Are you curious to see how you can come closer to building the data infrastructure of the future? Set up a call with our experts to see what’s possible when data is managed with intelligence.

Two years ago, our DataBahn journey began with a simple yet urgent realization: security data management is fundamentally flawed. Enterprises are overwhelmed by security and telemetry, struggling to collect, store, and process it, while finding it harder and harder to gain timely insights from it. As leaders and practitioners in cybersecurity, data engineering, and data infrastructure, we saw this pattern everywhere: spiraling SIEM costs, tool sprawl, noisy data, tech debt, brittle pipelines, and AI initiatives blocked by legacy systems and architectures.

We founded DataBahn to break this cycle. Our platform is specifically designed to help enterprises regain control: disconnecting data pipelines from outdated tools, applying AI to automate data engineering, and constructing systems that empower security, data, and IT teams. We believe data infrastructure should be dynamic, resilient, and scalable, and we are creating systems that leverage these core principles to enhance efficiency, insight, and reliability.

Today, we’re announcing a significant milestone in this journey: a $17M Series A funding round led by Forgepoint Capital, with participation from S3 Ventures and returning investor GTM Capital. Since coming out of stealth, our trajectory has been remarkable – we’ve secured a Fortune 10 customer and have already helped several Fortune 500 and Global 200 companies cut over 50% of their telemetry processing costs and automate most of their data engineering workloads. We're excited by this opportunity to partner with these incredible customers and investors to reimagine how telemetry data is managed.

Tackling an industry problem

As operators, consultants, and builders, we worked with and interacted with CISOs across continents who complained about how they had gone from managing gigabytes of data every month to being drowned by terabytes of data daily, while using the same pipelines as before. Layers and levels of complexity were added by proprietary formats, growing disparity in sources and devices, and an evolving threat landscape. With the advent of Generative AI, CISOs and CIOs found themselves facing an incredible opportunity wrapped in an existential threat, and without the right tools to prepare for it.

DataBahn is setting a new benchmark for how modern enterprises and their CISO/CIOs can manage and operationalize their telemetry across security, observability, and IOT/OT systems and AI ecosystems. Built on a revolutionary AI-driven architecture, DataBahn parses, enriches, and suppresses noise at scale, all while minimizing egress costs. This is the approach our current customers are excited about, because it addresses key pain points they have been unable to solve with other solutions.

Our two new Agentic AI products are solving problems for enterprise data engineering and analytics teams. Cruz automates complex data engineering tasks from log discovery, pipeline creation, tracking and maintaining telemetry health, to providing insights on data quality. Reef surfaces context-aware and enriched insights from streaming telemetry data, turning hours of complex querying across silos into seconds of natural-language queries.

The Right People

We’re incredibly grateful to our early customers; their trust, feedback, and high expectations have shaped who we are. Their belief drives us every day to deliver meaningful outcomes. We’re not just solving problems with them, we’re building long-term partnerships to help enterprise security and IT teams take control of their data, and design systems that are flexible, resilient, and built to last. There’s more to do, and we’re excited to keep building together.

We’re also deeply thankful for the guidance and belief of our advisors, and now our investors. Their support has not only helped us get here but also sharpened our understanding of the opportunity ahead. Ernie, Aaron, and Saqib’s support has made this moment more meaningful than the funding; it’s the shared conviction that the way enterprises manage and use data must fundamentally change. Their backing gives us the momentum tomove faster, and the guidance to keep building towards that mission.

Above all, we want to thank our team. Your passion, resilience, and belief in what we’re building together are what got us here. Every challenge you’ve tackled, every idea you’ve contributed, every late night and early morning has laid the foundation for what we have done so far and for what comes next. We’re excited about this next chapter and are grateful to have been on this journey with all of you.

The Next Chapter

The complexity of enterprise data management is growing exponentially. But we believe that with the right foundation, enterprises can turn that complexity into clarity, efficiency, and competitive advantage.

If you’re facing challenges with your security or observability data, and you’re ready to make your data work smarter for AI, we’d love to show you what DataBahn can do. Request a demo and see how we can help.

Onwards and upwards!

Nanda and Nithya
Cofounders, DataBahn