Confluent Cements Databricks Partnership For Real-Time, AI-Ready Data Ecosystem

An employee drives a cement mixer through the main tunnel at the Thames Tideway Tunnel super sewer.
Data is mixed. Inside modern enterprise IT stacks, it is quite standard to find data streams, data flows, data repositories and data connection channels that exist across various formats, platforms and storage latencies. Add that discombobulated variety to the fact that some data is tightly structured (financial statements), some is semi-structured (text files based on forms-based documentation) and some is essentially unstructured (video or audio files with little or no “meta-tagging” to detail their contents), and you can see how we end up with quite a mix in the modern enterprise information stack.
One of the major hurdles that organizations experience when they try to work with data at this level is the need to bridge the divide between operational and analytical systems that sit in separate silos. This is made even more difficult because these data resources are typically managed by different teams and built from different technologies and workflows. Trying to build AI services on top of a moving, changeable surface of this kind is tougher still.
Real-time AI-ready Data
In a partnership effort designed to tackle a number of the key challenges presented here, data streaming platform company Confluent is expanding its partnership with data and AI technology company Databricks.
The companies are now coalescing Confluent’s Data Streaming Platform and Databricks’ Data Intelligence Platform to create what they are calling a real-time, AI-ready data ecosystem. Specifically, this is an integration between Confluent’s Tableflow (described as push-button technology designed to take Apache Kafka streaming data and feed it directly into a data lake, data warehouse or analytics engine as an Apache Iceberg table) and Databricks’ Unity Catalog (a governance tool for data and AI assets) for modern AI use cases, which these days naturally feature a good deal of agentic and generative AI.
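To make the Tableflow idea concrete, the sketch below shows how an analytics engine might query a Kafka topic once it has been materialized as an Apache Iceberg table. This is a minimal, hypothetical illustration using PySpark and Iceberg’s standard REST catalog configuration; the catalog name, endpoint URI, package version and table name are assumptions, not Confluent’s actual API.

```python
# Minimal sketch (illustrative, not Confluent's documented setup): querying a
# Kafka topic that has been materialized as an Apache Iceberg table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("read-tableflow-iceberg")
    # Iceberg Spark runtime; the version coordinate here is illustrative.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    # Register an Iceberg REST catalog; the name and URI are placeholders.
    .config("spark.sql.catalog.tableflow", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.tableflow.type", "rest")
    .config("spark.sql.catalog.tableflow.uri", "https://example-endpoint/iceberg")
    .getOrCreate()
)

# Query the materialized topic like any other lakehouse table.
orders = spark.table("tableflow.kafka.orders")
orders.groupBy("status").count().show()
```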
All of this is designed to address the challenge of real-world operational data inside organizations, which is frequently dumped into data lakes (or other storage repositories) without context or governance, making real-time data analysis challenging. The company says that when software application developers and data engineering professionals lack contextualized and trustworthy data, driving meaningful AI innovation becomes impossible.
Bidirectional Format-Agnostic Freedom
This integration work has been engineered to offer real-time interoperability via what is being called a “bidirectional integration” between Confluent’s Tableflow with Delta Lake and Databricks’ Unity Catalog. Delta Lake was pioneered by Databricks and was originally developed for streaming use cases with fast “write” ability (i.e. getting data written to its intended destination quickly).
As the Linux Foundation reminds us, Delta Lake is an open source storage framework that enables a format-agnostic data lakehouse architecture with compute engines including Snowflake, Google BigQuery, Redshift, Databricks and Azure Fabric among others. It has application programming interfaces for languages including Scala, Java, Rust and Python. According to Databricks, Delta Lake has become a widely adopted data lakehouse format because it has proven its worth when deployed at a massive scale, processing over 10 exabytes of data daily.
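For readers unfamiliar with Delta Lake in practice, here is a small, self-contained sketch of its open source Python bindings (the delta-rs “deltalake” package) writing and then reading back a versioned table. The local path, column names and data are made up for illustration; in a lakehouse deployment the same API points at object storage instead of a local directory.

```python
# Hedged example of Delta Lake's Python API: write a table, read it back.
import pandas as pd
from deltalake import DeltaTable, write_deltalake

events = pd.DataFrame(
    {"event_id": [1, 2, 3], "status": ["created", "shipped", "delivered"]}
)

# Append records to a Delta table on local disk (object-store URIs also work).
write_deltalake("/tmp/events_delta", events, mode="append")

# Read the table back; any Delta-aware engine could query the same files.
dt = DeltaTable("/tmp/events_delta")
print(dt.to_pandas())
print(dt.version())  # Delta keeps a versioned transaction log
```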
“For companies to maximize returns on their AI investments, they need their data, AI, analytics and governance all in one place,” said Ali Ghodsi, co-founder and CEO of Databricks. “As we help more organizations build data intelligence, trusted enterprise data sits at the center. We are excited that Confluent has embraced Unity Catalog and Delta Lake as its open governance and storage solutions of choice and we look forward to working together to deliver long-term value for our customers.”
Confluent’s Tableflow with Databricks-founded Delta Lake makes operational data available immediately to Delta Lake’s ecosystem, and joint customers will be able to bring any engine or AI tool, such as Apache Spark, Trino, Polars, DuckDB and Daft, to their data in Unity Catalog.
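As a hedged illustration of that “bring any engine” idea, the sketch below has two of the engines named above (DuckDB and Polars) reading the same Delta table written in the earlier example. The Unity Catalog layer is not shown here; the local path is an assumption carried over from that sketch.

```python
# Illustrative only: two different engines querying one Delta table.
import duckdb
import polars as pl

# DuckDB exposes Delta tables through its delta extension and delta_scan().
con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")
print(
    con.sql(
        "SELECT status, count(*) AS n "
        "FROM delta_scan('/tmp/events_delta') GROUP BY status"
    ).df()
)

# Polars reads the same files directly via its Delta Lake integration.
print(pl.read_delta("/tmp/events_delta").group_by("status").len())
```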
Sophisticated AI Applications
“Real-time data is the fuel for AI,” said Jay Kreps, CEO and co-founder of Confluent. “But too often, enterprises are held back by disconnected systems that fail to deliver the data they need, in the format they need, at the moment they need it. Together with Databricks, we’re ensuring businesses can harness the power of real-time data to build sophisticated AI-driven applications for their most critical use cases.”
Both Kreps and Ghodsi point to further custom integrations between Tableflow and Databricks’ Unity Catalog that will ensure metadata is automatically applied to data exchanged between platforms. The CEOs say that this makes operational data discoverable and actionable for data scientists and analysts working in Databricks while ensuring analytical data is equally accessible and useful for application developers and streaming engineers in Confluent.
Additionally, Confluent’s Stream Governance suite will feed upstream governance and metadata into Unity Catalog, enhancing fine-grained governance, end-to-end stream lineage and automated data quality monitoring.
The Rise Of ‘First-Class’ Technology
What’s happening here is a topic in and of itself, i.e. the move to make one technology, one data type, one application, one technology platform or even one software development methodology a “first-class citizen” inside the realms of another technology. In this case, operational data from Confluent becomes a first-class citizen in Databricks, which means it is inherently integrated for speed of access, ease of use, breadth of scope and robustness of security provisioning (for which you can also read governance and compliance).
Equally, in this case, Databricks data becomes more easily accessible by any compute engine in the enterprise (whether it is dedicated to data analytics, algorithmic logic for AI and so on).
The companies underline their newly amplified union by saying that the real-time streaming data topics that AI applications consume (and the tables that data analysts use) will now offer consistent views of the same real-time data, enabling faster, smarter AI-driven decision-making across the organization. All of which is described as first-class integration alignment, which, in the soon-to-be mission-critical (and perhaps life-critical) world of AI decisioning, kind of warrants first-tier business-class (airline seating pun intended) treatment.