Epsio is a streaming SQL engine that allows you to create incremental views within your database. To do this, Epsio consumes a replication stream from the customer’s database, and sinks back the results into that same DB.
In comparison with other streaming engines such as Flink (or most - perhaps any- corresponding streaming engines), Epsio's first aim is to have absolutely seamless integration with your current database; this means no need to setup any external components (such as Debezium/Kafka) aside from Epsio itself, and no new API outside of your database.
Our other main focus outside of integration was of course, world-class speed. This article showcases that speed in relation to the highly used open-source Flink; another stream-processing engine.
TL;DR
Epsio goes from being 7x all the way to 21x the speed of Flink, while maintaining absolute consistency and correctness (in contrast to Flink and many other streaming engines).
To benchmark ourselves we run internally designed aggregations against the TPC-DS dataset. The TPC-DS benchmark simulates a retail business environment, offering a schema with multiple interrelated tables (e.g., sales, inventory, customers) and complex queries. We use internally designed queries in order to better simulate building “views” above the dataset; e.g. instead of running a query to select the total amount of sales for a specific date, we would have a query to give the total amount of sales per day. This means the queries we run to benchmark ourselves are much heavier on the “aggregatory” factor as opposed to searching specific data.
In this blog’s benchmark, we are going to run ourselves against Flink. In a “real-world” scenario, both Epsio and Flink would be consuming changes directly from Postgres; to ensure we’re benchmarking Epsio against Flink, rather than comparing Epsio’s CDC forwarder with Debezium (despite our significant advantage there :)), both benchmarks are designed to begin only after Debezium or the CDC forwarder have completed fetching all changes from the database. No additional “work” beyond fetching the rows is performed before the benchmark starts.
We’ll be running aggregative queries on TPC-DS Scale Factor 100 (approx. ~1 billion rows overall in the database, not all used for every query), and writing back the results to the original Postgres. Each will run on a GCP N2 instance with 16 vCPUs, 64 GB RAM, running Ubuntu 22.04 (`6.8.0-1018-gcp` kernel), and a 2TB disk.
Flink Setup
To setup Flink to consume from a Postgres DB, we used the recommended Debezium connector, which streamed into different Kafka topics.
Debezium, although an industry standard, has its fair share of issues, which is why Epsio doesn’t use it internally. For starters, almost any Streaming SQL engine based on Debezium is **inconsistent across tables,** since the engine will need to consume the tables separately. For most companies this is a non-starter, as you’re continuously showing incorrect results. Debezium is also pretty un-performant, and even with the Avro format which sped things up, we weren’t able to get Debezium to stream above 70–100k rows/s (depending on the rows). But since we’re not measuring Debezium’s speed, that’s fine- we’ll make sure Debezium finished loading all rows into Kafka before running our queries on Flink.
As for Flink itself, we tried copying most of Flink’s configuration from a Nexmark benchmark. Flink by default isn’t configured to leverage the machine it’s on fully, so we settled on creating one jobmanager and one taskmanager, with the taskmanager set to use 16 cpus with parallelism set to 16 (given we have 16 CPUs). This means every action in Flink will be parallelized to 16 CPUs, including sinking into the result DB. We also disabled incremental checkpoints to speed things up; this setup wouldn’t be crash safe.
The whole setup can be found here:
https://github.com/Epsio-Labs/public-benchmarks/tree/main/flink
Epsio Setup
Because Epsio is self contained, we didn't need to run any other components in the system like Debezium or Kafka. Epsio is run with a one liner:
and setup is then done to configure Postgres details.
All Epsio views (queries with "names" where the results will be stored) are created via the user's own database; no need to interface with another server/program!
To create views, we ran the following in the TPC-DS database (which notifies Epsio to create the view):
which populated the first result and then began streaming incoming CDC (which we preloaded in this scenario).
Aggregative Queries
We ran four different aggregative queries on Epsio & Flink.
Count Query
The first query is essentially a “simple aggregate” query, seeing how long it takes to do a simple count(*) while streaming in the largest table of the TPC-DS dataset: store_sales, with ~288 million rows. This query mainly tests the overhead of moving data through the system, as well as applying a very simple aggregation.
Performance Factor: 7.77x
Sales Aggregations Group By
The next query will give us different aggregations of sales, grouped by day and a host of other factors (manufacturer, brand). The end result has ~4.8m rows. This query has a mix of heavy joins and aggregates, with the GROUP BY outputting a relatively high amount of aggregations.
Performance Factor: 9.05x
Sales Aggregations Group By #2
The third query also centers on aggregating sales per group but is much more “aggregative”. Its output is condensed to 2,600 rows through aggregation, though nearly all rows (~95%) contribute to the final result.
Performance Factor: 21.44x
Sales Trends
The last query analyzes customer purchase behavior across multiple sales channels (store, catalog, and web) over two consecutive years, focusing on identifying customers whose catalog purchases show stronger growth trends compared to other channels. This query is fairly complex and heavy; it tests how well the streaming engine is able to handle many operators as well as moving large amounts of data through the different operators. We tried playing a bit with Flink to reduce the parallelism per operator, but found the default settings we set to still be best here.
Performance Factor: 10.49x
Conclusion
Epsio is significantly faster than Flink; up to 21x in this benchmark. In contrast to Flink and many other alternatives, which demand setting up other components (Debezium, AVRO registry), configurations (memory, parallelism, etc), and dealing with a new interface, Epsio also works right out of the box with the existing database you already have.
Furthermore, Epsio is always correct to a certain point in time while Flink is not, and supports arbitrarily complex SQL.