DaggerDB is a new real-time dimensional analytics engine built for speed and analytical power without the complexity and steep learning curve of mainstream OLAP platforms.
DaggerDB is an ACID-compliant hybrid column store / OLAP aggregate database that enables both high-speed real-time inserts and predictable, high-performance real-time dimensional querying through a simple fluent .NET query API.
```csharp
var database = new Database(1, "Sales")
    .AddDimension("Product", "Product dimension")
    .AddDimension("Dealer", "Dealer dimension")
    .AddDimension("Year", "Year dimension")
    .AddDimension("Month", "Month dimension");

var cube = new Cube(1, "Sales", StorageModes.Persistent)
    .AddDimensions("Product", "Dealer", "Year", "Month")
    .AddFactValue("Quantity", FactValueTypes.Int)
    .AddFactValue("TotalAmount", FactValueTypes.Decimal)
    .AddMeasure("QuantitySum", MeasureTypes.Sum, "Quantity")
    .AddMeasure("TotalAmountSum", MeasureTypes.Sum, "TotalAmount");

database.AddCube(cube);
```

Adding data to the cube in real time (e.g. via a messaging queue or equivalent):
```csharp
var transaction = new CubeTransaction();

transaction.AddItem(
    new DataMap()
        .AddMemberMap("Product", 1)
        .AddMemberMap("Dealer", 17)
        .AddMemberMap("Year", DateTime.Now.Year)
        .AddMemberMap("Month", DateTime.Now.Month)
        .AddValueMap("Quantity", 3)
        .AddValueMap("TotalAmount", 13000));

cube.CommitTransaction(transaction);
```

Querying the cube:
```csharp
var result = cube.Query(query => query
    .Select()
        .Dimension("Product").Members(1, 2, 3)
        .Across()
        .Dimension("Dealer").Members()
    .WithMeasures()
        .Measure("QuantitySum")
        .Measure("TotalAmountSum")
    .Where()
        .Dimension("Year").Member(DateTime.Now.Year)
        .And()
        .Dimension("Month").Member(DateTime.Now.Month)
    .ToQuery());
```
The backbone of the Dagger database engine is a highly parallel columnar store database powered by versioning and bitmap indexing. All data is written to the column store in an immutable append-only manner to minimize locking issues and maximize parallelization on the query side.
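To make the idea concrete, here is a minimal conceptual sketch (in Python, not DaggerDB's actual implementation) of how a bitmap index over an append-only column works: each dimension member maps to the set of row positions holding it, so a filter becomes a cheap set intersection rather than a scan.

```python
# Conceptual sketch only; names and structure are illustrative,
# not DaggerDB's internal API.
class BitmapIndex:
    def __init__(self):
        self.bitmaps = {}  # member value -> set of row ids

    def add(self, row_id, member):
        # Rows are appended, never updated, so bitmaps only grow.
        self.bitmaps.setdefault(member, set()).add(row_id)

    def rows_for(self, member):
        return self.bitmaps.get(member, set())

# Append-only "Year" column; each insert gets the next row id.
index = BitmapIndex()
for row_id, year in enumerate([2013, 2014, 2014, 2013, 2014]):
    index.add(row_id, year)

rows_2014 = sorted(index.rows_for(2014))
print(rows_2014)  # row positions where Year == 2014 -> [1, 2, 4]
```

Because the store is immutable and append-only, readers never contend with in-place updates, which is what allows the query side to parallelize freely.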
The columnar store is very suitable for querying, grouping, slicing and dicing large amounts of analytical data.
The aggregate engine complements the columnar store by calculating aggregates of the underlying data asynchronously in real time. Each aggregate is versioned and may cover the complete dataset for a specific tuple, or only the part of the dataset up to a certain version. If a partial aggregate is hit by a query, it is used as a starting point before the newer records not yet covered by the aggregate are retrieved and calculated from the column store. This keeps the number of records retrieved from the columnar store low even as the dataset keeps growing, giving fast and predictable query performance.
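The partial-aggregate mechanism can be sketched as follows (a simplified illustration, assuming a sum measure and a version number per inserted row; not DaggerDB's actual code):

```python
# Conceptual sketch: a versioned partial aggregate covers rows up to
# `covered_version`; a query combines it with the delta of newer rows
# fetched from the column store.
def query_sum(aggregate_value, covered_version, column_store):
    # column_store: list of (version, value) tuples in insert order
    delta = sum(v for ver, v in column_store if ver > covered_version)
    return aggregate_value + delta

rows = [(1, 10), (2, 5), (3, 7), (4, 3)]         # (version, Quantity)
partial = sum(v for ver, v in rows if ver <= 2)  # aggregate up to v2 = 15
result = query_sum(partial, 2, rows)
print(result)  # 15 + (7 + 3) = 25
```

The key point is that only the rows newer than the aggregate's version are ever touched in the column store, so query cost depends on the size of the uncovered tail, not on the total dataset size.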
An important part of the aggregate engine is its insert and query statistics and the optimizations based on them. The insert statistics monitor in real time which tuples receive most of the incoming data and, based on that, which aggregates should be created to minimize query data retrievals.
The query statistics monitor which parts of queries caused the most records to be retrieved from the columnar store, and use that information as a basis for usage-driven optimizations; they can even reuse the data already retrieved and calculated for a query to create new covering aggregates.
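As a rough illustration of the insert-statistics idea (hypothetical names, not DaggerDB's implementation), aggregate creation can be prioritized by simply counting incoming rows per tuple and proposing aggregates for the hottest tuples first:

```python
# Hypothetical sketch: rank tuples by incoming-row volume so the
# aggregate engine builds aggregates where they save the most work.
from collections import Counter

def hottest_tuples(incoming, top_n=2):
    counts = Counter(incoming)
    return [t for t, _ in counts.most_common(top_n)]

incoming = ([("Product=1", "Year=2014")] * 50 +
            [("Product=2", "Year=2014")] * 30 +
            [("Product=3", "Year=2013")] * 5)
hot = hottest_tuples(incoming)
print(hot)  # the two tuples receiving the most inserts
```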
In data sets with many dimensions, the hybrid OLAP / column store architecture effectively handles data explosion by prioritizing the creation of aggregates where they will do the most good. The data does not have to be completely covered by aggregates; the column store serves the queries that are not covered.
DaggerDB is an ACID-compliant database. When a transaction has been committed you can be sure that it has been persisted to disk and that it will still be there after a crash or a power failure. DaggerDB uses unbuffered write-through to write the complete transaction data to disk, followed by a checkpoint write to a transaction log. On startup, DaggerDB searches for the last valid committed transaction, starting at the end of the transaction log. Transactions use ReadCommitted isolation, which means that transaction data becomes visible to queries immediately after commit, without any delay, while a transaction that is still being committed when a query arrives is not visible to that query.
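The recovery step described above can be sketched like this (a simplified model assuming the log is a sequence of data and checkpoint records; record names are illustrative, not DaggerDB's on-disk format):

```python
# Conceptual sketch of crash recovery: transaction data is flushed
# first, then a checkpoint record. On startup we scan from the end of
# the log for the last checkpoint whose transaction data is present.
def last_committed(log):
    # log: list of ("data", txid) or ("checkpoint", txid) records
    data_versions = {txid for kind, txid in log if kind == "data"}
    for kind, txid in reversed(log):
        if kind == "checkpoint" and txid in data_versions:
            return txid
    return None

# Transaction 3's checkpoint never reached disk before the crash,
# so transaction 3 is discarded and 2 is the last valid commit.
log = [("data", 1), ("checkpoint", 1),
       ("data", 2), ("checkpoint", 2),
       ("data", 3)]  # crash here, before checkpoint 3
recovered = last_committed(log)
print(recovered)  # 2
```

Writing the checkpoint only after the data is durably on disk is what makes the commit atomic: a torn or missing checkpoint simply means the transaction never happened.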
DaggerDB is a database specialized in analyzing data; in most systems its role will therefore be as a secondary database alongside the OLTP databases, receiving data from them in batches or in real time as it is inserted into the OLTP databases. As with most other OLAP solutions, DaggerDB should never be the owner of the data: it should always be possible to rebuild Dagger cubes from the original data source if you later need to change the data model of a cube.
When creating the data model of a cube, make it as specialized as possible for the analytics you want to perform, containing only the dimensions, members and values you need to analyze the data. Group the data as much as possible and avoid dimensions whose members have very high granularity. For example, you should probably not have an "Order line" dimension with one member per order line, but rather one dimension for product, one for sales channel and perhaps one for customer.
The optimal hardware setup is at least 8 GB of RAM, 8 processor cores and SSD disks.
Let's run a couple of tests to give you a hint of the performance characteristics of DaggerDB. The query response times shown below are the ones that DaggerDB returns in the query result set. They are server-side measurements, covering the time from building the query, through retrieving aggregates and data and any data aggregation, to building the result set. Serialization/deserialization, network traffic and client execution time are not included. All tests are run on an Intel i8 with an SSD disk, with all settings for caching, page sizes etc. left at their defaults.
We start by running the DaggerDB server with a persistent cube containing 3 dimensions, 1 value and 1 measure. The test starts with an empty cube and runs until it contains 25 million rows, while constantly querying the cube with the query:
```csharp
Select()
    .Dimension(1).Members(1, 2, 3, 4, 5)
    .Across()
    .Dimension(2).Members(2012, 2013, 2014)
    .Where()
    .Dimension(3).Member(3)
```
In this test, the server and the applications inserting into and querying the cube all run on the same computer. The inserter runs at approximately 45,000 inserts per second, and the querying program is set to execute about 100 queries per second.
As the chart shows, the insert speed stays constant as the number of records in the cube increases. The query response time is also roughly constant as the dataset grows, averaging around 1 ms per query in this test.
Constantly running queries, as in the test above, helps keep the aggregates very up to date, thereby keeping the amount of data that must be retrieved from the columnar store low. In the next test we instead run the 25,000,000 inserts separately on their own, and only when they complete do we run a series of queries of varying complexity:
| Query | 1st response time | 2nd response time |
| --- | --- | --- |
| `Select().Dimension("1").Members(9)` | < 1 ms | < 1 ms |
| `Select().Dimension("1").Members(1, 2, 3, 4, 5, 6, 7, 8, 9)` | < 1 ms | < 1 ms |
| `Select().Dimension("2").Members(2012, 2013, 2014, 2015).Across().Dimension("3").Members(1, 2, 3, 4, 5, 6, 7, 8, 9)` | 14 ms | < 1 ms |
| `Select().Dimension("2").Members(2012, 2013, 2014, 2015).Across().Dimension("3").Members(1, 2, 3, 4, 5, 6, 7, 8, 9).Where().Dimension("1").Member(2)` | 3 ms | < 1 ms |
| `Select().Dimension("3").Members(1, 2, 3, 4, 5, 6, 7, 8, 9)` | 1 ms | < 1 ms |
| `Select().Dimension("1").Members(1, 2, 3, 4, 5, 6, 7).Where().Dimension("2").Member(2012).And().Dimension("3").Member(7)` | < 1 ms | < 1 ms |
Use the DaggerDB Server Web-Admin to easily create databases and cubes without having to write code. You can also view statistics for each cube on the server, updated in real-time.
The Web-Admin is a thin ASP.NET MVC application that you host in your IIS web server and access from your web browser.
Using the Web-Admin you can also write queries directly against the cubes and view the results in tables, together with statistics for the executed queries.