I’m sure you have heard of Big Data and the buzzwords that surround it. Companies and governments are gathering lots and lots of data about lots of things. Big Data analysis tries to make sense of this mountain of data so that people can make intelligent decisions. It is this Big Data analysis itself I have a problem with. To be more specific, the problem I see is people trying to do Big Data analysis without seeing the big picture first. I guess I would call it “Big Picture Analysis” when you have not only all the data at hand but also the reasons “why” you have so much data in the first place.
Let me explain.
Say you have a computer system, or maybe even many of them, generating data that you want to analyze at some point in the future. You may or may not know how you want to analyze all this stuff, but you do know that it might come in handy one day. So, you store a ton of information. By that, I mean the system stores a ton of information in databases, log files, etc. Most of the time, you don’t let the system delete anything. You just let the system gather more and more information because storage space is cheap in AWS.
Let’s assume you decided to look inside this mountain of data because you, or the business rather, can take advantage of this data to learn more about your customers and hopefully sell more products and/or services this way.
When you look inside your mountain of data using the latest Big Data analysis tools, you discover certain facts and statistics. You gather, sum, divide, formulate, massage the data, and so on. At some point, you will need to put these analysis results into some form of presentation that can be used to make decisions. This can be reports, dashboards, etc.
Now, here comes the crux. With this whole mountain of data, how can you be certain why you have all this data in the first place? I mean, why did your system(s) store all this data? Obviously, the data was stored because the system was designed to store it in databases etc. But I’m trying to get you to see this from a business point of view. If you have modeled your system based on the domain, then you should be somewhat familiar with the data that was stored in the database. When you have a domain expert look at some of the data, that person might see certain indicators of what this data is about. Or that person might have no clue, despite being a business expert, a domain expert.
My point is that when you look at Big Data you should also look at the reasons why this Big Data exists. Only then can you make the full connection and see the “Big Picture”. When you see the cause for the Big Data to exist, you can make better assumptions and draw better conclusions after you have completed the analysis. You will be able to follow the “thread” from start to end. When you create a report after you have completed your Big Data analysis, you should also see the causes side by side with the results on that report. Only then can you see the Big Picture.
So, how do you do that? If you are a big fan of domain-driven design (DDD), then you are almost there. When you model a domain, you also model domain events (most of the time). A domain event reflects something significant that has happened inside your domain. The past tense is important here. Things have happened already. Domain events capture these occurrences and let you store the fact that they have happened. This is when things get very exciting. Imagine what you can do here. Your domain model not only operates on the business domain but also allows you to record anything interesting that was triggered for business reasons. When you take a look at your recorded domain events at certain dates and times, you can connect your mountain of data with the reasons why this mountain of data was born. The domain events are the reasons why you have so much data. Your reports can reflect and show this connection between domain events and stored data.
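To make this a bit more concrete, here is a minimal sketch in Python of how recorded domain events could be lined up with stored data. Everything in it is an assumption made up for illustration: the `DomainEvent` and `StoredRecord` types, the event names, the sample dates, and the idea of attributing a record to the most recent event within a 24-hour window. A real system would more likely link data to events through an explicit identifier rather than through timestamps alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DomainEvent:
    # Hypothetical event type; the past-tense name says it already happened.
    name: str
    occurred_at: datetime


@dataclass(frozen=True)
class StoredRecord:
    # Hypothetical row from the "mountain of data", e.g. an order or log entry.
    record_id: int
    stored_at: datetime


def attribute_records(events, records, window=timedelta(hours=24)):
    """Map each event name to the ids of records stored within `window` after it.

    Time proximity is only a stand-in for a real causal link (assumption).
    """
    report = {event.name: [] for event in events}
    for record in records:
        # Events that happened before the record and are still within range.
        candidates = [
            e for e in events
            if e.occurred_at <= record.stored_at < e.occurred_at + window
        ]
        if candidates:
            # Attribute the record to the most recent matching event.
            cause = max(candidates, key=lambda e: e.occurred_at)
            report[cause.name].append(record.record_id)
    return report


# Made-up sample data for illustration only.
events = [
    DomainEvent("PromotionLaunched", datetime(2024, 3, 1, 9, 0)),
    DomainEvent("PriceReduced", datetime(2024, 3, 5, 9, 0)),
]
records = [
    StoredRecord(1, datetime(2024, 3, 1, 10, 30)),
    StoredRecord(2, datetime(2024, 3, 1, 15, 0)),
    StoredRecord(3, datetime(2024, 3, 5, 11, 0)),
    StoredRecord(4, datetime(2024, 3, 9, 8, 0)),  # no event in range
]

report = attribute_records(events, records)
for event_name, record_ids in report.items():
    print(event_name, record_ids)
```

A report built this way can show each domain event next to the data it most plausibly produced, and it also exposes data that no recorded event explains, which is a useful finding in itself.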
In the end, your analysis gains significant confirmation and validation because it rests on more accurate information. This leads to an even better understanding of the data and, ultimately, to smarter decisions by those who need this information.