Big Data Analysis vs. Big Picture Analysis

I’m sure you have heard the buzz around Big Data. Companies and governments are gathering enormous amounts of data about almost everything. Big Data analysis tries to make sense of this mountain of data so that people can make intelligent decisions. It is this analysis I have a problem with. To be more specific, the problem I see is people doing Big Data analysis without seeing the big picture first. I would call it a “Big Picture Analysis” when you have all the data at hand but also the reasons “why” you have so much data in the first place.

Let me explain.

Say you have a system, or maybe even many computer systems, that generate data you want to analyze at some point in the future. You may or may not know how you want to analyze all this stuff, but you do know that it might come in handy one day. So, you store a ton of information. By that, I mean the system stores a ton of information in databases, log files, etc. Most of the time, you don’t let the system delete anything. You just let it gather more and more, because storage space is cheap in AWS.

Let’s assume you decide to look inside this mountain of data because you, or rather the business, can take advantage of it to learn more about your customers and hopefully sell more products and services.

When you look inside your mountain of data using the latest Big Data analysis tools, you discover certain facts and statistics. You gather, sum, divide, formulate, and massage the data. At some point, you need to put these analysis results into some form of presentation that can be used to make decisions: reports, dashboards, etc.

Now, here comes the crux. With this whole mountain of data, how can you be certain why you have it in the first place? I mean, why did your system(s) store all this data? Obviously, they stored it because they were designed to store it in databases and so on. But I’m trying to get you to see this from a business point of view. If you have modeled your system based on the domain, then you should be somewhat familiar with the data that was stored in the database. When a domain expert looks at some of the data, that person might see certain indicators of what the data is about. Or that person might have no clue, even though he or she is a business expert, a domain expert.

My point is that when you look at Big Data, you should also look at the reasons why this Big Data exists. Only then can you make the full connection and see the “Big Picture”. When you see the cause for the Big Data, you can draw better conclusions after you have completed the analysis. You will be able to follow the “thread” from start to end. When you create a report after your Big Data analysis, you should also show the causes side by side on that report. Only then can you see the Big Picture.

So, how do you do that? If you are a big fan of domain-driven design (DDD), then you are almost there. When you model a domain, you usually also model domain events. A domain event reflects something significant that has happened inside your domain. The past tense is important here: things have already happened. Domain events capture these occurrences and let you store the fact that they happened. This is when things get very exciting. Imagine what you can do here. Your domain model not only operates on the business domain but also lets you record anything interesting that was triggered for business reasons. When you look at your recorded domain events at certain dates and times, you can connect your mountain of data with the reasons this mountain of data was born. The domain events are the reasons why you have so much data. Your reports can reflect and show this connection between domain events and stored data.
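To make this concrete, here is a minimal sketch of what recording domain events might look like in C#. All of the names here (IDomainEvent, OrderPlaced, DomainEvents) are illustrative; they are not taken from any particular framework:

```csharp
using System;
using System.Collections.Generic;

// A minimal domain event contract: every event records when it happened.
public interface IDomainEvent
{
    DateTime OccurredOnUtc { get; }
}

// Past tense on purpose: the order *was* placed. This is a recorded fact.
public class OrderPlaced : IDomainEvent
{
    public DateTime OccurredOnUtc { get; private set; }
    public Guid OrderId { get; private set; }
    public decimal Total { get; private set; }

    public OrderPlaced(Guid orderId, decimal total)
    {
        OccurredOnUtc = DateTime.UtcNow;
        OrderId = orderId;
        Total = total;
    }
}

// A trivial in-memory recorder. A real system would persist these events
// next to the operational data so reports can show both side by side.
public static class DomainEvents
{
    private static readonly List<IDomainEvent> Recorded = new List<IDomainEvent>();

    public static void Raise(IDomainEvent domainEvent)
    {
        Recorded.Add(domainEvent);
    }

    public static IEnumerable<IDomainEvent> All
    {
        get { return Recorded; }
    }
}
```

When the domain model places an order, it calls DomainEvents.Raise(new OrderPlaced(...)), and that timestamped event becomes the “why” behind the rows your Big Data tools will later crunch.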

In the end, your analysis receives a significant confirmation and validation, giving you more accurate information. This leads to an even better understanding of the data and, ultimately, smarter decisions for those who need this information.

Updated C# Reference Implementation

I have updated my C# reference implementation and included FluentValidation on some of the DTO objects. I also updated the ErrorMap to include validations on the server side as well as on the WPF client side. This version also includes a sample SQL Server Persistence Provider. As always, you can get the latest code on my GitHub repo.
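For those who haven’t used it, a FluentValidation validator for a DTO looks roughly like this. The CustomerDto shape below is hypothetical; only the FluentValidation calls are the library’s actual API:

```csharp
using FluentValidation;

// Hypothetical DTO; the real DTOs live in the reference implementation.
public class CustomerDto
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// One validator per DTO, usable on the server and in the WPF client alike.
public class CustomerDtoValidator : AbstractValidator<CustomerDto>
{
    public CustomerDtoValidator()
    {
        RuleFor(x => x.Name).NotEmpty().Length(1, 100);
        RuleFor(x => x.Email).NotEmpty().EmailAddress();
    }
}
```

Because the validator is just a class, the service layer can run it before touching the domain model, and the WPF client can run the very same rules before ever sending the request.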

Object Persistence Reference Implementation

I’ve been updating my reference implementation over the last few days. I’m actually using this reference implementation in my own projects. You can download the latest version from my GitHub repo.

This is a complete .NET C# reference implementation to help you jump-start a service-oriented system running in a cloud environment such as Amazon’s EC2 or on-premises clusters.

This reference implementation shows you how to build both the client and the server side. The client side is a sample WPF application that communicates with the service side via HTTP REST requests using JSON payloads. Of course, you can use any type of client as long as it can communicate over HTTP with REST-based JSON.

The service side uses a Web API 2 service layer that communicates with a central domain model. The service side demonstrates how to handle exceptions and edge cases and how to communicate failure to the client.
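As a rough sketch of that idea (not the actual controllers in the repo), a Web API 2 controller can translate domain outcomes into HTTP results like this; the ICustomerRepository abstraction is a hypothetical stand-in for the domain model:

```csharp
using System;
using System.Web.Http;

// Hypothetical domain-facing types used only for this sketch.
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public interface ICustomerRepository
{
    Customer FindById(Guid id);
}

[RoutePrefix("api/customers")]
public class CustomersController : ApiController
{
    private readonly ICustomerRepository _repository;

    public CustomersController(ICustomerRepository repository)
    {
        _repository = repository;
    }

    [HttpGet, Route("{id:guid}")]
    public IHttpActionResult Get(Guid id)
    {
        try
        {
            var customer = _repository.FindById(id);
            if (customer == null)
                return NotFound();          // 404: an edge case, not an exception
            return Ok(customer);            // 200 with a JSON payload
        }
        catch (Exception ex)
        {
            return InternalServerError(ex); // 500: failure communicated to the client
        }
    }
}
```

The point is that every edge case and failure maps to an explicit HTTP status, so the client never has to guess what happened.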

The persistence layer demonstrates the extremely powerful provider pattern to store the domain objects into the following databases:

  1. db4o (an object database)
  2. Redis (a NoSQL database)
  3. SimpleDB (a NoSQL database)
  4. SQL Server (coming soon)

Please note that the entire system has no knowledge of how the objects are stored. All implementation details live in the individual providers listed above. This means you can switch the persistence provider without recompiling, and therefore switch a running system from one persistence store to another.
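In sketch form, the provider pattern boils down to an interface plus a factory that reads the concrete type from configuration. The names below are illustrative, not the ones in the repo:

```csharp
using System;
using System.Configuration;

// The rest of the system only ever sees this interface.
public interface IPersistenceProvider
{
    void Store(object entity);
    T FindById<T>(Guid id);
}

public static class PersistenceProviderFactory
{
    // The concrete provider is named in App.config, e.g.
    //   <add key="PersistenceProvider"
    //        value="MyApp.Persistence.Db4o.Db4oProvider, MyApp.Persistence.Db4o" />
    // so switching stores is a configuration change, not a recompile.
    public static IPersistenceProvider Create()
    {
        var typeName = ConfigurationManager.AppSettings["PersistenceProvider"];
        var providerType = Type.GetType(typeName, throwOnError: true);
        return (IPersistenceProvider)Activator.CreateInstance(providerType);
    }
}
```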

I will try to create a sample SQL Server provider soon.

appsworld North America 2015 at Moscone Center West, San Francisco


I’m a confirmed speaker at appsworld North America 2015 at Moscone Center West, May 12-13, in San Francisco, CA. Discover the future of multi-platform apps. See all confirmed speakers. This sure will be an exciting event. I’m still working on my presentation, which will include Redis, Amazon AWS, C#, and more. I will see you there.

Presenting “Object Persistence in C#” in Sacramento, CA, March 25th, 2015

On Wednesday, March 25th, 2015, I will be presenting “Object Persistence in C#” at the Sacramento .NET User Group (SAC.NET) at the Microsoft office at 1415 L Street, Suite 200, Sacramento, CA 95814, starting at 6:00 pm. Maria Martinez, Co-Organizer of the Sacramento .NET User Group, was kind enough to help get this organized. Thank you, Maria. I will see you there.

Object Persistence, Part 3 – Source Code

In part 2 of my Object Persistence series, I touched on the issues that still exist today.

In part 3, I’ve published a complete sample Visual Studio 2012 solution on GitHub that demonstrates object persistence using a db4o persistence provider. Over time, I will add sample persistence providers for Redis, SQL Server, and possibly SimpleDB (one of Amazon’s great NoSQL databases).
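To give you a feel for it, a db4o-backed provider can be remarkably small, because db4o stores plain CLR objects directly with no mapping layer. This is a simplified sketch with illustrative names, not the provider in the repo:

```csharp
using System;
using System.Linq;
using Db4objects.Db4o;

// A minimal db4o-backed provider sketch.
public class Db4oPersistenceProvider : IDisposable
{
    private readonly IObjectContainer _container;

    public Db4oPersistenceProvider(string databaseFile)
    {
        _container = Db4oEmbedded.OpenFile(databaseFile);
    }

    public void Store(object entity)
    {
        _container.Store(entity);   // insert or update; db4o tracks object identity
        _container.Commit();
    }

    public T FindOne<T>(Predicate<T> match)
    {
        // Native query: db4o evaluates the predicate against stored objects.
        return _container.Query<T>(match).FirstOrDefault();
    }

    public void Dispose()
    {
        _container.Close();
    }
}
```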

This sample solution includes complete server-side and client-side layers. The server side runs as a REST-based Web API 2 service. The server portion also includes a simple domain model and, of course, the persistence provider and how it is implemented. I will update the solution over time, expanding the domain model, UI, etc. as required.

The client side is a WPF application that consumes the REST service. The payload to and from the REST service consists of JSON objects.
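Conceptually, the client-side plumbing is just HttpClient plus Json.NET. The host URL and Customer type below are placeholders; this is a sketch, not the repo’s actual client code:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

// Placeholder DTO matching whatever the service returns.
public class Customer
{
    public Guid Id { get; set; }
    public string Name { get; set; }
}

public class CustomerServiceClient
{
    private static readonly HttpClient Http = new HttpClient
    {
        BaseAddress = new Uri("http://localhost:9000/") // hypothetical host
    };

    public async Task<Customer> GetCustomerAsync(Guid id)
    {
        var response = await Http.GetAsync("api/customers/" + id);
        response.EnsureSuccessStatusCode(); // surfaces the server's failure codes
        var json = await response.Content.ReadAsStringAsync();
        return JsonConvert.DeserializeObject<Customer>(json);
    }

    public async Task CreateCustomerAsync(Customer customer)
    {
        var payload = new StringContent(
            JsonConvert.SerializeObject(customer), Encoding.UTF8, "application/json");
        var response = await Http.PostAsync("api/customers", payload);
        response.EnsureSuccessStatusCode();
    }
}
```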

[Screenshot: the Object Persistence Part 3 WPF client]

I hope you like it. You can use this sample solution as a template to start simple or very complex software solutions. This solution can easily be split across different nodes in a cluster of Amazon AWS EC2 instances, for example. However, for a cloud-based solution, your persistence layer would have to support certain features. I will go into details when I add the Redis persistence provider.

Instead of writing a very long blog post, I will post a screencast video and go through the solution. I think this will make more sense, and you’ll have a chance to go through the source code with me. So, go ahead and get the latest version from GitHub and start playing with it.

The HTML5 Hype vs. Native Application Performance (Again)

The latest buzz is all about HTML5 and how it will improve our lives as architects, developers, companies, consumers of HTML5 applications, and whoever else is getting sucked into this.

I’ve been doing professional software development for over 21 years now. Over those years, I’ve used many different technologies. HTML5 is just another “technology”. First of all, let me clarify in simple terms: HTML5 is NOT a new technology. HTML5 is based on existing technologies: JavaScript, CSS 3, and HTML. None of these are new. What is new is the way it is being presented and marketed.

You see, I take the stand of delivering the best user experience to the consumer. What I mean by best user experience is this:
1.    Your users should “love” to use your software
2.    Your users should feel connected to your software
3.    Your software should accomplish what it said it would do for them, all the time
4.    Your software should do all things extremely fast
5.    Your software should look and feel top notch and professional, nothing less

As you can see from the list above, there is an emotional connection between great software and its users. When human feelings get involved with something as technical as software, what “great software” means becomes subjective, depending on whom you ask. In addition, since most software is created for consumers, you will have to deal with the emotional connection between your software and its consumers. You cannot afford to ignore your consumers’ feelings and perceptions of your software.

When you look at industrial design, physical objects such as a piece of furniture, a toaster oven, or a recording device have aesthetics that trigger an emotional connection between the consumer and the physical object. Some consumers like a particular piece of furniture and others don’t. But what is common across these different consumers is that they either have an emotional connection or they don’t. They like or dislike a certain type of couch, for example. The design effort that goes into creating a successful piece of furniture may be based on many factors. Great designs are aesthetically pleasing and functional at the same time.

For example, look at the success of the iPhone and iPad. These are devices that don’t necessarily contain brand-new technology; but they are packaged and presented in a way that is aesthetically pleasing to a lot of consumers. Moreover, they perform really fast and do it well all the time (disclaimer: nothing is perfect). They both feel snappy. They are easy to use, and the content layouts (the user interfaces) are well thought out. They are both a total hit for Apple.

What would the iPhone or iPad be like if it had an Android or Windows user interface? Would it still be a hit? Would consumers still love them? Would they buy them? Even worse, what if both the iPhone and the iPad only had a web browser interface? How would consumers have reacted with their purchasing power?

What the iPhone and iPad have in common is “fast” user feedback. This fast user feedback is, for the most part, accomplished with native applications that take advantage of the device’s hardware. By native applications, I mean applications that are compiled into machine code for ultimate performance. In the case of iPhone and iPad applications, this means applications developed in Objective-C on the Cocoa Touch framework for iOS.

I’m certain that Steve Jobs had his hands in the user interface design decisions. I’m also certain that he would NEVER have accepted slow-performing applications. If that is the case, I agree with Steve Jobs 100%. I agree because I believe software performance is even more important today than it was 20 years ago.

Software is created for people, most of the time. Consumers are spoiled with instant gratification. Consumers expect software to be fast. Consumers expect to get results, fast! People have less time and are doing more at once. Slow-performing software is bad-quality software, period. As a software creator, why would I want to settle for bad-quality software? If you want to create the best software and make a living at the same time, you must create software that allows consumers to feel that emotional connection.

With that being said, when I hear things like HTML5, I get déjà vu. I have seen this with Java, Visual Basic, .NET, ActiveX, Silverlight, etc. These “new” technologies were created for many reasons, and not one of those reasons was the consumer. Technologies such as HTML5 were created with the intention of making things easier for developers and the companies those developers work for.

The more abstract these technologies are, the more layers there are between the device and the user interface. The more layers there are, the slower the software performs. This principle has not changed in decades, no matter how fast the hardware becomes.

In the end, the consumer can clearly see the difference between software compiled natively for its operating system and software burdened with the layers of translation that made things so simple for the developer.

This convenience for the developers costs the company selling the software dearly in the long run. If a company truly cares about its target audience, it had better make sure the emotional connection between the software and the consumer exists. The best way to do this is not to take the easy, lazy route, but to go the extra mile and develop the software in a programming language that has a native compiler; I can think of C++ or Delphi, for example.

Update 2012-04-09: Betting $1 Billion On Instagram, Facebook Backs Away From HTML5
http://www.sfgate.com/cgi-bin/article.cgi?f=/g/a/2012/04/09/businessinsiderbetting-1-billion-on.DTL