Elastic Map Reduce on AWS

Last week, I put out a post about Redshift on AWS as an effective tool for quickly and dynamically dipping your toe into a large data warehouse environment.

Another AWS tool I experimented with was Amazon’s Elastic MapReduce (EMR). EMR is a managed service that runs open source Hadoop and supports MapReduce as well as a number of other highly parallel computing approaches. It also supports a large number of tools that help with implementation and keep the environment current, such as Apache Pig, Apache Hive, HBase, Spark, and Presto. It can work with data from a range of AWS data stores, such as Amazon S3 and DynamoDB.

EMR supports a strong security model, enabling encryption at rest as well as in transit, and it is available in GovCloud. It handles a range of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

For many organizations, a Hadoop cluster has been a bridge too far for a range of reasons, including support costs, infrastructure costs, and skills. EMR seems to have effectively addressed those concerns, allowing you to set up or tear down a cluster in minutes without having to worry much about the details of node provisioning, cluster setup, Hadoop configuration, or cluster tuning.

For my proof-of-concept efforts, Amazon EMR pricing appeared simple and predictable: you pay a per-second rate for the cluster while it runs, with a one-minute minimum charge (it used to be an hour!). You can launch a 10-node Hadoop cluster for less than a dollar an hour (naturally, data transfer charges are handled separately). There are also ways to keep your EMR costs down.
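To make the per-second billing arithmetic concrete, here is a minimal Python sketch. The hourly per-node rate is a hypothetical figure chosen to match the “under a dollar an hour for 10 nodes” observation, not a quoted AWS price; real EMR charges combine the EMR fee with the underlying EC2 instance price and vary by region and instance type.

```python
import math


def emr_cluster_cost(runtime_seconds: float, nodes: int,
                     hourly_rate_per_node: float,
                     minimum_seconds: int = 60) -> float:
    """Estimate cluster cost under per-second billing with a minimum charge.

    hourly_rate_per_node is a hypothetical combined rate (EMR fee plus EC2);
    look up current pricing for your region and instance type.
    """
    # Partial seconds round up, and anything shorter than the minimum
    # is billed as the minimum.
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return nodes * hourly_rate_per_node * billed_seconds / 3600


# Hypothetical rate of $0.09 per node-hour (not a quoted AWS price).
print(round(emr_cluster_cost(3600, nodes=10, hourly_rate_per_node=0.09), 2))  # 0.9
# A 20-second run is billed as 60 seconds under the one-minute minimum.
print(round(emr_cluster_cost(20, nodes=10, hourly_rate_per_node=0.09), 4))
```

Passing minimum_seconds=3600 reproduces the old one-hour minimum, which is why short experiments used to cost far more than the time they actually ran.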

The EMR approach appears to be focused on flexibility, giving you complete control over your cluster. You have root access to every instance and can install additional applications and customize the cluster with bootstrap actions. That matters because it takes a few minutes to get a cluster up and running, and scripting the setup takes time and personnel out of repetitive tasks.
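As a sketch of what a scripted, repeatable launch might look like, the function below builds the keyword arguments for the EMR run_job_flow call in the AWS SDK for Python (boto3), including one bootstrap action. The cluster name, S3 script path, instance type, and release label are hypothetical placeholders; check the current EMR release list and your account’s IAM roles before using anything like this.

```python
def build_cluster_request(name: str, bootstrap_script_uri: str,
                          node_count: int = 10,
                          instance_type: str = "m5.xlarge",
                          release_label: str = "emr-5.36.0") -> dict:
    """Keyword arguments for boto3's EMR run_job_flow (placeholder values)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # One master node plus (node_count - 1) core nodes.
            "InstanceCount": node_count,
            # Keep the cluster up for interactive work; set False for transient jobs.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Bootstrap actions run on each node before the applications are
        # installed, so repetitive setup is scripted once instead of by hand.
        "BootstrapActions": [{
            "Name": "install-extra-tools",  # hypothetical action name
            "ScriptBootstrapAction": {"Path": bootstrap_script_uri, "Args": []},
        }],
        # EMR's default role names; your account may use different roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: dict) -> str:
    """Start the cluster and return its ID (needs AWS credentials; incurs charges)."""
    import boto3  # imported lazily so the request builder runs without boto3
    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


def tear_down_cluster(cluster_id: str) -> None:
    """Terminate the cluster so billing stops."""
    import boto3
    boto3.client("emr").terminate_job_flows(JobFlowIds=[cluster_id])


request = build_cluster_request(
    "poc-cluster", "s3://example-bucket/bootstrap/install-tools.sh")
```

Keeping the request as plain data makes it easy to review or version-control the cluster definition separately from the calls that actually spend money.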

There is a wide range of tutorials and training available as well as tools to help estimate billing.

Overall, I’d say that if an organization is interested in experimenting with Hadoop, this is a great way to dive in without getting soaked.


AWS Redshift and analytics?

Recently, I had the opportunity to test out Amazon Redshift. It is a fast, flexible, fully managed, petabyte-scale data warehouse solution that makes it simple to cost-effectively analyze data using your existing business intelligence tools. It’s been around for a while and has matured significantly over the years.

In my case, I brought up numerous multi-node cluster configurations in a few minutes, loaded a fairly large amount of data, did some analytics, and brought the whole environment down, at a cost of less than a dollar for the short time I needed it.

There are some great tutorials available, and Amazon will give you an experimentation account to get your feet wet, so you should be able to prove out the capabilities to yourself without it costing you anything.

Data security is paramount to the service: Redshift is available in public AWS as well as GovCloud and can be configured to support HIPAA or ITAR compliance. Data can be compressed and encrypted before it ever reaches Amazon S3.
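A minimal sketch of the compression half of that, using Python’s standard gzip module to compress pipe-delimited rows before upload, along with the kind of Redshift COPY statement that loads gzipped files. The table name, S3 prefix, and IAM role ARN are hypothetical, and encryption (client-side, or server-side with AWS KMS) is not shown.

```python
import gzip


def compress_rows(rows: list[str]) -> bytes:
    """Gzip newline-terminated, pipe-delimited rows for upload to S3."""
    data = "".join(row + "\n" for row in rows).encode("utf-8")
    return gzip.compress(data)


def copy_statement(table: str, s3_prefix: str, role_arn: str) -> str:
    """Build a Redshift COPY that reads gzipped, pipe-delimited files."""
    return (f"COPY {table} FROM '{s3_prefix}' "
            f"IAM_ROLE '{role_arn}' GZIP DELIMITER '|';")


payload = compress_rows(["1|hardware|100.0", "2|software|50.0"])
# Hypothetical names: replace the table, prefix, and role with your own.
sql = copy_statement("sales", "s3://example-bucket/sales/",
                     "arn:aws:iam::123456789012:role/ExampleRedshiftRole")
print(sql)
```

The compressed payload would then be uploaded to the S3 prefix (for example with boto3’s put_object) before the COPY runs.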

You can use the analytic tools provided by Amazon, or configure security groups so you can reach your data warehouse with the same tools you would use on-site. During my testing, I loaded both a large star schema database and some more traditional normalized structures.
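For readers less familiar with the term, a star schema puts measures in a central fact table surrounded by dimension tables that are joined in at query time. The sketch below uses Python’s built-in sqlite3 as a local stand-in for Redshift, with made-up tables and data; the query is plain SQL of the sort Redshift accepts, but the DDL is SQLite’s and omits Redshift-specific options such as distribution and sort keys.

```python
import sqlite3


def build_demo_star_schema(conn: sqlite3.Connection) -> None:
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            amount REAL
        );
        INSERT INTO dim_date VALUES (1, 2018, 1), (2, 2018, 2);
        INSERT INTO dim_product VALUES (10, 'hardware'), (20, 'software');
        INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 25.0);
    """)


def sales_by_month_and_category(conn: sqlite3.Connection) -> list:
    """Typical star-join: aggregate facts, slicing by dimension attributes."""
    return conn.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d ON f.date_id = d.date_id
        JOIN dim_product p ON f.product_id = p.product_id
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_demo_star_schema(conn)
for row in sales_by_month_and_category(conn):
    print(row)
```

A more normalized design would split these dimensions further (for example, product into product and category tables), trading simpler updates for more joins per query.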

Since this is only a blog post, I can’t go into much detail, and the tutorials and videos are sufficient to bootstrap the learning process. The purpose of this post is to let those who have data warehouse needs, but not the available infrastructure, know that there is an alternative worth investigating.

Six thoughts on mobility trends for 2018

Let’s face it, some aspects of mobility are getting long in the tooth, and the demand for more capabilities is insatiable. Here are a few areas where I think 2018 will see some exciting capabilities develop. Many of these are not new, but their interactions and intersections should provide some interesting results and things to consider during your planning.

1. Further blurring and integration of IoT and mobile

We’re likely to see more situations where mobile devices recognize the IoT devices around them to enhance contextual understanding for the user. We’ve seen some use of NFC and Bluetooth to share information, but approaches that embrace the environment and act upon the available information are still in their infancy. This year should bring some significant use cases and greater maturity.

2. Cloud Integration

By now, most businesses have done much more than just stick their toe in the cloud Everything as a Service (XaaS) pool. As the number of potential devices in the mobility and IoT space expands, the flexibility and speed to action that cloud solutions facilitate need to be understood and put into practice. It is also time to take all the data coming in from these devices and transform that flow into true contextual understanding and action, which also requires a dynamic computing environment.

3. Augmented reality

With augmented reality predicted to expand to a market of somewhere between $120 and $221 billion in revenue by 2021, we’re likely to see quite a bit of innovation in this space. The width of that range shows how little the market is really understood. 2018 should be a year where AR gets real.

4. Security

All discussions of mobility need to include security. Heck, the first month of 2018 should have nailed the importance of security into the minds of anyone in the IT space. There were more patches (and patches of patches) on a greater range of systems than many would have believed possible just a short time ago. Recently, every major mobile app store (Apple, Android…) was found to have nefarious software that had to be removed. Mobile developers need to be ever more vigilant, not just about the code they write but about the libraries they use.

5. Predictive Analytics

Context is king, and the use of analytics to increase understanding of the situation and of possible responses is going to continue to expand. As capabilities advance, only our imagination will limit where and when mobile devices become useful. Unfortunately, the same can be said of security threats that rely on predictive analytics.

6. Changing business models

Peer-to-peer solutions continue to be all the rage, but with the capabilities listed above, whole new approaches to value generation are possible. There will always be early adopters willing to play with these, and with the deeper understanding possible today, new approaches to crossing the chasm will be demonstrated.

It should be an interesting year…


Back in Seattle

Last week, I was able to go back to the Microsoft campus in Redmond for a meeting. It was the first time I’d been back since I spent three months there as part of the EDS Top Gun program in 2005.

Flying into Seattle, we got a good view of the Space Needle and the Science Fiction Museum and Hall of Fame.

There were a number of déjà vu moments walking around the Microsoft campus.


I always find these opportunities to see what companies are most proud of very telling. It was clear that cloud, analytics, and human interface transformations were at the forefront of their thinking, much like the rest of us.


Cloud Architect’s Song

I was looking for an old post and noticed that HP had taken down all the old TNBT posts.

One post I’d put together a number of years back that always made me chuckle was the Cloud Architect’s Song. I couldn’t find the exact version I had posted, but I did my best to recreate it here.

The Cloud Architect’s Song (sung to the tune of “I Will Survive”)

At first I was afraid, I was petrified,
Kept thinking I could not instantiate
what you had specified.

But then I’d spent too many nights
re-hosting what you’d just built wrong,
and I grew strong,
I learned how to get along.

And now you’re back, wanting more cloud space,
I just walked in to find you here
With that need for more disk space,

I should have changed that stupid plan,
I should have made you pay that fee,
If I had known for just one second
you’d be back to bother me,

go now go, we don’t use core,
move objects around now,
‘cause you don’t wanna pay for it anymore,
Weren’t you the one who tried to break me with your burst,

you think I’d crumble,
you think I’d lay down and die?
Oh no not I, I will provide….

Long as I know how to shove
I think I’ll use EC2
I’ve got all my script to fire
And all my SaaS to give and I’ll survive
I will survive


Measuring the value and impact of cloud probably hasn’t changed that much over the years but…

I was in a discussion today with a number of technologists when someone asked, “How should we measure the effectiveness of cloud?” One individual brought up a recent post they’d written titled “8 Simple Metrics to Track Your Cloud Success.” It was good, but a bit too IT-centric for me.

That made me look up a post I wrote on cloud adoption back in 2009. I was pleased that it held up so well, since the area of cloud has changed significantly over the years. What do you think? At that time I was really interested in the concept of leading and lagging indicators, and in the idea that you need both perspectives in your metrics strategy to really know how progress is being made.

Looking at this metrics issue made me think “What has changed?” and “How should we think about (and measure) cloud capabilities differently?”

One area that I didn’t think about back then was security. Cloud has enabled significant innovation on both the positive and negative sides of security. We were fairly naive about security issues back then, and most organizations apply much greater mind-share to security and privacy issues today, I hope!

Our discussion did make me wonder what will replace cloud in the future, or whether we will just rename some foundational element of it. Timesharing, anyone?

One thing I hope everyone agrees on, though: it is not IT that declares success or defines the value; that remains the business.


Not your father’s SAP

This week’s SapphireNow was eye-opening for me. My interactions with SAP were primarily from implementing BW in its early days (1999, V1.2B) and from serving as the CT for the EDS side of large outsourcing relationships that used SAP R/3.

It was clear just walking around the SAP area that things have changed significantly. There were no SAP GUI screens visible; everything had a clean, modern look. The UI customization demos were easier to perform and actually within reach of end users without much effort. Granted, they were not doing anything too complex.

Integration options seemed to be more intuitive and actually possible for a range of other systems, supporting bi-directional information flow.

Even the executive dashboard (sorry for the reflection in the picture, but I took it myself) seemed to be something an executive could actually use with relatively minor training. I’ve always been fascinated by executive dashboards! The person I talked with said it is even relatively easy to extend the display using HTML5 techniques.

SAP executive dashboard

I am sure there is still quite a bit of work ahead for SAP to move all the functionality (especially the industry solutions) over to S/4HANA and get it running at maximum efficiency, but what was shown was impressive. Likely the first thing any organization contemplating the move needs to do is triage its customizations and extensions. The underlying data structures for S/4HANA are much less redundant, since the in-memory model removes the need for redundancy to achieve performance. The functionality also seems more versatile, so hopefully many of the customizations that organizations ‘just had to have’ can be eliminated.

I’ve always said the first rule of buying third-party packages is: “don’t do anything that prevents you from taking the next release.” With SAP’s new approach, those running S/4HANA in the cloud will get the next release on a continuous basis, while those with an on-premises approach will get it every nine months (or so). So the option of putting off releases is becoming less viable.

I’ll get a post on Diginomica next week with more of an enterprise architect’s perspective.