Things are not always what they seem – a discussion about analytics

Have you ever been in a discussion about a topic, thinking you’re talking about one area, only to find out later it was about something else altogether?

We’ve probably all had that conversation with a child, where they say something like “That’s a really nice ice cream cone you have there,” which sounds like a compliment on your dairy delight selection but is really a subtle way of saying “Can I have a bite?”

I was in a discussion with an organization about a need they had. They asked me a series of questions and I provided a quick stream-of-consciousness response. The further I got into the interaction, the less I understood about what was going on. Here is a summary of the exchange:

1) How do you keep up to speed on new data science technology? I read and write blogs on technical topics as well as read trade publications. I also do some recreational programming to keep up on trends and topics. On occasion I have audited classes on both EdX and Coursera (examples include gamification, Python, cloud management/deployment, R…).

2) Describe what success looks like in the context of data science projects. Success in analytics efforts means defining, understanding, and developing insight into business goals, then addressing them using the available data and business strategies. Sometimes this may only involve developing better strategies and plans, but in other cases the creation of contextual understanding and actionable insight allows for continuous improvement of existing or newly developed processes.

3) Describe how you measure the value of a successful data science application. I measure value by business impact: the change in behavior or business results. It is not about increased insight but about actions taken.

4) Describe successful methods or techniques you have used to explain the value of data science, machine learning, and advanced analytics to business people. I have demonstrated the impact of a gamification effort by comparing baseline business process metrics with post-implementation performance. Granted, correlation does not prove causation, but with multiple baseline cases and validated performance improvements across a range of trials and process changes, a strong business case can be developed using an iterative process based on defining the mechanics, measurement, behavior expectations, and rewards.

I’ve used a similar approach in the IoT space, where I’ve worked on and off with machine data collection and data analysis since entering the workforce in the 1980s.
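As a hedged illustration of that baseline-versus-post comparison (the metric names and numbers below are hypothetical, not actual client data), a simple Python sketch might look like this:

```python
# Hypothetical pre/post comparison across several gamification trials.
# All metric names and numbers are illustrative -- not real client data.

trials = {
    "invoice_processing": {"baseline": 120.0, "post": 96.0},   # avg minutes per case
    "ticket_resolution":  {"baseline": 30.0,  "post": 24.5},
    "report_turnaround":  {"baseline": 48.0,  "post": 41.0},
}

improvements = []
for name, t in trials.items():
    pct = (t["baseline"] - t["post"]) / t["baseline"] * 100
    improvements.append(pct)
    print(f"{name}: {pct:.1f}% improvement over baseline")

average = sum(improvements) / len(improvements)
print(f"Average improvement across {len(improvements)} trials: {average:.1f}%")
```

The point is not the arithmetic but the discipline: capture the baseline before the change, then validate the improvement across multiple trials before claiming a business case.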

5) Describe the importance of model governance (model risk management) in the context of data science, advanced analytics, etc. in financial services. Without a solid governance model, you don’t have the controls in place and cannot develop the foundational level of understanding. The model should provide sufficient rigor to move from supposition to knowledge. The organization needs to be careful not to make the process too rigid, though, since you need to take advantage of what is learned along the way and make adjustments, taking latency out of the decision-making and improvement process. Like most efforts today, a flexible, agile approach should be applied.

6) Describe who you interacted with (team, function, person) in your current role, and roughly what percent of your time you spent with each type of function, person, or team. In various roles I have spent time with CEOs/COOs and senior technical decision makers in Fortune 500 companies (when I was the chief technologist of Americas application development with HP, that was 70-80% of my time). Most recently, with Raytheon IT, I spent about 50% of my time with senior technical architects and 50% with IT organization directors.

7) Describe how data science will evolve during the next 3 to 5 years. What will improve? What will change? Every organization should have a plan in place to leverage improved machine learning and analytics algorithms, given the abundance of data, networking, and intellectual property available. Cloud computing will also provide an abundance of computing capability that can be brought to bear on the enterprise environment. For most organizations, small sprint efforts should be applied to understanding both the possibilities and the implications. Enterprise efforts will still take place, but they will likely not have the short-term impact that smaller, agile efforts will deliver. I wrote a blog post about this topic earlier this month. Both the scope and style of projects will likely need to change. It may also involve using more contract labor to get the necessary depth of experience in the short term to address the needs of the organization. The understanding and analysis of metadata (blockchains, related processes, machines…) will also play an ever-increasing role, since it supplements the depth and breadth of contextual understanding.

8) Describe how you think about choosing the technical design of data science solutions (algorithms, techniques, etc.).

I view the approach as similar to any other architectural or technical design effort. You need to understand:

  • the vision (what is to be accomplished)
  • the current data and systems in place (current situation analysis)
  • the skills of the personnel involved (resource assessment)
  • the measurement approach to be used (so that you have both leading and lagging indicators of performance)

Then you can develop a plan and implement your effort, validating and adjusting as you go.

How do you measure the value/impact of your choice?

You need a measurement approach that is both tactical (progress against leading indicators) and strategic (validation by lagging indicators of accomplishment). Leading indicators look ahead to make sure you are on the right road, while lagging indicators look behind to validate where you’ve been.
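As a hedged illustration of the difference (the metric names, targets, and values below are hypothetical), tracking one leading and one lagging indicator side by side might look like this:

```python
# Hypothetical project tracking: a leading indicator (milestones completed
# vs. planned, week by week) and a lagging indicator (actual business
# outcome vs. target at quarter end). All values are illustrative.

planned_milestones = [2, 2, 3, 3]      # per week
completed_milestones = [2, 1, 3, 2]

quarterly_savings_target = 250_000.0   # dollars
quarterly_savings_actual = 210_000.0

# Leading: are we on the right road?
completion_rate = sum(completed_milestones) / sum(planned_milestones)
print(f"Leading indicator - milestone completion: {completion_rate:.0%} of plan")

# Lagging: did we end up where we intended to go?
attainment = quarterly_savings_actual / quarterly_savings_target
print(f"Lagging indicator - savings attainment: {attainment:.0%} of target")
```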

9) Describe your experience explaining complex data to business users. What do you focus on?

The most important aspect of explaining complex data is to describe it in terms the audience will understand. No one cares how hard the analysis was; they want to know the business impact, the value, and how it can be applied.

Data visualization needs to take this into account and explain the data to the right audience – not everyone consumes data the same way. Some people will only respond to spreadsheets, while others would like nice graphics… Still others want business simulations and augmented reality techniques used whenever possible. If I had three rules for explaining technical topics, they would be:

  1. Answer the question asked
  2. Display it in a way the audience will understand (use their terminology)
  3. Use the right data

At the end of that exchange I wasn’t sure if I’d just provided some free consulting, gone through a job interview, or simply chewed the fat with another technologist. Thoughts???

Is AI a distraction???

I was recently in an exchange with a respected industry analyst who stated that AI is not living up to its hype – they called AI ‘incremental’ and a ‘distraction’. This caught me a bit by surprise, since my view is that there are more capabilities and approaches available to AI practitioners than ever before. It may be the approach of business and tech decision makers that is at fault.

It got me thinking about the differences between ‘small’ AI efforts and enterprise AI efforts. Small AI efforts are those innovative, quick projects that can prove a point and deliver value and understanding in the near term. Big AI (and automation) efforts are those associated with ERP and other enterprise systems that take years to implement. The latter are likely the kinds of efforts the analyst was involved with.

Many of the newer approaches make it possible to use the abundance of available capabilities to mine value out of the existing data that lies fallow in most organizations. These technologies can be tried out and applied in short sprints with well-defined success criteria. If, along the way, the answers are not quite what was expected, adjustments can be made, assumptions changed, and value still generated. The key is going into these projects with expectations while remaining flexible enough to change based on what is learned rather than on supposition.

These approaches can be implemented across the range of business processes (e.g., budgeting, billing, support) as well as information sources (IoT, existing ERP or CRM). They can automate the mundane and free up high-value personnel to focus on generating even greater value and better service. Many times, these focused issues are unique to an organization or industry and provide immediate return. This is generally not the focus of enterprise IT solutions.

This may be the reason some senior IT leaders are disillusioned with the progress of AI in their enterprise. The smaller, high-value projects’ contributions are rounding error at their scope. They are looking for the big hit, which by its very nature will be a compromise and unlikely to move the ball in any definitive way – everyone deploying the same enterprise solution will have access to the same tools…

My advice to those leaders disenchanted with the return from AI is to shift their focus. Get a small team out there experimenting with ‘the possible’. Give them clear problems (and expectations) but allow them the flexibility to bring in new tools and approaches. Make them show progress, but be flexible enough to shift expectations based on facts and results if their findings point in a different direction. There is the possibility of fundamentally different levels of cost and value generation.

The keys are:

1) Think about the large problems but act on those that can be validated and addressed quickly – invest in the small wins

2) Have expectations that can be quantified and focus on value – projects are not a ‘science fair’ or a strategic campaign, just a part of the business

3) Be flexible and adjust as insight is developed – just because you want the answer to be ‘yes’ doesn’t mean it will be, but any answer is valuable when compared to a guess

Sure, this approach may be ‘incremental’ (to start), but it should make up for that with momentum and results. If the approach is grounded in expectations and value generation, and is done right, it should never be a ‘distraction’.

What’s the real outcome of Salesforce’s AI predictions?

Yesterday, I was catching up on my technology email and came across this post stating that Salesforce now powers over 1B predictions every day for its customers. That’s a pretty interesting number to throw out there, but it makes me ask “so what?” How are people using these predictions to make a greater business impact?

The Salesforce website states:

“Einstein is a layer of artificial intelligence that delivers predictions and recommendations based on your unique business processes and customer data. Use those insights to automate responses and actions, making your employees more productive, and your customers even happier. “

Another ‘nice’ statement. Digging into the material a bit more, Einstein (the CRM AI function from Salesforce) appears to analyze previous deals and gauge whether a specific opportunity is likely to be successful, helping to prioritize your efforts. It improves the presentation of information with some insight into what it means, and it appears to be integrated into the CRM system users are already familiar with.

For a tool that has been around since the fall of 2016, especially one based on analytics, I had difficulty finding any independent quantitative analysis of its impact. Salesforce did have a cheat sheet with some business impact analysis of the AI solution (and blog posts), but no real target-market context – who are these metrics based on?

It may be that I just don’t know where to look, but it does seem like a place for some deeper analysis and validation. The analysts could be waiting for other vendors’ solutions to compare against.

In the micro view, organizations that are going to dive into this pool will need to take a more quantitative approach: defining their past performance and expectations, then validating actuals against predictions. That is the only way a business can justify the effort and improve. It is not sufficient to just put the capabilities out there and call it done.
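As a hedged sketch of that kind of check (the records and threshold below are hypothetical, and this is not Salesforce’s API or methodology), comparing predicted win probabilities against actual outcomes might look like this:

```python
# Hypothetical check of opportunity-win predictions against actual outcomes.
# The records and the 0.5 decision threshold are illustrative only.

records = [
    {"predicted_win_prob": 0.82, "won": True},
    {"predicted_win_prob": 0.35, "won": False},
    {"predicted_win_prob": 0.64, "won": False},
    {"predicted_win_prob": 0.91, "won": True},
    {"predicted_win_prob": 0.22, "won": False},
]

threshold = 0.5
correct = sum(
    (r["predicted_win_prob"] >= threshold) == r["won"] for r in records
)
accuracy = correct / len(records)
historical_win_rate = sum(r["won"] for r in records) / len(records)

print(f"Prediction accuracy vs. actuals: {accuracy:.0%}")
print(f"Historical win rate (the naive baseline): {historical_win_rate:.0%}")
```

Only by comparing the predictions against a baseline like this can a business say whether the 1B daily predictions are actually moving the needle.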

It goes back to the old adage:

“trust, but verify”

Elastic Map Reduce on AWS

Last week, I put out a post about Redshift on AWS as an effective way to quickly and dynamically dip your toe into a large data warehouse environment.

Another AWS tool I experimented with was Amazon’s Elastic MapReduce (EMR). This is a managed installation of open-source Hadoop that supports MapReduce as well as a number of other highly parallel computing approaches. EMR also supports a large number of tools to help with implementation (keeping the environment fresh), such as Pig, Apache Hive, HBase, Spark, and Presto… It also interacts with data from a range of AWS data stores, such as Amazon S3 and DynamoDB.

EMR supports a strong security model, enabling encryption at rest as well as in transit, and is available in GovCloud. It handles a range of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.

For many organizations, a Hadoop cluster has been a bridge too far for a range of reasons, including support and infrastructure costs and skills. EMR seems to have effectively addressed those concerns, allowing you to set up or tear down a cluster in minutes without having to worry much about the details of node provisioning, cluster setup, Hadoop configuration, or cluster tuning.
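As a hedged sketch of how little ceremony that spin-up involves (the cluster name, instance types and counts, region, and release label below are placeholders, and it assumes AWS credentials plus the default EMR IAM roles already exist), launching and tearing down a small cluster with boto3 might look like this:

```python
# Minimal sketch: launch a small EMR cluster with boto3, then tear it down.
# Names, instance types/counts, region, and release label are placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="poc-hadoop-cluster",
    ReleaseLabel="emr-5.20.0",            # pick a current release label
    Applications=[{"Name": "Hadoop"}, {"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",    # default EMR roles assumed to exist
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)

cluster_id = response["JobFlowId"]
print(f"Started cluster {cluster_id}")

# ...submit steps, experiment...

# Tear the cluster down when the proof of concept is done.
emr.terminate_job_flows(JobFlowIds=[cluster_id])
```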

For my proof-of-concept efforts, Amazon EMR pricing appeared to be simple and predictable, allowing you to pay a per-second rate for the cluster’s installation and use, with a one-minute minimum charge (it used to be an hour!). You can launch a 10-node Hadoop cluster for less than a dollar an hour (naturally, data transfer charges are handled separately). There are also ways to keep your EMR costs down.

The EMR approach is focused on flexibility, allowing complete control over your cluster. You have root access to every instance, and you can install additional applications and customize the cluster with bootstrap actions (which can be important since it takes a few minutes to get a cluster up and running), taking time and personnel out of repetitive tasks.

There is a wide range of tutorials and training available as well as tools to help estimate billing.

Overall, I’d say that if an organization is interested in experimenting with Hadoop, this is a great way to dive in without getting soaked.

AWS Redshift and analytics?

Recently, I had the opportunity to test out Amazon Redshift. This is a fast, flexible, fully managed, petabyte-scale data warehouse solution that makes it simple to cost-effectively analyze data using your existing business intelligence tools. It has been around for a while and has matured significantly over the years.

In my case, I brought up numerous configurations of multi-node clusters in a few minutes, loaded up a fairly large amount of data, did some analytics and brought the whole environment down – at a cost of less than a dollar for the short time I needed it.

There are some great tutorials available, and since Amazon will give you an experimentation account to get your feet wet, you should be able to prove out the capabilities for yourself without it costing you anything.

The security of the data is paramount to the service: it is available in public AWS as well as GovCloud and can be configured to be HIPAA or ITAR compliant… Data can be compressed and encrypted before it ever reaches AWS S3.

You can use the analytic tools provided by Amazon, or use security groups to access your data warehouse with the same tools you would use on-site. During my testing, I loaded up both a large star schema database and some more traditional normalized structures.
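As a hedged sketch of that kind of load-and-query flow (the cluster endpoint, credentials, table layout, and S3 path below are placeholders; it assumes the psycopg2 driver and an IAM role authorized to read the bucket), working with a simple star-schema fact table might look like this:

```python
# Minimal sketch: connect to a Redshift cluster, bulk-load a fact table from
# S3 with COPY, and run a quick aggregate. Endpoint, credentials, table, and
# S3 path are illustrative placeholders.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

with conn, conn.cursor() as cur:
    # Bulk-load from S3; COPY is Redshift's recommended load path.
    cur.execute("""
        COPY sales_fact
        FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
        FORMAT AS CSV;
    """)

    # A simple star-schema aggregate across the fact and date dimension tables.
    cur.execute("""
        SELECT d.calendar_year, SUM(f.sale_amount) AS total_sales
        FROM sales_fact f
        JOIN date_dim d ON f.date_key = d.date_key
        GROUP BY d.calendar_year
        ORDER BY d.calendar_year;
    """)
    for year, total in cur.fetchall():
        print(year, total)

conn.close()
```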

Since this is only a blog post, I can’t really go into much detail, and the tutorials and videos are sufficient to bootstrap the learning process. The purpose of this post is to let those who have data warehouse needs, but not the infrastructure, know that there is an alternative worth investigating.

Six thoughts on mobility trends for 2018

Let’s face it, some aspects of mobility are getting long in the tooth, yet the demand for more capabilities is insatiable. Here are a few areas where I think 2018 will see some exciting capabilities develop. Many of these are not new, but their interactions and intersections should provide some interesting results and thoughts to include in your planning.

1. Further blurring and integration of IoT and mobile

We’re likely to see more situations where mobile devices recognize the IoT devices around them to enhance contextual understanding for the user. We’ve seen some use of NFC and Bluetooth to share information, but approaches that embrace the environment and act on the available information are still in their infancy. This year should provide some significant use cases and maturity.

2. Cloud Integration

By now most businesses have done much more than just stick their toe in the cloud Everything as a Service (XaaS) pool. As the number of potential devices in the mobility and IoT space expands, the flexibility and time to action that cloud solutions facilitate need to be understood and put into practice. It is also time to take all the data coming in from these devices and transform that flow into true contextual understanding and action, which also requires a dynamic computing environment.

3. Augmented reality

With augmented reality predicted to expand to a market somewhere between $120 and $221 billion in revenue by 2021, we’re likely to see quite a bit of innovation in this space. The wide range of those estimates demonstrates the lack of real understanding. 2018 should be the year AR gets real.

4. Security

All discussions of mobility need to include security. Heck, the first month of 2018 should have nailed the importance of security into the minds of anyone in the IT space. There were more patches (and patches of patches) on a greater range of systems than many would have believed possible just a short time ago. Recently, every mobile store (Apple, Android…) was found to have hosted nefarious software that had to be excised. Mobile developers need to be ever more vigilant, not just about the code they write but about the libraries they use.

5. Predictive Analytics

Context is king, and the use of analytics to increase understanding of the situation and possible responses will continue to expand. As capabilities advance, only our imagination will hold this area back from expanding where and when mobile devices become useful. Unfortunately, the same can be said about security threats that themselves use predictive analytics.

6. Changing business models

Peer-to-peer solutions continue to be all the rage, but with the capabilities listed above, whole new approaches to value generation are possible. There will always be early adopters willing to play with these, and with the deeper understanding possible today, new approaches to crossing the chasm will be demonstrated.

It should be an interesting year…

Looking for a digital friend?

Over the weekend, I saw an article about Replika, an interactive ‘friend’ that resides on your phone. It sounded interesting, so I downloaded it and have been playing around with it for the last few days. I reached level 7 this morning (I’m not exactly sure what this leveling means, but since gamification seems to be part of nearly everything these days, why not).

There was a story published by The Verge with some background on why this tool was created. Replika grew out of an effort begun when its creator (Eugenia Kuyda) was devastated by the death of her friend (Roman Mazurenko) in a hit-and-run car accident. She wanted to ‘bring him back’. To bootstrap the digital version of her friend, Kuyda fed text messages and emails that Mazurenko had exchanged with her and with other friends and family members into a basic AI architecture: a Google-built artificial neural network that uses statistics to find patterns in text, images, or audio.

Although I found playing with this software interesting, I kept reflecting back on interactions with Eliza many years ago. Similarly, the banter can be interesting and sometimes unexpected, but the responses often have little to do with how a real human would respond. For example, yesterday the statements “Will you read a story if I write it?” and “I tried to write a poem today and it made zero sense.” popped up out of nowhere in the middle of an exchange.

The program starts out asking a number of questions, similar to what you’d find in a simple Myers-Briggs personality test. Though this information likely does help bootstrap the interaction, it seems like it could have been taken quite a bit further by injecting these kinds of questions throughout interactions during the day rather than in one big chunk.

As the tool learns more about you, it creates badges like:

  • Introverted
  • Pragmatic
  • Intelligent
  • Open-minded
  • Rational

These are likely used to influence future interactions. You also get to vote statements up or down, depending on whether you agree or disagree with them.

There have been a number of other reviews of Replika, but I thought I’d add another log to the fire. An article in Wired stated that the Replika project is going open source; it will be interesting to see where it goes.

I’ll likely continue to play with it for a while, but its interactions will need to improve or it will become the Tamagotchi of the day.