Yes, I know I haven’t blogged in a while. I’ve been heads-down at the office, but I did want to write up a few notes from my panel 2 weeks ago at the Open Mobile Summit (with Holger Luedorf of Foursquare, Mark Johnson of Zite and Sharon Wienbar of Scale Venture Partners) on context-awareness and the use of sensors and signals to enhance application experiences.
We are certainly entering the age of the “smart” or “learning” application. The use of AI, semantic technologies, learning, sensors and other signals to deliver personalized user experiences is already happening. Why should your experience in Application XYZ be the same as mine? It shouldn’t; each of our user experiences should be different and personalized to our likes, interests, needs and so forth. Big data and the cloud have made the data crunching behind personalized experiences affordable, but there are certainly some things to think about, as we discussed on the panel.
1. Context is not a category. I often liken this to 2004/2005, when LBS (location-based services) was a category. In reality, what app doesn’t benefit from knowing its location? LBS is no longer a category; it’s a layer across all apps. At present, context-aware is being treated as a category (i.e. you are in the “Assistant” category), but this is a short-sighted view. Context will become a layer across all services, and together with the semantic web it may even end up defining Web 3.0.
2. Personalization and context can happen at the onset (the initial experience with an app), but the real magic begins once you have collected enough data to mine and analyze. This is the chicken-and-egg problem with context-aware. The benefits often do not shine until you’ve collected that critical threshold of data and can start making useful suggestions or augmenting the experience in useful ways. As you have probably all experienced, a recommendation system never works well at the beginning. It gets better as you tune the machine, but unfortunately most users give up well before that.
One approach is to seed some learning initially by asking setup questions. For example, both Zite and Sosh ask at the onset what your interests are, then use the answers to seed some initial suggestions and learning. That provides enough of a hook for you to keep using their applications, so they can collect more data and improve. Expecting context to be perfect at the onset is very hard, and in some ways this is what made the Saga virtual personal assistant so hard to digest initially: the suggestions were either not relevant or too generic. They have since rectified this by tuning how many explicit questions to ask the user at the onset and during general usage.
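To make the seeding idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the catalog, the topic names, the weights and the function names are invented, and this is not how Zite, Sosh or Saga actually work. It only shows the shape of the idea: onboarding answers give the profile a starting point, and later feedback gradually takes over.

```python
from collections import defaultdict

# Hypothetical catalog: item -> topic tags (invented for illustration).
CATALOG = {
    "Startup funding roundup": {"tech", "business"},
    "Weeknight pasta ideas": {"food"},
    "Marathon training plan": {"fitness"},
    "New phone sensor teardown": {"tech"},
}

def seed_profile(declared_interests, seed_weight=1.0):
    """Turn onboarding answers into initial per-topic weights."""
    profile = defaultdict(float)
    for topic in declared_interests:
        profile[topic] += seed_weight
    return profile

def record_feedback(profile, item, liked, step=0.5):
    """Nudge the weights of an item's topics up or down as usage data arrives."""
    for topic in CATALOG[item]:
        profile[topic] += step if liked else -step

def recommend(profile, k=2):
    """Rank items by the summed weight of their topics (ties broken by title)."""
    def score(item):
        return sum(profile.get(topic, 0.0) for topic in CATALOG[item])
    return sorted(CATALOG, key=lambda item: (-score(item), item))[:k]

# A new user who only told us "tech" at setup already gets tech items first.
profile = seed_profile(["tech"])
print(recommend(profile))
```

The seed weight and the feedback step are the tuning knobs here: a larger seed weight trusts the setup answers for longer, while a larger step lets observed behavior override them sooner.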
3. There are two types of learning: implicit and explicit. In almost every instance, explicit learning will trump implicit learning, which is why the Foursquare explicit check-in is a more powerful signal for restaurant recommendations than the implicit, track-where-everyone-goes approach that Alohar Mobile enables. Implicit learning requires significant data and, more importantly, significant correlated data to make sure that your results are not anomalies or skewed. A great example of this is you telling Apple Maps that a particular address is wrong vs. Apple Maps attempting to deduce that by looking at tons of traffic patterns, movement after reaching the destination and so forth. Unfortunately, very few applications, if any, have really succeeded in collecting explicit feedback, especially where the user perception is that they are having to train the system; users want immediate delight! Pandora and Netflix are obvious exceptions, and in some ways the simplicity of their thumbs-up and thumbs-down feedback systems absolutely helps.
That aside, with enough implicit data you can absolutely make smart suggestions, but how much data is required depends on your use case. There is learning on an individual user and there is learning on the masses, and some services benefit greatly from the aggregate data. In addition, for implicit systems to work, you need to have some understanding of your personas (cohorts). For example, my user base hypothetically has 5 types of users, and these are the patterns or signals for each type. Having too few or too many cohorts can have adverse effects, so determining what your clusters are is an essential part of making implicit learning work.
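As a rough illustration of working with cohorts, the Python sketch below assigns a user to the nearest of a few predefined personas. The cohort names, the two usage features and the centroid values are all made up for this example; in a real system the centroids would come out of clustering your own data, and the features would need to be scaled so that no single one dominates the distance.

```python
import math

# Hypothetical cohort centroids over two made-up usage features:
# (sessions per week, share of sessions that happen on weekends).
COHORTS = {
    "commuter": (10.0, 0.1),
    "weekender": (3.0, 0.8),
    "power_user": (25.0, 0.4),
}

def assign_cohort(features, cohorts=COHORTS):
    """Return the name of the cohort whose centroid is closest (Euclidean)."""
    return min(cohorts, key=lambda name: math.dist(features, cohorts[name]))

# A user with 9 sessions a week, mostly on weekdays.
print(assign_cohort((9.0, 0.2)))
```

Once a new user lands in a cohort, the aggregate behavior of that cohort can stand in for the individual data you do not have yet.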
4. Location is only 1 sensor of many. I find it interesting that in many of my conversations about context-aware, the immediate assumption is always-on location. Yes, location is 1 signal, and it may be a stronger or weaker signal depending on the use case (e.g. for Zite with news, location is certainly not as important as time, but for Foursquare it’s all about location). But in addition to signals like location and time, there are many other signals you could leverage. There may be back-end data sources that you use to derive intent for your application. For example, Tweets and Facebook posts could signal likes and dislikes of particular brands, and this is how intent-targeting ad networks like LocalResponse leverage this information. Alternatively, there are other device signals that you could leverage, like whether the user is driving or in an airplane, or maybe whether it’s sunny or rainy outside. There are literally 100s of possible signals that can play into your service depending on the use case.
Notably, even with 100+ signals (and this is up for debate), I have generally found that 2 or 3 signals will be 80% of the “answer.” For example, Netflix may use significant statistical data to figure out your interests based on what you’ve watched previously and so forth, but your explicit likes may be 80% of the signal.
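One way to picture the point that a few signals carry most of the answer is a simple weighted score. In the Python sketch below, the signal names and weights are invented purely for illustration (they are not taken from Netflix or any other service); the idea is just that you can measure how much of the total weight sits in the top few signals.

```python
# Hypothetical signal weights, invented for illustration only.
WEIGHTS = {
    "explicit_like": 0.6,
    "watched_similar": 0.2,
    "time_of_day_match": 0.1,
    "location_match": 0.05,
    "weather_match": 0.05,
}

def score(signals, weights=WEIGHTS):
    """Weighted sum of signal values; missing signals count as 0."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

def top_signal_share(weights=WEIGHTS, n=2):
    """Fraction of the total weight carried by the n heaviest signals."""
    ranked = sorted(weights.values(), reverse=True)
    return sum(ranked[:n]) / sum(ranked)

# How much of the total weight do the two heaviest signals account for?
print(round(top_signal_share(n=2), 2))
```

If the share for 2 or 3 signals is already high, it is usually worth getting those signals right before wiring up the long tail of weaker ones.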
5. One of the hardest things with all learning applications is QA and regression. The problem with recommendations and suggestions is that there is often no right or wrong answer. Other signals, like product statistics, have to be analyzed to see if your tweak to the algorithm yielded a positive result. A great example of this is Google search: how do you design a QA system to determine that the search results are getting better, and how do you know that a tweak to your algorithm hasn’t broken something in the long tail that you forgot to QA (i.e. caused a regression)?
There are different approaches to this problem, and in practice you will likely need some combination of them. One approach is to use an army of Mechanical Turk workers or others to evaluate the results on a random sampling of long-tail data. This has data privacy issues if not done right and requires ample setup to make work. Other approaches include explicitly asking the user and/or looking at product statistics to see if things have trended in the right way. This of course requires a thorough understanding of your personas/cohorts and ample tools to glean the statistics you need to draw a conclusion. Another approach involves pre-annotating 100s of cases with their expected results and then building 100s of unit tests to make sure that all of those tests continue to pass with each improvement to the algorithm. The problem here is that you never know if you have enough tests across the distribution, and it can take serious time to build enough of that corpus to really start seeing the benefit.
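The pre-annotated test approach could look something like the Python sketch below. The queries, venue names and function names are hypothetical, and precision at k is just one metric I picked for the example; the point is that each case stores an expected result, and a change to the algorithm is flagged when any case scores worse than its recorded baseline.

```python
# Hypothetical pre-annotated cases: query -> items a reviewer expects near the top.
GOLDEN_CASES = {
    "coffee near me": {"Blue Door Cafe", "Corner Espresso"},
    "late night food": {"Night Owl Diner"},
}

def precision_at_k(results, expected, k=3):
    """Share of the top-k results that appear in the expected set."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for r in top if r in expected) / len(top)

def find_regressions(recommender, baseline, cases=GOLDEN_CASES, k=3, tolerance=0.0):
    """Return the queries whose precision@k fell below the stored baseline."""
    regressions = []
    for query, expected in cases.items():
        current = precision_at_k(recommender(query), expected, k)
        if current < baseline.get(query, 0.0) - tolerance:
            regressions.append(query)
    return regressions
```

In use, you would record a baseline score per query from the current algorithm, then run `find_regressions` against each candidate change and block the release (or at least look closer) when the returned list is not empty. The `tolerance` parameter is there because small fluctuations are normal when there is no single right answer.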
I could probably keep going with more thoughts on how to bring context into your application, but I thought the above would serve as a quick primer on the topic and a short summary of some of what we discussed on our panel.
Feel free to ping me offline if you want to discuss further!