Sunday, 21 September 2014

These are a few of my favourite (audience research) things

On Friday I popped into London to give a talk at the Art of Digital meetup at The Photographers' Gallery. It's a great series of events organised by Caroline Heron and Jo Healy, so go along sometime if you can. I talked about different ways of doing audience research. (Writing the line 'getting to know you' gave me an earworm and a 'lessons from musicals' theme.) It was a talk of two halves: in the first, I outlined different ways of thinking about audience research; in the second, I went into a little more detail about a few of my favourite (audience research) things.

There are lots of different ways to understand the contexts and needs different audiences bring to your offerings. You probably also want to test to see if what you're making works for them and to get a sense of what they're currently doing with your websites, apps or venues. It can help to think of research methods along scales of time, distance, numbers, 'density' and intimacy. (Or you could think of it as a journey from 'somewhere out there' to 'dancing cheek to cheek'...)

'Time' refers to both how much time a method asks from the audience and how much time it takes to analyse the results. There's no getting around the fact that nearly all methods require time to plan, prepare and pilot, sorry! You can run 5 second tests that ask remote visitors a single question, or spend months embedded in a workplace shadowing people (and more time afterwards analysing the results). On the distance scale, you can work with remote testers located anywhere across the world, ask people visiting your museum to look at a few prototype screens, or physically locate yourself in someone's office for an interview or observation.

Numbers and 'density' (or the richness of communication and the resulting data) tend to be inversely linked. Analytics or log files let you gather data from millions of website or app users, one-question surveys can garner thousands of responses, you can interview dozens of people or test prototypes with 5-8 users each time. However, the conversations you'll have in a semi-structured interview are much richer than the responses you'll get to a multiple-choice questionnaire. This is partly because it's a two-way dialogue, and partly because in-person interviews convey more information, including tone of voice, physical gestures, impressions of a location and possibly even physical artefacts or demonstrations. Generally, methods that can reach millions of remote people produce lots of point data, while more intimate methods that involve spending lots of time with just a few people produce small datasets of really rich data.

So here are a few of my favourite things: analytics, one-question surveys, 5 second tests, lightweight usability tests, semi-structured interviews, and on-site observations. Ultimately, the methods you use are a balance of time and distance, the richness of the data required, and whether you want to understand the requirements for, or measure the performance of, a site or tool.

Analytics are great for understanding how people found you, what they're doing on your site, and how this changes over time. Analytics can help you work out which bits of a website need tweaking, and measure the impact of any changes you make. But that only gets you so far - how do you know which trends are meaningful and which are just noise? To understand why people are doing what they do, you need other forms of research to flesh out the numbers.
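For instance, here's a minimal sketch (in Python, with made-up numbers) of the kind of sanity check you might run on exported daily pageview counts to see whether a change sits outside normal variation. The two-standard-deviation threshold is purely illustrative, not a rule:

```python
# Minimal sketch: flag days whose pageview counts fall outside normal variation,
# assuming you've exported daily totals from your analytics tool.
# The figures and the two-standard-deviation threshold are illustrative only.
from statistics import mean, stdev

daily_pageviews = [1180, 1240, 1195, 1310, 1275, 1220, 1290,   # previous week
                   1250, 1305, 1215, 1260, 1330, 1285, 1270]   # latest week

baseline, latest_week = daily_pageviews[:-7], daily_pageviews[-7:]
mu, sigma = mean(baseline), stdev(baseline)

for day, views in enumerate(latest_week, start=1):
    if abs(views - mu) > 2 * sigma:
        print(f"Day {day}: {views} views is outside normal variation - worth a closer look")
    else:
        print(f"Day {day}: {views} views looks like ordinary noise")
```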

One-question surveys are a great way of finding out why people are on your site, and whether they've succeeded in achieving their goals for being there. For the last Let's Get Real project we linked survey answers to analytics so we could see how people who were there for different reasons behaved on the site, but you don't need to go that far - any information about why people are on your site is better than none!
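If you do want to try linking the two, here's a rough sketch of the general idea - joining survey answers to per-session analytics exports via a shared session identifier. All the field names and figures here are hypothetical; your analytics and survey tools will have their own identifiers and export formats:

```python
# Minimal sketch of linking one-question survey answers to analytics sessions.
# session_id, pages_viewed and minutes_on_site are hypothetical field names.
from collections import defaultdict
from statistics import mean

survey_responses = [                      # "Why are you here today?"
    {"session_id": "a1", "reason": "research"},
    {"session_id": "b2", "reason": "casual browsing"},
    {"session_id": "c3", "reason": "research"},
]

analytics_sessions = {                    # exported per-session metrics
    "a1": {"pages_viewed": 12, "minutes_on_site": 18.5},
    "b2": {"pages_viewed": 3,  "minutes_on_site": 2.1},
    "c3": {"pages_viewed": 9,  "minutes_on_site": 11.0},
}

by_reason = defaultdict(list)
for response in survey_responses:
    session = analytics_sessions.get(response["session_id"])
    if session:                           # only keep answers we can match to a session
        by_reason[response["reason"]].append(session)

for reason, sessions in by_reason.items():
    print(reason,
          "- avg pages:", round(mean(s["pages_viewed"] for s in sessions), 1),
          "avg minutes:", round(mean(s["minutes_on_site"] for s in sessions), 1))
```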

5 second tests and lightweight usability tests are both ways to find out how well a design works for its intended audiences. 5 second tests show people an interface for 5 seconds, then ask them what they remember about it, or where they'd click to do a particular task. They're a good way to make sure your text and design are clear. Usability tests take from a few minutes to an hour, and are usually done in person. One of my favourite lightweight tests involves grabbing a sketch, an iPad or laptop and asking people in a cafĂ© or other space if they'd help by testing a site for a few minutes. You can gather lots of feedback really quickly, and report back with a prioritised list of fixes by the end of the day. 

Semi-structured interviews use the same set of questions each time to ensure some consistency between interviews, but they're flexible enough to let you delve into detail and follow any interesting diversions that arise during the conversation. Interviews and observations can be even more informative if they're done in the space where the activities you're interested in take place. 'Contextual inquiry' goes a step further by including observation of the tasks you're interested in as they're actually performed. If you can 'apprentice' yourself to someone, it's a great way to have them explain why things are done the way they are. However, it's obviously a lot more difficult to find someone willing and able to let you observe them in this way, it's not appropriate for every task or research question, and the data that results can be so rich and dense with information that it takes a long time to review and analyse.

And one final titbit of wisdom from a musical - always look on the bright side of life! Any knowledge is better than none, so if you manage to get any audience research or usability testing done then you're already better off than you were before.

Wednesday, 10 September 2014

Does citizen science invite sabotage?

Q: Does citizen science invite sabotage?

A: No.

Ok, you may want a longer version. There's a paper on crowdsourcing competitions that has lost some important context while doing the rounds of media outlets. For example, on Australia's ABC, 'Citizen science invites sabotage':
'a study published in the Journal of the Royal Society Interface is urging caution at this time of unprecedented reliance on citizen science. It's found crowdsourced research is vulnerable to sabotage. [...] MANUEL CEBRIAN: Money doesn't really matter, what matters is that you can actually get something - whether that's recognition, whether that's getting a contract, whether that's actually positioning an idea, for instance in the pro and anti-climate change debate - whenever you can actually get ahead.'.
The fact that the research studies crowdsourcing competitions, which are fundamentally different to other forms of crowdsourcing without a 'winner takes all' dynamic, is not mentioned. Nor are the years of practical and theoretical work on task validation, which make it quite difficult for someone to get enough bad data past the various controls to significantly alter the results of crowdsourced or citizen science projects.
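To give a flavour of what those controls look like, here's a minimal sketch (not any particular project's rules - the thresholds are illustrative) of the common pattern of showing each task to several independent contributors and only accepting an answer once enough of them agree:

```python
# Minimal sketch of majority-agreement validation: each task is shown to several
# independent contributors, and a classification is only accepted once enough of
# them agree. REQUIRED_VOTES and AGREEMENT are illustrative values.
from collections import Counter

REQUIRED_VOTES = 5      # how many independent contributors see each task
AGREEMENT = 0.8         # fraction of matching answers needed to accept a result

def validate(answers):
    """Return the accepted answer, or None if contributors don't agree yet."""
    if len(answers) < REQUIRED_VOTES:
        return None
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer if count / len(answers) >= AGREEMENT else None

# A lone saboteur's answer is simply outvoted:
print(validate(["galaxy", "galaxy", "galaxy", "galaxy", "star"]))  # -> "galaxy"
# Genuine disagreement sends the task back for more eyes or expert review:
print(validate(["galaxy", "star", "galaxy", "star", "star"]))      # -> None
```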

You can read the full paper for free, but even the title, Crowdsourcing contest dilemma, and the abstract make the very specific scope of the study clear:
Crowdsourcing offers unprecedented potential for solving tasks efficiently by tapping into the skills of large groups of people. A salient feature of crowdsourcing—its openness of entry—makes it vulnerable to malicious behaviour. Such behaviour took place in a number of recent popular crowdsourcing competitions. We provide game-theoretic analysis of a fundamental trade-off between the potential for increased productivity and the possibility of being set back by malicious behaviour. Our results show that in crowdsourcing competitions malicious behaviour is the norm, not the anomaly—a result contrary to the conventional wisdom in the area. Counterintuitively, making the attacks more costly does not deter them but leads to a less desirable outcome. These findings have cautionary implications for the design of crowdsourcing competitions.
And from the paper itself:

'We study a non-cooperative situation where two players (or firms) compete to obtain a better solution to a given task. [...] The salient feature is that there is only one winner in the competition. [...] In scenarios of ‘competitive’ crowdsourcing, where there is an inherent desire to hurt the opponent, attacks on crowdsourcing strategies are essentially unavoidable.'
From 'Crowdsourcing contest dilemma' by Victor Naroditskiy, Nicholas R. Jennings, Pascal Van Hentenryck and Manuel Cebrian, J. R. Soc. Interface, 6 October 2014, vol. 11, no. 99, 20140532. Published online 20 August 2014. doi: 10.1098/rsif.2014.0532
I don't know about you, but 'an inherent desire to hurt the opponent' doesn't sound like the kinds of cooperative crowdsourcing projects we tend to see in citizen science or cultural heritage crowdsourcing. The study is interesting, but it is not generalisable to 'crowdsourcing' as a whole.

If you're interested in crowdsourcing competitions, you may also be interested in: On the trickiness of crowdsourcing competitions: some lessons from Sydney Design from May 2013. 

Tuesday, 9 September 2014

Helping us fly? Machine learning and crowdsourcing

[Image: Moon Machine by Bernard Brussel-Smith, via Serendip-o-matic]
Over the past few years we've seen an increasing number of projects that take the phrase 'human-computer interaction' literally (or perhaps turn HCI into human-computer integration), organising tasks done by people and by computers into a unified system. One of the most obvious benefits of crowdsourcing on digital platforms has been the ability to coordinate the distribution and validation of tasks, but now data classified by people through crowdsourcing is being fed back into computers to improve machine learning, so that computers can learn to recognise images almost as well as we do. I've outlined a few projects putting this approach to work below. Of course, this creates new challenges for the future - what do cultural heritage crowdsourcing projects do when all the fun tasks like image tagging and text transcription can be done by computers? After all, Fast Company reports that 'at least one Zooniverse project, Galaxy Zoo Supernova, has already automated itself out of existence'. More positively, assuming we can find compelling reasons for people to spend time with cultural heritage collections, how do machine learning and task coordination free us to fly further?

The Public Catalogue Foundation has taken tags created through Your Paintings Tagger and turned them over to computers. As they explain in 'The art of computer image recognition', the results are impressive: 'Using the 3.5 million or so tags provided by taggers, the research team at Oxford 'educated' image-recognition software to recognise the top tagged terms. Professor Zisserman explains this is a three stage process. Firstly, gather all paintings tagged by taggers with a particular subject (e.g. ‘horse’). Secondly, use feature extraction processes to build an ‘object model’ of a horse (a set of characteristics a painting might have that would indicate that a horse is present). Thirdly, run this algorithm over the Your Paintings database and rank paintings according to how closely they match this model.'
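As a very rough sketch of that three-stage idea (and emphatically not the Oxford team's actual system), the pattern looks something like this in Python with scikit-learn - extract_features() is just a hypothetical stand-in for whatever feature extraction you'd really use, and the filenames are invented:

```python
# Rough sketch of the three-stage process: gather crowd-tagged examples, learn a
# model from image features, then rank the untagged collection by match strength.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_features(image):
    # Placeholder: in practice this would be colour/texture/learned features etc.
    rng = np.random.default_rng(hash(image) % (2**32))
    return rng.random(128)

tagged = {"painting_001.jpg": 1, "painting_002.jpg": 0,   # 1 = tagged 'horse' by the crowd
          "painting_003.jpg": 1, "painting_004.jpg": 0}
untagged = ["painting_101.jpg", "painting_102.jpg", "painting_103.jpg"]

# Stages 1 & 2: build an 'object model' from the crowd-tagged examples.
X = np.array([extract_features(p) for p in tagged])
y = np.array(list(tagged.values()))
model = LogisticRegression().fit(X, y)

# Stage 3: run the model over the rest of the collection and rank by score.
scores = model.predict_proba(np.array([extract_features(p) for p in untagged]))[:, 1]
for painting, score in sorted(zip(untagged, scores), key=lambda pair: -pair[1]):
    print(f"{painting}: {score:.2f}")
```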

The BBC World Service archive ‘used an open-source speech recognition toolkit to listen to every programme and convert it to text’, extracted keywords or tags from the transcripts then got people to check the correctness of the data created: ‘As well as listening to programmes in the archive, users can view the automatic tags and vote on whether they’re correct or incorrect or add completely new tags. They can also edit programme titles and synopses, select appropriate images and name the voices heard’. From Algorithms and Crowd-Sourcing for Digital Archives by Tristan Ferne. See also What we learnt by crowdsourcing the World Service archive by Yves Raimond, Michael Smethurst, Tristan Ferne on 15 September 2014: 'we believe we have shown that a combination of automated tagging algorithms and crowdsourcing can be used to publish a large archive like this quickly and efficiently'.
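The underlying pattern - automatically generated tags that start out unconfirmed and are promoted or rejected by listener votes - might look something like this minimal sketch (not the BBC's actual implementation; the vote threshold is illustrative):

```python
# Minimal sketch of 'automatic tagging plus crowd checking': tags extracted by
# speech-to-text start unconfirmed, and votes promote or reject them.
from dataclasses import dataclass

@dataclass
class Tag:
    label: str
    source: str = "speech-to-text"      # where the tag came from
    up: int = 0
    down: int = 0

    def status(self, min_votes: int = 3) -> str:
        total = self.up + self.down
        if total < min_votes:
            return "unconfirmed"
        return "accepted" if self.up > self.down else "rejected"

programme_tags = [Tag("Nelson Mandela"), Tag("cricket"), Tag("Mandala")]

# Listeners vote on the automatically generated tags...
programme_tags[0].up += 4
programme_tags[1].up += 2
programme_tags[1].down += 1
programme_tags[2].down += 3             # a speech-recognition mishearing gets voted out

for tag in programme_tags:
    print(tag.label, "->", tag.status())
```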

And of course the Zooniverse is working on this. From their Milky Way Project blog, New MWP paper outlines the powerful synergy between citizen scientists, professional scientists, and machine learning: '...a wonderful synergy that can exist between citizen scientists, professional scientists, and machine learning. The example outlined with the Milky Way Project is that citizens can identify patterns that machines cannot detect without training, machine learning algorithms can use citizen science projects as input training sets, creating amazing new opportunities to speed-up the pace of discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weakness of each approach if deployed alone.'
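In spirit, that hybrid model is a triage loop: the machine handles the images it's confident about, uncertain ones go to volunteers, and their answers feed back into the training set. Here's a toy sketch of that split - the model, threshold and data are all hypothetical:

```python
# Toy sketch of a hybrid human/machine triage loop: confident machine
# classifications are kept, uncertain items are queued for citizen scientists.
def hybrid_classify(images, model_predict, confidence_threshold=0.9):
    """Split work between the machine and the crowd."""
    machine_labelled, for_the_crowd = {}, []
    for image in images:
        label, confidence = model_predict(image)
        if confidence >= confidence_threshold:
            machine_labelled[image] = label          # machine handles the easy cases
        else:
            for_the_crowd.append(image)              # people handle the hard ones
    return machine_labelled, for_the_crowd

# Stand-in for a trained classifier returning (label, confidence):
def toy_model(image):
    return ("bubble", 0.95) if "clear" in image else ("bubble?", 0.55)

auto, queued = hybrid_classify(["clear_001.png", "fuzzy_002.png"], toy_model)
print("machine:", auto)          # confident classifications
print("to volunteers:", queued)  # these go back into the citizen science queue
```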

If you're interested in the theory, an early discussion of human input into machine learning is in Quinn and Bederson's 2011 Human Computation: A Survey and Taxonomy of a Growing Field. More recently, the SOCIAM: The Theory and Practice of Social Machines project is looking at 'a new kind of emergent, collective problem solving, in which we see (i) problems solved by very large scale human participation via the Web, (ii) access to, or the ability to generate, large amounts of relevant data using open data standards, (iii) confidence in the quality of the data and (iv) intuitive interfaces', including 'citizen science social machines'. If you're really keen, you can get a sense of the state of the field from various conference papers, including ICML ’13 Workshop: Machine Learning Meets Crowdsourcing and ICML ’14 Workshop: Crowdsourcing and Human Computing. There's also a mega-list of academic crowdsourcing conferences and workshops, though it doesn't include much on the tiny corner of the world that is crowdsourcing in cultural heritage.

NB: this post is a bit of a marker so I've somewhere to put thoughts on machine learning and human-computer integration as I finish my thesis; I'll update this post as I collect more references. Do you know of examples I've missed, or implications we should consider? Comment here or on twitter to start the conversation...