Let's Ask All Our Embarrassing Data Questions/Review

02 November 2016

What does scraping actually mean? Why would I need databases and what the heck is MongoDB? What’s the difference between SQL and SQLite? And can I truncate my y-axis or will the data vis community kill me? Even if we’ve worked with data for years, we might have simple questions like that. And now it’s too late to ask. Waaaay too late. Nobody can ever know that you still don’t know what Pandas is. Or why people use PostgreSQL.

Last weekend, hundreds of people came together to celebrate the Mozilla Festival in London, “the world’s leading festival for the open internet movement.” My fellow OpenNews fellows and I attended – that’s me and two of them in Hyde Park:

I decided to host a session that would tackle all the knowledge gaps in the sphere of data. The idea: Over the course of one hour, all attendees anonymously submit data questions (which we gathered in an Etherpad). Then everybody reads one question and tries to answer it. If she or he needs help, the whole group helps out.

Why did I want such a session to happen? Because, self-taught as many of us are, we have gaps in our knowledge and skills that other people can fill. And we can all fill other people’s knowledge gaps.

The whole session was one big experiment. I had never hosted such a question round before. Here is a write-up of what went well, and what could be improved next time. Let’s start with what didn’t go well, and how to improve it:

What didn’t go well

My aim was to close knowledge gaps. That definitely happened. But it happened far more for beginners than for advanced people. During the session, we answered questions on hugely different levels: questions like “What does scraping mean?” as well as “How can I more reliably bin data for choropleth maps?”. The advanced people were the ones answering both the basic questions for beginners AND the advanced questions for other advanced people. It might be a good idea to divide beginners and advanced people into different groups.

If I were hosting such a session again, I’d try to get more un-googlable questions. Google has you covered when you ask “What is an API?”, but not when you want to know whether “your audience really interacts with your interactive dataviz”. Answering the latter question challenges the advanced people more than answering the former. Also, un-googlable questions call for opinions more than explanations. That’s good, because explaining things is hard. People spend hours thinking about how best to explain an API, so it’s unlikely that my session attendees and I can come up with perfect explanations in seconds. The result was sometimes confusing answers (which could discourage people from learning more about the topic) and not enough time to go in-depth and ask clarifying questions.

Third improvement: defining the question space better. “Data” is an extremely big field: it includes business intelligence and stats, maps and interactivity. What I appreciate about the field of data visualization is that every new project challenges us to learn about new tools, methods, and workflows. And most answers taught me something relevant. But a dashboard person won’t gain anything from hearing an answer to the question “Are there any good/clear resources for better understanding of ‘overpass turbo’ queries in OSM?” It would be helpful to at least define the industry for such a session: Will it be about data questions in journalism? Or in science, or in business?

What went well

I was especially pleasantly surprised by how open people were about their questions and about what they didn’t know. I tried to create a safe space by
a) talking about how we are all students and teachers at the same time and that nobody can know everything
b) doing a fun short exercise in the beginning where people needed to interact with each other
c) splitting the attendees into two groups to create more intimacy
d) letting people submit questions anonymously
e) making a point of saying so whenever I didn’t know something.

And it was great to see how quickly people came up with questions. I prepared some questions beforehand, but we didn’t even get to them.

I was also happy with the engagement of the attendees. I hope that everybody got something out of it: The beginners got answers, and the advanced people got the feeling of helping somebody; of explaining things that are relevant for at least one person in the room.

In general, it was definitely an experiment from which I learned a lot. The concept needs improvement, but I look forward to trying it out again next time.

After splitting the attendees into two groups, the great Simon Jockers led the other group and gave valuable feedback afterwards. Thanks, Simon! And the photo and GIF are by the great Drew Wilson. Thanks, Drew! All questions can be found here.