What I learned during the IC2S2

This year’s IC2S2 was held in mostly sunny Amsterdam. As this was my first Computational Social Science conference, I decided to write down my thoughts and a tweet-length gist for most of the talks I attended.

The 5th edition of the conference covered a wide set of topics: text and network analysis, complex systems, game theory, experiments, machine learning, qualitative studies, mixed-method approaches, collective intelligence, polarisation, misinformation, protests, migration, inequality, cooperation and organisation, science of science, reproducibility and transparency. Throughout the days, IC2S2 offered something interesting to every community. I enjoyed that most of the talks were accessible; however, I would have preferred them a bit longer, so that presenters had time to dive into some of the details and address more questions.

I also noticed a few recurrent, strongly opinionated topics during the conference:

  • The split between Social Science and Computer Science and how to bridge the gap between communities.
  • An increasing scarcity of data suitable for research.
  • The future of Computational Social Science: The importance of theory-driven approaches, a necessary turn to solution-oriented research and the skills that are required to excel in the field.

Overall, it was a great experience and I am looking forward to IC2S2 2020 in Boston!

Without further ado, here’s a summary of the IC2S2:

Keynotes

Kenneth Benoit on NLP and social scientific analysis.

Kenneth highlighted that “we use language for communication, not to create data” and that this has some obvious problems for social science research which we can fix with “better models and less denial”.

He also made a highly contested argument that there’s a split between Computer Scientists and Social Scientists, with the former group caring more about prediction and “optimising a loss function”, while the latter focuses on explanation and measurement.

“Sometimes what improves prediction kills measurement.”

Kenneth also mentioned how insidious bias can be harmful and that we should treat unsupervised methods with caution. Lastly, he stressed the importance of reproducibility and transparency in computational social science.

Deen Freelon discussing the data landscape

Deen talked about the post-API age, where proprietary databases are data oases: paywalled and in inconvenient formats, but still available for research. He showcased two of his Python packages for extracting and preprocessing data from Nexis Uni and Factiva. Personal take: there is a plethora of under-utilised open data sources and more are becoming available regularly; however, their share of the data landscape is decreasing over time.

He also touched on research principles; a researcher might be compliant with a platform’s Terms & Conditions, but it’s the ethical use of data that we should focus on.

Monica Lee & Devra Moehler from Facebook

They gave a fantastic talk on Facebook’s approaches to tackling misinformation and preventing election interference. They acknowledged that in 2016, Facebook failed to prevent numerous fake accounts from posting messages on its pages and groups that amplified misinformative news. These posts were receiving approximately 16M views per week in 2016.

They also described how malicious actors organised information operations by infiltrating Facebook communities, such as the LGBTQ+ community, establishing relations with the organisers and other members, and then spreading misinformative opinions.

Facebook has deployed a variety of prevention tactics described as “Inform, Reduce, Remove”. I’ve binned them into three rough categories:

  • Better models and software: They explained how they used network analysis and vectorisation methods to detect clusters of malicious actors that promote and amplify misinformative messages.
  • New policies: For political advertisements, users must provide their identity and location. This worked well in the US in 2018; however, it’s important to adapt it to different countries. For example, citizens and advertisers in other countries, such as India, couldn’t understand why this additional verification step was needed, because the concepts of election interference and misinformation were either very new or non-existent.
  • Raising awareness and informing users about the elections: Enriching users’ knowledge of their local electorate. Fascinating work combining UX research, surveys, interviews and A/B testing to understand what works in different countries.
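The cluster-detection idea from the first bullet can be sketched in a few lines: link accounts that repeatedly post identical messages and treat connected components of that graph as candidate coordinated clusters. This is a toy illustration with invented data, not Facebook’s actual pipeline, which the speakers described only at a high level.

```python
from collections import defaultdict

def coordination_graph(posts, min_shared=2):
    """Link accounts that posted the same message at least `min_shared` times.

    `posts` is a list of (account, message) pairs; the data is invented
    for illustration.
    """
    by_message = defaultdict(set)
    for account, message in posts:
        by_message[message].add(account)

    shared = defaultdict(int)
    for accounts in by_message.values():
        accounts = sorted(accounts)
        for i, a in enumerate(accounts):
            for b in accounts[i + 1:]:
                shared[(a, b)] += 1

    graph = defaultdict(set)
    for (a, b), count in shared.items():
        if count >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def clusters(graph):
    """Connected components of the graph = candidate coordinated clusters."""
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(graph[n] - component)
        seen |= component
        components.append(component)
    return components
```

In practice one would weight edges by message rarity and posting-time proximity; identical-message counting is the simplest possible signal.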

Keynote by Claudia Wagner

Fascinating talk on perception bias, networks, representation of minorities, using a survey-inspired error framework for studying open data, theory-guided data collection processes and justifying and documenting design choices.

Panel discussion on the future of CSS

We moved from a data-poor to a data-rich world, however, the gap between company-owned data and data available for research is getting wider.

Generally, we may be heading towards solution-oriented social science research. Personal take: That’s great. We should bridge the gap between theory, academic research and real-world applications.

Doing Computational Social Science well is tough. It requires a variety of skills (Rense Corten summarised some of them on Twitter), and the solution isn’t to create a new discipline but to develop interfaces that enable the efficient collaboration of interdisciplinary teams.

Lastly, Helen Margetts highlighted the importance of data mapping exercises to help policymakers understand the information they possess. We’ve written a summary of how we do this at Nesta.

Keynote by Cesar Hidalgo

What a way to close the conference. Fascinating talk by Cesar on how humans judge machines. Cesar and his team conducted a series of experiments documenting the differences in how people judge human and AI actions in a variety of scenarios.

There’s a book coming up.

Misinformation track

Vaccine misinformation on Twitter

Jieyu Ding gathered tweets using a keyword-based approach and used four human annotators to label them as informative or misinformative. She then examined the misinformation and non-misinformation semantic networks to spot differences in terms used and find the prevalent topics for each of them.
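As a rough sketch of the semantic-network comparison (my reading of the approach, with invented example tweets, not her actual data or code), one can count term co-occurrences within each labelled set and look for the edges that are over-represented in the misinformation network:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(tweets):
    """Count term pairs that appear together in the same tweet."""
    edges = Counter()
    for tweet in tweets:
        terms = sorted(set(tweet.lower().split()))
        edges.update(combinations(terms, 2))
    return edges

def distinctive_edges(misinfo, informative, top=5):
    """Edges much more frequent in the misinformation network."""
    a = cooccurrence_edges(misinfo)
    b = cooccurrence_edges(informative)
    diff = {edge: a[edge] - b.get(edge, 0) for edge in a}
    return sorted(diff, key=diff.get, reverse=True)[:top]
```

A real pipeline would normalise counts by corpus size and filter stopwords; the raw difference here is only meant to show the shape of the comparison.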

I liked her workflow chart.

Social media, fake news and perception of truth

Interesting talk by Abhijnan Chakraborty on prioritisation in fact-checking, how users can be biased depending on the story and how to correct the misperception of truth with new analytical frameworks.

Science studies track

Detecting innovation through combination

George Richardson’s talk was a great example of how continuous experimentation and research leads to tools that work in practice. Using his novel methods of identifying emerging topics, he showcased how machine learning became mainstream.

Cooperation track

Learning network structure in quadratic games

Very interesting talk by Yan Leng on using network-based interactions and setting up games to infer the agents’ behaviour/relationship with other agents.

Vote trading

We only have anecdotal evidence on vote trading and very poor ways of measuring it. Daniele presented a framework for measuring it and tested it on UN General Assembly (UNGA) data, using topic modelling on UN resolutions to create features for a model predicting voting behaviour.
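The talk used topic modelling; as a dependency-free stand-in for that step (a simplification I’m introducing, not Daniele’s method), here is a dictionary-based version of the same idea: represent each resolution by how strongly it matches a few seeded topics, giving a feature vector for a downstream voting model. The topic seeds and example text are invented.

```python
def topic_features(text, topics):
    """Share of each topic's seed words present in the text.

    A crude proxy for topic-model document-topic proportions:
    each resolution becomes a fixed-length numeric feature vector.
    """
    words = set(text.lower().split())
    return [len(words & seeds) / len(seeds) for seeds in topics.values()]

# Invented seed lists standing in for learned topics.
topics = {
    "security": {"conflict", "peacekeeping", "ceasefire", "sanctions"},
    "development": {"poverty", "education", "health", "aid"},
}
```

With a trained topic model, the feature vector would instead be the inferred topic proportions, but the downstream use (features for a vote-prediction model) is the same.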

Policy track

Mass media discourse on internet regulation in Russia

The researchers studied the evolution of the internet regulation debate in Russian mass media and its thematic topics. What I liked: clustering semantic networks for dynamic topic modelling.

Illegal markets and anonymity

Fascinating talk by Isak Ladegaard on what happened with Silk Road’s sellers after the closure of the platform by the FBI in 2013. He highlighted that legal pressure doesn’t impair the development of illegal markets in the digital age. They are highly adaptable and resemble pre-modern markets. He also described how sellers adopted new identity verification methods that enabled them to migrate to other markets without losing their ranking and fame.

Text analysis track

Document composition, bias and word embeddings

Awesome work on detecting and measuring bias in documents. The researchers used recent interpretability methods to trace which documents in a corpus most affect the bias in the resulting word embeddings.

Also learned about this NYT project.
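One common way to quantify embedding bias is a WEAT-style association score: compare a word vector’s mean cosine similarity to two attribute word sets. This is a generic sketch of that building block with toy two-dimensional vectors, not the interpretability method from the talk, which attributes the bias back to individual documents.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity of two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B.

    Positive values mean `w` leans towards A; this is the per-word
    building block of the WEAT test statistic.
    """
    return (np.mean([cosine(w, a) for a in attrs_a])
            - np.mean([cosine(w, b) for b in attrs_b]))
```

In real use, `w` would be e.g. an occupation word and the attribute sets gendered word lists, all taken from trained embeddings.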

Comparing semantic change detection approaches with word embeddings

Dong Nguyen and her colleagues proposed a new evaluation framework and systematically compared the different choices involved when using word embeddings for semantic change detection. She also showcased her results on a large chunk of Twitter data.
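Their framework compares many design choices; one common core step (this sketch is my illustration of the general technique, not necessarily their exact setup) is to align two independently trained embedding spaces with orthogonal Procrustes and score each word by the cosine distance between its aligned vectors:

```python
import numpy as np

def procrustes_align(source, target):
    """Rotate the `source` embedding matrix onto `target`.

    Rows are word vectors for a shared vocabulary; the optimal
    rotation is U @ Vt from the SVD of source.T @ target.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

def change_score(vec_old, vec_new):
    """Cosine distance between a word's aligned vectors; higher = more change."""
    cos = vec_old @ vec_new / (np.linalg.norm(vec_old) * np.linalg.norm(vec_new))
    return 1.0 - cos
```

Words whose aligned old and new vectors point in different directions get high change scores, flagging candidate semantic shifts.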

Collective Attention track

The acceleration of collective attention

Very interesting talk by Philipp Lorenz-Spreen on measuring the trajectories of collective attention received by Twitter hashtags, books or movies. They revealed an acceleration in content creation across several decades, and that this information overload rapidly consumes our collective attention.

A New Dataset for evaluating the quality of online news

Rebekah Tromble presented a new, large dataset to develop automatic classifiers for signals of news quality and misinformation in US media reporting. That’s part of the (mis)informed Citizen project from the Alan Turing Institute.

An excellent example of how social science researchers can effectively collaborate with engineering teams.

Estimating the effect of media attention on terror organisations

The researchers studied whether media attention to terrorist attacks fuels the growth of the organisations behind them.

They collected attack-level media attention data and estimated its effect with an instrumental variable analysis, using the co-occurrence of natural disasters with the attacks as the instrument.
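Their instrumental-variable logic can be sketched with a toy two-stage least squares: a confounder biases the naive regression, but instrumenting the endogenous regressor recovers the true effect. The simulated data below is mine, not theirs.

```python
import numpy as np

def two_stage_least_squares(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z.

    Stage 1: regress x on z (with intercept) to get the fitted x_hat.
    Stage 2: regress y on x_hat; the slope is the IV estimate.
    """
    Z = np.column_stack([np.ones_like(z), z])
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    X = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Simulated example: u confounds x and y; z shifts x but not y directly.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
u = rng.normal(size=5000)
x = z + u
y = 1.5 * x + 2.0 * u + rng.normal(size=5000)
```

In this simulation, naive OLS of y on x is biased towards roughly 2.5 because of the confounder u, while the IV estimate stays near the true slope of 1.5.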

They found that terrorist organisations manipulate media attention for their own benefit (mostly to get funding) and that only smaller organisations are affected when their attacks are deprived of publicity.

Organisations

Optimal team construction for a complex task

The researchers ran an experiment to examine how various factors, such as skill diversity, cognitive style, social perceptiveness and skill level, affect a team’s collective performance. They found that social perceptiveness is positively related to performance; however, skill level is more important, accounting for almost four times as much of the explained variance. They also found that skill diversity is negatively associated with team performance.

Impact of Architectural Space Typologies and Human Interaction Patterns on the Performance of Startup Teams operating in Entrepreneurship Incubator Centres

Fascinating work by researchers from MIT and Harvard showing their analysis of how architectural features and behavioural working patterns impact performance outcomes and success. They ran an experiment during the MIT DeltaV programme at the MIT Martin Trust Center, and their ongoing work will inform the future architecture and organisation design of research-based entrepreneurship co-working centres.

Wisdom of Crowd

Social Influence Undermines the Wisdom of Crowd in Sequential Decision-Making

The researchers ran an online experiment using Amazon Mechanical Turk to examine how social influence affects collective decision-making. Interestingly, they found that organisations seeking to exploit crowd wisdom in team decision-making should reward independent expression and avoid rewarding individuals who have made the right call.

Social Network Centralisation and Collective Intelligence: a randomised experiment

Researchers showed that social network centralisation helps collective intelligence by testing how information flows in various network structures (from star networks to hub-and-spoke).

Closing thoughts

IC2S2 provided a great platform to meet interesting people, be inspired by their work, share your contributions to the field and receive valuable feedback.

Lastly, the organisers deserve whole bowls of cookies. They did a spectacular job handling a crowd of 480 and entertaining it with loads of keynotes spread throughout the day, 6 parallel sessions with fascinating talks, a nice reception, a boat ride and dinner at a wonderful place.