This year’s IC2S2 was held in mostly sunny Amsterdam. As it was my first time at a Computational Social Science conference, I decided to write down my thoughts and a tweet-length gist for most of the talks I attended.
The 5th edition of the conference covered a wide set of topics: text and network analysis, complex systems, game theory, experiments, machine learning, qualitative studies, mixed-method approaches, collective intelligence, polarisation, misinformation, protests, migration, inequality, cooperation and organisation, science of science, reproducibility and transparency. Throughout the days, IC2S2 offered something interesting to every community. I enjoyed that most of the talks were accessible, although I would have preferred them to be a bit longer, so that presenters had time to dive into some of the details and address more questions.
I also noticed a few recurring topics that drew strong opinions during the conference:
- The split between Social Science and Computer Science and how to bridge the gap between communities.
- An increasing scarcity of data suitable for research.
- The future of Computational Social Science: The importance of theory-driven approaches, a necessary turn to solution-oriented research and the skills that are required to excel in the field.
Overall, it was a great experience and I am looking forward to IC2S2 2020 in Boston!
Without further ado, here’s a summary of IC2S2:
Kenneth Benoit on NLP and social scientific analysis.
Kenneth highlighted that “we use language for communication, not to create data” and that this has some obvious problems for social science research which we can fix with “better models and less denial”.
He also made a highly contested argument that there’s a split between Computer Scientists and Social Scientists, with the former group caring more about prediction and “optimising a loss function”, while the latter group focuses on explanation and measurement.
“Sometimes what improves prediction kills measurement.”
Kenneth also mentioned how insidious bias can be harmful and that we should treat unsupervised methods with caution. Lastly, he stressed the importance of reproducibility and transparency in computational social science.
Deen Freelon discussing the data landscape
Deen talked about the post-API age, where proprietary databases are data oases: paywalled and in inconvenient formats, but still available for research. He showcased two of his Python packages for extracting and preprocessing data from Nexis Uni and Factiva. Personal take: There is a plethora of under-utilised open data sources and more are becoming available regularly. However, their share in the data landscape is decreasing over time.
He also touched on research principles: a researcher might be compliant with the Terms & Conditions of a platform, but it’s the ethical use of data that we should focus on.
Monica Lee & Devra Moehler from Facebook
They gave a fantastic talk on Facebook’s approaches to tackling misinformation and preventing election interference. They acknowledged that in 2016, Facebook failed to prevent numerous fake accounts from posting messages on its pages and groups, which amplified misinformative news. These posts were receiving approximately 16M views per week in 2016.
They also described how malicious actors organised information operations by infiltrating Facebook communities, such as LGBTQ+ groups, establishing relations with the organisers and other members and then spreading misinformative opinions.
Facebook has deployed a variety of prevention tactics described as “Inform, Reduce, Remove”. I’ve binned them into three rough categories:
- Better models and software: They explained how they used network analysis and vectorisation methods to detect clusters of malicious actors that promote and amplify misinformative messages.
- New policies: For political advertisements, users must provide their identity and location. This worked well in the US in 2018, but it’s important to adapt it to different countries. For example, citizens and advertisers in other countries, such as India, couldn’t understand why this additional verification step was needed because the concepts of election interference and misinformation were either very new or non-existent.
- Raising awareness and informing users about the elections: Enrich users’ knowledge of their local electorate. Fascinating work combining UX research, surveys, interviews and A/B testing to understand what works in different countries.
Keynote by Claudia Wagner
Fascinating talk on perception bias, networks, representation of minorities, using a survey-inspired error framework for studying open data, theory-guided data collection processes and justifying and documenting design choices.
Panel discussion on the future of CSS
We moved from a data-poor to a data-rich world. However, the gap between company-owned data and data available for research is getting wider.
Generally, we may be heading towards solution-oriented social science research. Personal take: That’s great. We should bridge the gap between theory, academic research and real-world applications.
Doing Computational Social Science well is tough. It requires a variety of skills (Rense Corten summarised some of them on Twitter) and the solution isn’t to create a new discipline but to develop interfaces that enable efficient collaboration within interdisciplinary teams.
Lastly, Helen Margetts highlighted the importance of data mapping exercises to help policymakers understand the information they possess. We’ve written a summary of how we do this at Nesta.
Keynote by Cesar Hidalgo
What a way to close the conference. Fascinating talk by Cesar on how humans judge machines. Cesar and his team conducted a series of experiments documenting the differences in how people judge human and AI actions in a variety of scenarios.
There’s a book coming up.
Vaccine misinformation on Twitter
Jieyu Ding gathered tweets using a keyword-based approach and used four human annotators to label them as informative or misinformative. She then examined the misinformation and non-misinformation semantic networks to spot differences in terms used and find the prevalent topics for each of them.
I liked her workflow chart.
Social media, fake news and perception of truth
Interesting talk by Abhijnan Chakraborty on prioritisation in fact-checking, how users can be biased depending on the story and how to correct the misperception of truth with new analytical frameworks.
Science studies track
Detecting innovation through combination
George Richardson’s talk was a great example of how continuous experimentation and research lead to tools that work in practice. Using his novel methods of identifying emerging topics, he showcased how machine learning became mainstream.
Learning network structure in quadratic games
Very interesting talk by Yan Leng on using network-based interactions and setting up games to infer the agents’ behaviour/relationship with other agents.
We only have anecdotal evidence on vote trading and very poor ways of measuring it. Daniele presented a framework for measuring it and tested it with UNGA data, using topic modelling on UN resolutions to create features for a model predicting voting behaviour.
Mass media discourse on internet regulation in Russia
The researchers studied the evolution and thematic topics of that issue. What I liked: clustering semantic networks for dynamic topic modelling.
Illegal markets and anonymity
Fascinating talk by Isak Ladegaard on what happened with Silk Road’s sellers after the closure of the platform by the FBI in 2013. He highlighted that legal pressure doesn’t impair the development of illegal markets in the digital age. They are highly adaptable and resemble pre-modern markets. He also described how sellers adopted new identity verification methods that enabled them to migrate to other markets without losing their ranking and fame.
Text analysis track
Document composition, bias and word embeddings
Awesome work on detecting and measuring bias in documents. The researchers used recent interpretability methods to explore the composition of the documents which most affect bias in word embeddings.
Also learned about this NYT project.
Comparing semantic change detection approaches with word embeddings
Dong Nguyen and her colleagues proposed a new evaluation framework and systematically compared the different choices involved when using word embeddings for semantic change detection. She also showcased her results on a large chunk of Twitter data.
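To give a feel for what one of those choices can look like, here is a minimal sketch of a neighbourhood-based change score: a word is flagged as having changed if its nearest neighbours differ between two time periods. This is not the method or the evaluation framework from the talk. The toy 2-D vectors, the word list and the function names are all made up for illustration, and real embeddings would be trained on each period's corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_neighbours(word, space, k):
    """The k words in `space` most similar to `word` (excluding itself)."""
    others = [w for w in space if w != word]
    others.sort(key=lambda w: cosine(space[word], space[w]), reverse=True)
    return set(others[:k])

def neighbourhood_change(word, space_a, space_b, k=2):
    """Jaccard distance between a word's neighbour sets in two periods."""
    a = nearest_neighbours(word, space_a, k)
    b = nearest_neighbours(word, space_b, k)
    return 1.0 - len(a & b) / len(a | b)

# Hypothetical toy embeddings: axis 0 is "birds", axis 1 is "social media".
period_a = {
    "tweet": (0.9, 0.1), "bird": (1.0, 0.0), "chirp": (0.95, 0.05),
    "post": (0.0, 1.0), "message": (0.2, 0.8), "share": (0.05, 0.95),
}
# In the later period only "tweet" has moved towards the social media axis.
period_b = dict(period_a, tweet=(0.1, 0.9))

scores = {w: neighbourhood_change(w, period_a, period_b) for w in period_a}
most_changed = max(scores, key=scores.get)
print(most_changed, scores)
```

Comparing neighbour sets avoids having to align the two embedding spaces, but the score depends heavily on `k`: with a tiny vocabulary like this one, words that did not move still get a non-zero score because one of their neighbours did.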
Collective Attention track
The acceleration of collective attention
Very interesting talk by Philipp Lorenz-Spreen measuring the trajectories of collective attention received by Twitter hashtags, books or movies. They revealed an acceleration in content creation across several decades and that this information overload rapidly consumes our collective attention.
A New Dataset for evaluating the quality of online news
Rebekah Tromble presented a new, large dataset to develop automatic classifiers for signals of news quality and misinformation in US media reporting. That’s part of the (mis)informed Citizen project from the Alan Turing Institute.
An excellent example of how social science researchers can effectively collaborate with engineering teams.
Estimating the effect of media attention on terror organisations
The researchers studied whether the media attention to terrorist attacks fuels the growth of the organisations behind them.
They collected attack-level media attention data and estimated its effect with an instrumental variable analysis using the co-occurrence of natural disasters with the attacks as an instrumental variable.
They found that terrorist organisations manipulate media attention for their own benefit (mostly to get funding) and that only smaller organisations are affected when their attacks are deprived of publicity.
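For readers unfamiliar with instrumental variables, the idea can be sketched on synthetic data. Everything below is an assumption made for illustration: the data-generating process, the effect size of 0.5 and the variable names are invented and are not the researchers' data, model or results. With a single instrument, the IV estimate is simply cov(instrument, outcome) / cov(instrument, treatment).

```python
import random

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, true_effect=0.5, seed=0):
    """Synthetic attacks: a disaster crowds out attention; a hidden
    confounder (say, attack severity) drives both attention and growth."""
    rng = random.Random(seed)
    disaster, attention, growth = [], [], []
    for _ in range(n):
        z = 1.0 if rng.random() < 0.3 else 0.0   # instrument: disaster co-occurs
        u = rng.gauss(0, 1)                      # unobserved confounder
        x = 2.0 - 1.5 * z + u + rng.gauss(0, 1)  # media attention
        y = true_effect * x + 2.0 * u + rng.gauss(0, 1)  # organisation growth
        disaster.append(z)
        attention.append(x)
        growth.append(y)
    return disaster, attention, growth

def ols_slope(x, y):
    """Naive regression slope of y on x (biased by the confounder)."""
    return covariance(x, y) / covariance(x, x)

def iv_slope(z, x, y):
    """Single-instrument IV (Wald) estimate of the effect of x on y."""
    return covariance(z, y) / covariance(z, x)

disaster, attention, growth = simulate()
naive = ols_slope(attention, growth)
iv = iv_slope(disaster, attention, growth)
print(f"naive OLS: {naive:.2f}, IV: {iv:.2f}, simulated true effect: 0.50")
```

The naive slope is inflated because the confounder pushes attention and growth in the same direction, while the IV estimate uses only the variation in attention caused by the disasters. That only works if disasters affect growth through attention alone, which is an assumption in this toy setup and something a real study has to argue for.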
Optimal team construction for a complex task
The researchers ran an experiment to examine how various factors, such as skill diversity, cognitive style, social perceptiveness and skill level, affect a team’s collective performance. They found that social perceptiveness is positively related to performance, but skill level is more important, accounting for almost four times as much of the explained variance. They also found that skill diversity is negatively associated with team performance.
Impact of Architectural Space Typologies and Human Interaction Patterns on the Performance of Startup Teams operating in Entrepreneurship Incubator Centres
Fascinating work by researchers from MIT and Harvard showing their analysis of how architectural features and behavioural working patterns impact performance outcomes and success. They ran an experiment during the MIT DeltaV programme at the MIT Martin Trust Center and their work (ongoing) will inform the future architecture and organisation design of research-based entrepreneurship co-working centres.
Wisdom of Crowd
Social Influence Undermines the Wisdom of Crowd in Sequential Decision-Making
The researchers ran an online experiment using Amazon Mechanical Turk to examine how social influence affects collective decision-making. Interestingly, they found that organisations seeking to exploit crowd wisdom in team decision-making should reward independent expression as well as avoid rewarding individuals who have made the right call.
Social Network Centralisation and Collective Intelligence: a randomised experiment
By testing various network structures (from star networks to hub-and-spoke) and how information flows in them, the researchers showed that social network centralisation helps collective intelligence.
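As a rough illustration of what "centralisation" means for a network, here is a small sketch of Freeman's degree centralisation, one standard way to score it. I am assuming this measure for illustration only; the post does not record which measure the researchers used, and the function and variable names are my own.

```python
def degree_centralisation(n, edges):
    """Freeman degree centralisation of an undirected graph with nodes 0..n-1.

    Returns 1.0 for a star (one hub connected to everyone) and 0.0 when
    all nodes have the same degree. Requires n >= 3.
    """
    degree = [0] * n
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    max_degree = max(degree)
    # Sum of gaps to the best-connected node, divided by the largest
    # possible sum, which is reached by a star on n nodes.
    return sum(max_degree - d for d in degree) / ((n - 1) * (n - 2))

n = 6
star = [(0, i) for i in range(1, n)]         # node 0 is the hub
ring = [(i, (i + 1) % n) for i in range(n)]  # everyone has two neighbours

star_score = degree_centralisation(n, star)
ring_score = degree_centralisation(n, ring)
print(star_score, ring_score)
```

A fully centralised star scores at one end of the scale and a ring at the other, so experimental networks can be placed anywhere in between.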
IC2S2 provided a great platform to meet interesting people, be inspired by their work, share your contributions to the field and receive valuable feedback.
Lastly, the organisers deserve whole bowls of cookies. They did a spectacular job, handling a crowd of 480 and entertaining it with loads of keynotes spread throughout the day, 6 parallel sessions with fascinating talks, a nice reception, boat ride and dinner at a wonderful place.