The lack of gender diversity in the Artificial Intelligence (AI) workforce is raising growing concerns, but the evidence about this problem has until now rested on statistics about the workforce of large technology companies or on submissions to a small number of prestigious conferences. We build on this literature with a large-scale analysis of gender diversity in AI research using publications from arXiv, a widely used preprint repository, where we identify AI papers through an expanded keyword analysis and predict author gender using a name-to-gender inference service. We study the evolution of gender diversity across disciplines, countries and institutions, finding that while the share of female co-authors in AI papers is increasing, it has stagnated in disciplines related to computer science. We also find that geography plays an important role in determining the share of female authors in AI papers and that there is a severe gender gap in the top research institutions. We further study the link between female authorship of papers and the citations they receive, finding a strong, positive correlation in research domains related to the impact of information technology on society. We then examine the semantic differences between AI papers with and without female co-authors. Our results suggest that there are significant differences in machine learning and computer ethics between the United States and the United Kingdom, as well as differences in the research focus of papers with female co-authors. We conclude by reporting the results of interviews with female AI researchers and other important stakeholders, aimed at interpreting our findings and identifying policies to improve diversity and inclusion in the AI research workforce.
The paper is available on SSRN.