Miguel Ramalho was the winner of the Arquivo.pt 2020 Award, for his work Desarquivo - a platform that analyses news preserved by Arquivo.pt to establish relations between entities, people and places. The graduate of the Faculty of Engineering of the University of Porto (FEUP) explains the usefulness of this platform, describing the context that led to its creation.
The Desarquivo project was awarded 1st Prize in the Arquivo.pt 2020 Award. Can you explain to us what this solution consists of?
The name Desarquivo ("unarchive") is partially self-explanatory, or at least indicative of what it does. It is a project that analyses the text of news stories archived over the last 20 years, identifies references to people, organisations, places and other entities, creates a network of relations between these entities based on their appearing together in a body of news and, finally, lets you search this network, let's call it a graph, in a visual and intuitive way. It starts from archived news and ends with a graph, an "unarchive", of the connections between these people, organisations, places and more.
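As a rough illustration of the pipeline described in this answer, the sketch below builds a small co-occurrence graph and queries it. It is not Desarquivo's actual code. It assumes that entity recognition has already been run, that two entities are linked whenever they appear in the same article, and that the link weight is the number of shared articles. The sample data and the function names `build_cooccurrence_graph` and `strongest_connections` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: each archived article already reduced to the set of
# entities (people, organisations, places) it mentions.
articles = [
    {"Person A", "Company X", "Lisbon"},
    {"Person A", "Person B", "Lisbon"},
    {"Person B", "Company X"},
    {"Person A", "Lisbon"},
]

def build_cooccurrence_graph(entity_sets):
    """Map each entity to a Counter of neighbours weighted by shared articles."""
    graph = defaultdict(Counter)
    for entities in entity_sets:
        # Every unordered pair of entities in one article adds one to the edge weight.
        for a, b in combinations(sorted(entities), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def strongest_connections(graph, entity, top_n=3):
    """Return up to top_n neighbours of an entity, strongest link first."""
    return graph.get(entity, Counter()).most_common(top_n)

graph = build_cooccurrence_graph(articles)
print(strongest_connections(graph, "Person A", top_n=1))
```

In this toy data, "Person A" and "Lisbon" share three articles, so that pair forms the strongest edge. A real system would also need to merge different spellings of the same entity, which this sketch ignores.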
What are some of the applications of this tool that you see as especially relevant?
The application I initially envisioned is investigative journalism. That's precisely why the focus was on creating a new way to find hidden things. In this case, those things are connections. Connections that range from finding out that two politicians were at the same event in 2004, to understanding which companies interact most with a given university. Beyond investigative journalism, Desarquivo can be used for personal exploration, learning and fact-checking. These applications allow people to learn or to answer questions, and they differ in nature from what we are used to in conventional search engines. Of course, Desarquivo uses sources of information that most search engines do not use, or to which they do not have access, because they are pages that are no longer present on the web.
Disinformation is seen as one of the defining trends of the 21st century, and the way it negatively impacts our society is often highlighted. Do you believe Desarquivo can help respond to this phenomenon? In what ways?
The problem with misinformation lies in the general laziness that we humans have: it is so much easier to scroll and continue reading a feed than to investigate each news item, fact, statistic or comment we come across. In this sense, I'm not so naive as to think that Desarquivo will change consumption patterns and eliminate 21st century idleness. However, I see it having an impact by complementing journalism that is actually based on facts and reliable sources of information and, in that sense, being another tool that helps the truth become known, even if it then has to battle for attention with lies. I have also been maintaining another project aimed at understanding this problem in social networks in Portugal, which focuses, for now, only on Twitter: Election Watch.
In what context did the basic idea for building Desarquivo arise? And what led you to enter the Arquivo.pt 2020 Award?
The context was precisely a journalistic investigation project by the ICIJ (International Consortium of Investigative Journalists) - the Luanda Leaks - whose documents were later revealed to have come from the Portuguese Rui Pinto. Following this episode, I remember reading an article that explained some of the challenges and approaches used to analyse such a large amount of data and files. Then, because I had already heard about the Arquivo.pt award, I combined business with pleasure and tried to take on a similar amount of data, in this case oriented to a broader context: Portuguese news content and how it reflects our country. I saw this project very much as a challenge to do something both relevant and innovative. Whether it worked or not, I think time will tell.
Looking to the near future, what do you think the impact of this distinction will be?
I think I have already experienced a considerable part of that impact, which has been the opportunity to communicate with journalists and share ideas, to look at the project from a more technical perspective, or even to be able to explain it to audiences and discuss it with people whose backgrounds differ from mine. I think there may soon also be some academic interest in improving individual parts of Desarquivo, either by me or by university students or researchers.