Daniel Gomes
Service Manager

In 2021, almost all information is produced, disseminated and consumed in digital format. This is the information that governs every second of our societies. However, when we look at the various existing digital preservation initiatives, they focus on digitizing collections that were not originally digital, for example, digitizing printed books. Most digital information published online is lost.

Recent events have reinforced this trend. About a year and a half after the start of the Covid-19 pandemic, we can see how it generated an immense amount of information (and misinformation) that was produced, replicated, adulterated and disseminated, and which, whether we like it or not, influenced social, political and economic measures.

It should be noted that many of the large torrents of information that currently influence political decision-making in democracies are informal, conveyed through online channels such as social networks. This information appears and exerts influence as quickly as it disappears and becomes inaccessible (except to the multinational companies that own the platforms, which are well aware of the value of preserving it). Without the memory of the initial information (and misinformation) that proliferated online about SARS-CoV-2, what lessons can governments and citizens learn? What history can be written about other contemporary events without the memory of the online?

There is official digital information that is carefully preserved. Examples include publications in the Electronic Official Gazette or the Authentic Digital Objects preserved in the RODA of the Directorate-General for Books, Archives and Libraries. However, these official communications document the effects of events and are unlikely to be sufficient, on their own, to analyze the causes of a phenomenon and to learn lessons that help us react better to similar situations in the future.

Online archives vs. archives of online

What is meant by an "online archive"? Examples like the ones above are excellent "online archives" that can continue to evolve through the conventional adaptation of legislation and technology. After all, the importance of preserving the Decree-Laws of a Republic is undeniable. My concern is mainly with the "archives of online", since there is not yet an established awareness of the need for them, whether at an academic, governmental or individual level.

That "Information is power" is an accepted truth. Modern organizations communicate strategically, sharing information through their online channels such as websites or social media. But how many organizations are aware of the value of preserving their information online? How many are aware of the risk of losing that information? How many professors in various scientific fields alert their students to the importance of preserving information online or the impacts of losing it? If information is power, then losing information is losing power. 

It's technologically impossible to preserve all online information. But it's absurd not to be aware that we need to preserve some online information for short-, medium-, and long-term access (and to act accordingly). After the arrival of the Information Age, which solved the problem of access to information, archives must contribute to combating the current Age of Disinformation. The role of archives of online communication is crucial in this fight because analyzing information from multiple sources over time helps identify inconsistencies or assign credibility. The greater the volume of information, the more possibilities there are for assessing the veracity of a piece of information.

The advantage of archives of online is that the information, being born digital and quickly available, can be processed automatically and in multiple ways. But a new type of institution needs to be created to archive the online, because it is a task with very specific challenges that requires specialists and adequate resources.

The cost of not preserving digital information born online will be enormous for future generations because it will be impossible for them to learn from past mistakes. In this sense, the main challenge of archives of online is being able to make the world feel that they are needed today.

Archives of online: difficult but not impossible

Technically, most of the content we consume online is served via the HTTP (or HTTPS) protocol, meaning it is content from the Web. However, about 80% of the content available on the Web changes or disappears after just one year.

The Internet Archive is a US non-profit organization that archives web content worldwide. However, it is difficult for a single organization to create an exhaustive archive of all published content because the Web is constantly changing and much information disappears before it can be archived. 

Furthermore, documenting historical events of national significance to a given country is not a priority for the Internet Archive, and much of the information published, for example, on the Portuguese web is irretrievably lost. This problem is also felt by other national communities, so there are already at least 93 web archiving initiatives around the world.

In Portugal, Arquivo.pt is an example of an archive of online: it lets you search and access web pages archived since 1996. This is a public service managed by the Foundation for Science and Technology and accessible to any citizen. Arquivo.pt stands out for offering a search service for pages and images from the past. It's a kind of Google, but for the past of the web.

The system that supports Arquivo.pt periodically collects and stores information published on the web. It then processes this information to make it searchable and accessible. This preservation process is performed automatically through a large-scale distributed computer system. The search and access service can be used automatically through Application Programming Interfaces (APIs) to develop innovative applications that take advantage of archived information. 
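As a rough illustration of what using the search service through an API can look like, here is a minimal Python sketch. It is not taken from Arquivo.pt's own material: the endpoint `https://arquivo.pt/textsearch`, the parameters `q`, `from`, `to` and `maxItems`, and the response fields `response_items`, `tstamp`, `title` and `linkToArchive` are assumptions based on my recollection of the public API documentation and should be checked against the current version. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: endpoint of the Arquivo.pt full-text search API.
TEXTSEARCH_ENDPOINT = "https://arquivo.pt/textsearch"


def build_search_url(query, start=None, end=None, max_items=10):
    """Build a search URL (hypothetical helper).

    ASSUMPTION: `from` and `to` accept 14-digit timestamps
    (YYYYMMDDHHMMSS) and `maxItems` limits the number of results.
    """
    params = {"q": query, "maxItems": max_items}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return f"{TEXTSEARCH_ENDPOINT}?{urlencode(params)}"


def summarize_items(payload):
    """Return (timestamp, title, archived link) for each result.

    ASSUMPTION: results are listed under the `response_items` key.
    Missing keys yield None instead of raising an error.
    """
    return [
        (item.get("tstamp"), item.get("title"), item.get("linkToArchive"))
        for item in payload.get("response_items", [])
    ]


def fetch_results(url, timeout=30):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `summarize_items(fetch_results(build_search_url("covid-19", start="20200301000000", end="20201231235959")))` would then list pages archived during 2020 that mention the term, provided the assumptions above hold.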

Arquivo.pt provides a free preservation service to web authors and, at the same time, a valuable research resource that has already been used by researchers, for example, to automatically measure the accessibility of the Portuguese web for people with disabilities. The Arquivo.pt Award annually recognizes works that use information preserved by Arquivo.pt. The ten award-winning works to date are real examples showing that the social and scientific potential of archives of online is huge and has only just begun to be explored.

Arquivo.pt holds more than 10 billion archived files (700 TB). However, the biggest challenge isn't disk space to store this information. The challenge is keeping this information searchable and accessible in a timely manner, which means, these days, providing users with answers within seconds and suitable for any device. The second challenge is recruiting and training specialized human resources. How to archive online data isn't yet taught in universities, so ongoing training for new team members is required. 

The third challenge, and the most unexpected for me, is the difficulty in spreading the word about the service's existence. I hope I've been able to argue so far that archives of online are essential. Arquivo.pt has been publicly available since 2010. How long have you known about it? We live in an attention economy. Human attention has become a scarce commodity, for which the world's most powerful companies compete fiercely with each other using nearly unlimited resources and ethically questionable strategies. In the online world, which is Arquivo.pt's home, this will be the major short-term challenge: managing to capture attention so that this public service can be useful to more people.
