The manager of Arquivo.pt, Daniel Gomes, reveals some of the steps taken to fulfill the mission of “preserving information published on the Web for scientific and academic purposes”.
Arquivo.pt makes it possible to search and access web pages archived since 1996. This infrastructure is managed by the FCCN Unit of the Foundation for Science and Technology (FCT) and focuses on preserving information published on the Web for scientific and academic purposes.
This connection to the world of research is also illustrated by the inclusion of Arquivo.pt in the Registry of Research Data Repositories, where it is listed as a source of open data used by international researchers. For this reason, Arquivo.pt has been developing several activities to identify online data related to Research & Development (R&D) projects so that it can be preserved systematically.
One of the key ways to achieve this goal is the digital preservation of R&D project websites. These websites are increasingly used to provide important scientific information that complements the published literature (e.g., datasets, documentation or software).
However, online information about R&D projects has not been exhaustively documented. For example, the website addresses of projects funded under the 7th Framework Programme (FP7), as made available through the European Union Open Data Portal (EU Open Data Portal), are missing for 92% of the projects. To address this gap, Arquivo.pt has automatically identified and preserved more than 52 million files (7 TB) from 53,993 websites of EU-funded R&D projects dating back to FP4 (1994).
Portuguese research projects are also a priority: in total, 600,721 files (72 GB) were preserved, collected from 7,956 websites related to projects funded by FCT.
Other forms of preservation
Since 2020, online information related to FCT-funded projects has been recorded in the projects' progress and final reports so that it can be systematically preserved. Arquivo.pt has also been carrying out special collections to preserve national scientific information available online that is cited in open-access scientific publications (RCAAP) and in scientific curricula (CIÊNCIAVITAE).
In addition, the Arquivo.pt Memorial service has preserved websites of events, projects and scientific portals that are no longer updated, such as Degois.pt. The websites of R&D units are also collected periodically for preservation. The main aim of these activities is to keep scientific references to online resources valid in peer-reviewed publications and academic CVs.
Training for preservation is another strategic path within this mission. To this end, Arquivo.pt offers a training program that prepares trainees to publish open data online so that it can be preserved; to preserve online data from their research sources and self-preserve derived scientific results published online; to search, access and reuse historical web data; and to automatically process large volumes of preserved historical web data through Application Programming Interfaces (APIs).
Preservation and innovation
Arquivo.pt has also contributed to the production of open-access datasets and software. All the software supporting the Arquivo.pt service and its research experiments is available through a GitHub account. In addition, Arquivo.pt provides valuable open data for research, such as records of historical collections, temporal search by text and image (unique in the world), and data preserved since 1996 through proactive web collection and the integration of historical collections.
Finally, the Arquivo.pt Award deserves mention: since 2017, it has recognized work that uses open data preserved by Arquivo.pt. Over its three years, the award has supported around a dozen innovative projects with diverse scopes and objectives; applications, platforms, browser extensions, academic papers and scientific research are just a few examples of how the data preserved by Arquivo.pt has been used. Under the award's regulations, these works are made available in open access.