NTNU has been involved in developing a prototype for removing sensitive personal information from the gene databank. Photo: Scanpix

Norwegian COVID-19 raw data now available to share

For the first time, raw data on Norwegian coronavirus genes will be freely available through the open gene bank ENA.

ELIXIR Norway and the Norwegian Institute of Public Health (NIPH) have collaborated on a technical solution that uploads Norwegian viral sequences, along with information on where the samples came from, to the European Nucleotide Archive (ENA) gene bank. The solution ensures that sensitive information about specific patients who had the virus is not included.

“In the ongoing pandemic, sharing all the available virus information with health professionals and researchers worldwide as quickly and openly as possible has proved crucial. We’re very pleased that Norwegian viral sequences have now become openly available,” says Nils Peder Willassen, a professor at UiT The Arctic University of Norway. He is also the deputy head of ELIXIR Norway, the Norwegian research infrastructure for bioinformatics and biological data.

Sharing raw data enables scientists to make systematic comparisons of all available SARS-CoV-2 sequences internationally. Willassen believes it is important for Norwegian sequences to be included in this compilation.

NTNU contributes expertise

NTNU’s role in the ELIXIR collaboration is to contribute the university’s technological expertise.

“In this case, we’ve been involved in developing a prototype to remove sensitive personal information from the genetic data. In collaboration with HUNT Cloud at NTNU, we created a filter that removes any traces of personal identity,” says Professor Pål Sætrom at NTNU’s Department of Clinical and Molecular Medicine.

NIPH satisfied

The Norwegian Institute of Public Health is pleased that even more of the Norwegian data will be available.

“We’re very happy to contribute Norwegian virus sequences to the international compilation effort,” says Anna Karin Germundson Hauge. She is the director of NIPH’s Department of Bacteriology.

Hauge points out that so-called consensus sequences from the Norwegian SARS-CoV-2 outbreak have previously been shared with the international community through GISAID, a tool for rapid exchange of outbreak data.

“Through our collaboration with ELIXIR Norway, we can now ensure that all the background material can be shared,” she says.

Open gene bank

ENA (European Nucleotide Archive) is a quality-assured international database that collects and maintains all types of nucleotide sequences. The solution uses two services: NeLS, the Norwegian e-infrastructure for life science data, developed by ELIXIR Norway, and TSD (Services for Sensitive Data), developed at USIT (UiO’s Center for Information Technology).

The solution preserves all the information necessary for open reuse of data, while at the same time removing all traces of individual patients who had the virus, so that their privacy is always protected.