Key Features

  • Efficient offline conversion of European regional data.

  • Conversion between five NUTS versions: 2006, 2010, 2013, 2016, 2021.

  • Conversion between three regional levels: NUTS-1, NUTS-2, NUTS-3.

  • Ability to convert multiple NUTS versions at once when e.g. NUTS versions differ across countries and years. This scenario is common when working with data sourced from EUROSTAT.

  • (Dasymetric) Spatial interpolation based on five weights (regional area size, 2011 and 2018 population size, 2012 and 2018 built-up area) built from granular [100m x 100m] geodata by the European Commission’s Joint Research Center (JRC).

NUTS Codes

The Nomenclature of Territorial Units for Statistics (NUTS) is a geocode standard for referencing the administrative divisions of European countries. A NUTS code starts with a two-letter combination indicating the country.1 The administrative subdivisions, or levels, are referred to with an additional number or a capital letter (NUTS-1). A second (NUTS-2) or third (NUTS-3) subdivision level is referred to with another digit each.

For example, the German district Northern Saxony (Nordsachsen) is located within the region Leipzig and the federal state Saxony.

  • NUTS-1: States
    • DED: Saxony
  • NUTS-2: States/Government Regions
    • DED5: Leipzig
  • NUTS-3: Districts
    • DED53: Northern Saxony
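
The nesting of the codes can be made concrete with a few lines of base R. The sketch below is purely illustrative and not part of the package: the helper name nuts_parents() is hypothetical, and it assumes a well-formed code of three to five characters.

```r
# Hypothetical helper (not part of the nuts package): split a NUTS code
# into its country prefix, its level, and the codes of its parent regions.
nuts_parents <- function(code) {
  stopifnot(nchar(code) >= 3, nchar(code) <= 5)
  list(
    country = substr(code, 1, 2),   # two-letter country prefix
    level   = nchar(code) - 2,      # one extra character per level
    nuts_1  = substr(code, 1, 3),
    nuts_2  = if (nchar(code) >= 4) substr(code, 1, 4) else NA_character_
  )
}

nordsachsen <- nuts_parents("DED53")
str(nordsachsen)
```

For DED53, this yields the country prefix DE, level 3, and the parent codes DED and DED5 from the list above.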

Since administrative boundaries in Europe change for demographic, economic, political or other reasons, there are five different versions of the NUTS Nomenclature (2006, 2010, 2013, 2016, and 2021). The current version, effective from 1 January 2021, lists 104 regions at NUTS-1, 283 regions at NUTS-2, and 1 345 regions at NUTS-3 level.2

Spatial interpolation in a nutshell

When administrative units are restructured, regional data measured within old boundaries can be converted to the new boundaries under reasonable assumptions. The main task of this package is to use (dasymetric) spatial interpolation to accomplish this.

Let’s take the example of the German state Saxony in the figures below. Here, the NUTS-2 regions Leipzig (DED3 → DED5) and Chemnitz (DED1 → DED4) were reorganized. We are interested in the number of manure storage facilities in 2003 provided by EUROSTAT based on the 2006 NUTS version. A part of Leipzig was reassigned to Chemnitz (center plot), prompting us to recalculate the number of storage facilities in the 2010 version (right plot).

A simple approach is to redistribute manure storage facilities proportional to the transferred area, assuming equal distribution of manure storages across space. In a dasymetric approach, we could make use of built-up area, assuming that manure deposits are more likely to be found close to residential areas and economic sites. In our example, Leipzig lost about 7.7% (\(\frac{5574}{72360}\)) of its built-up area. We re-calculate the number of manure storage facilities by computing 7.7% of Leipzig’s manure storages \(\frac{5574}{72360} * 700 = 54\), subtracting them from Leipzig and adding them to Chemnitz.
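
The arithmetic of this example can be reproduced in base R. This is only a sketch of the calculation described above, using the figures from the text; the variable names are made up for illustration.

```r
# Figures taken from the example in the text.
bu_transferred   <- 5574    # built-up area reassigned from Leipzig to Chemnitz
bu_leipzig       <- 72360   # Leipzig's total built-up area before the change
storages_leipzig <- 700     # manure storage facilities reported for Leipzig

share_moved    <- bu_transferred / bu_leipzig            # about 0.077
storages_moved <- round(share_moved * storages_leipzig)  # facilities reassigned

# Leipzig keeps the remainder; the moved facilities are added to Chemnitz.
storages_leipzig_new <- storages_leipzig - storages_moved
```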

See the Section Spatial interpolation in detail for an in-depth description of the weighting procedure.

Holdings with Manure Storage Facilities; BU = Built-Up Area in \(m^{2}\)

Usage

The package comes with three main functions:

  • classify_nuts() detects the NUTS version(s) and level(s) of a data set. Its output can be directly fed into the two other functions.

  • convert_nuts_version() converts your data to a desired NUTS version (2006, 2010, 2013, 2016, 2021). This transformation works in any direction.

  • convert_nuts_level() aggregates data to some upper-level NUTS code, i.e., it transforms NUTS-3 data to the NUTS-2 or NUTS-1 level (but not vice versa).

Workflow

The conversion can only be conducted after classifying the NUTS version(s) and level(s) of your data using the function classify_nuts(). This step ensures the validity and completeness of your NUTS codes before proceeding with the conversion.

Sequential workflow to convert regional NUTS data

Identifying NUTS version and level

The classify_nuts() function’s main purpose is to find the most suitable NUTS version and to identify the level of the data set. Below, you see an example using patent application data (per one million inhabitants) for Norway in 2012 at the NUTS-2 level. This data is provided by EUROSTAT.

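# Load packages
library(nuts)
library(dplyr)    # provides %>%, filter(), select(), mutate()
library(stringr)  # provides str_detect()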

# Loading and subsetting Eurostat data
data(patents, package = "nuts")

pat.n2 <- patents %>% filter(nchar(geo) == 4) # NUTS-2 values

pat.n2.mhab.12.no <- pat.n2 %>%
  filter(unit == "P_MHAB") %>% # Patents per one million inhabitants
  filter(time == 2012) %>% # 2012
  filter(str_detect(geo, "^NO")) %>%  # Norway
  select(-unit)

# Classifying the Data
pat.classified <- classify_nuts(data = pat.n2.mhab.12.no,
                                nuts_code = "geo")
## 
## Classifying version of NUTS codes
## ----------------------------------
## => Within groups defined by country.
## ==> No missing NUTS codes.

The function returns a list with three items:

  1. The first item gives the original data set augmented with the columns from_version, from_level, and country, indicating the NUTS version that best suits the data. All functions of the package group NUTS codes by country; the country names are generated automatically from the provided NUTS codes.

Below, you see that all data entries correspond to the 2016 NUTS version.

pat.classified[[1]]
## # A tibble: 7 × 6
##   from_code from_version from_level country  time values
##   <chr>     <chr>             <dbl> <chr>   <dbl>  <dbl>
## 1 NO01      2016                  2 Norway   2012  125. 
## 2 NO02      2016                  2 Norway   2012   13.2
## 3 NO03      2016                  2 Norway   2012   57.4
## 4 NO04      2016                  2 Norway   2012  110. 
## 5 NO05      2016                  2 Norway   2012   48.9
## 6 NO06      2016                  2 Norway   2012  145. 
## 7 NO07      2016                  2 Norway   2012   16.5
  2. The second item provides an overview of the share of matching NUTS codes for each of the five existing NUTS versions. The overlap is computed within country and possibly additional groups (if provided via the group_vars argument).
pat.classified[[2]]
## # A tibble: 5 × 3
## # Groups:   country [1]
##   from_version country overlap_perc
##   <chr>        <chr>          <dbl>
## 1 2016         Norway         100  
## 2 2013         Norway         100  
## 3 2010         Norway         100  
## 4 2006         Norway         100  
## 5 2021         Norway          42.9
  3. The third item gives all NUTS codes that are missing across groups. Such missing codes might lead to conversion errors and are, by default, omitted from all conversion procedures. In our example, no NUTS codes are missing.
pat.classified[[3]]
## # A tibble: 0 × 4
## # Groups:   from_version, country [0]
## # ℹ 4 variables: from_code <chr>, from_version <chr>,
## #   from_level <dbl>, country <chr>
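
The overlap_perc value of 42.9 for version 2021 can be understood as the share of codes in the data that also exist in that version. The base R sketch below illustrates this reading; it is an assumption about how the share is formed, not the package's internal code, and the 2021 code list is limited to the Norwegian NUTS-2 codes that appear in this example.

```r
# NUTS-2 codes found in the Norwegian example data.
codes_in_data <- c("NO01", "NO02", "NO03", "NO04", "NO05", "NO06", "NO07")

# Norwegian NUTS-2 codes of version 2021 that occur in this example.
codes_2021 <- c("NO02", "NO06", "NO07", "NO08", "NO09", "NO0A")

# Share of the data's codes that exist in the 2021 version, in percent.
overlap_2021 <- 100 * mean(codes_in_data %in% codes_2021)
round(overlap_2021, 1)
```

Three of the seven codes (NO02, NO06, NO07) survive into the 2021 version, which gives 3/7 of the codes.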

Converting data between NUTS versions

Once the NUTS version and level are identified, you can convert the data to any other NUTS version. Here is an example of transforming the Norwegian data, classified above as NUTS version 2016, to the 2021 NUTS version. Between 2016 and 2021, the number of NUTS-2 regions in Norway decreased by one as the borders of six regions were transformed. The maps below show the affected regions. We provide the classified NUTS data, specify the target NUTS version for data transformation, and supply the variable containing the values to be interpolated. It is important to indicate the variable type in the named input vector since the interpolation approaches differ for absolute and relative values.

# Converting Data to 2021 NUTS version
pat.converted <- convert_nuts_version(
  data = pat.classified,
  to_version = "2021",
  variables = c("values" = "relative")
)
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2016 to version 2021.
## => All NUTS codes can be converted.
## => Within groups defined by country.
## ==> No missing NUTS codes.

The output below displays the corresponding data frames based on the original and converted NUTS codes. The original data set comprises seven observations, whereas the converted data set contains six. The regions NO01, NO03, NO04, and NO05 are lost, while NO08, NO09, and NO0A are now listed.

pat.n2.mhab.12.no
## # A tibble: 7 × 3
##   geo    time values
##   <chr> <dbl>  <dbl>
## 1 NO01   2012  125. 
## 2 NO02   2012   13.2
## 3 NO03   2012   57.4
## 4 NO04   2012  110. 
## 5 NO05   2012   48.9
## 6 NO06   2012  145. 
## 7 NO07   2012   16.5
pat.converted
## # A tibble: 6 × 4
##   to_code to_version country values
##   <chr>   <chr>      <chr>    <dbl>
## 1 NO02    2021       Norway    13.2
## 2 NO06    2021       Norway   143. 
## 3 NO07    2021       Norway    16.5
## 4 NO08    2021       Norway    71.0
## 5 NO09    2021       Norway    83.0
## 6 NO0A    2021       Norway    58.9

Converting multiple variables simultaneously

You can also convert multiple variables at once. Below, we add a second variable, values_per_thous, which rescales values by a factor of 1000:

# Converting Multiple Variables
pat.n2.mhab.12.no %>%
  mutate(values_per_thous = values * 1000) %>%
  classify_nuts(
    data = .,
    nuts_code = "geo"
    ) %>%
  convert_nuts_version(
    data = .,
    to_version = "2021",
    variables = c("values" = "relative",
                  "values_per_thous" = "relative")
  )
## 
## Classifying version of NUTS codes
## ----------------------------------
## => Within groups defined by country.
## ==> No missing NUTS codes.
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2016 to version 2021.
## => All NUTS codes can be converted.
## => Within groups defined by country.
## ==> No missing NUTS codes.
## # A tibble: 6 × 5
##   to_code to_version country values values_per_thous
##   <chr>   <chr>      <chr>    <dbl>            <dbl>
## 1 NO02    2021       Norway    13.2           13239 
## 2 NO06    2021       Norway   143.           143106.
## 3 NO07    2021       Norway    16.5           16463 
## 4 NO08    2021       Norway    71.0           71037.
## 5 NO09    2021       Norway    83.0           82964.
## 6 NO0A    2021       Norway    58.9           58904.

Converting grouped data

Longitudinal regional data, as commonly supplied by EUROSTAT, often comes with varying NUTS versions across countries and years (and other dimensions). It is possible to harmonize data across such groups by passing the group_vars argument to classify_nuts() before calling convert_nuts_version(). Below, we transform data within country and year groups for Sweden, Slovenia, and Croatia to the 2021 NUTS version.

# Classifying grouped data (time)
pat.n2.mhab.sesihr <- pat.n2 %>%
    filter(unit == "P_MHAB") %>%
    filter(str_detect(geo, "^SE|^SI|^HR"))

pat.classified <- classify_nuts(nuts_code = "geo", data = pat.n2.mhab.sesihr,
    group_vars = "time")
## 
## Classifying version of NUTS codes
## ----------------------------------
## => Within groups defined by country x time.
## ==> No missing NUTS codes.

Note that the detected best-fitting NUTS versions differ across countries:

pat.classified[[1]] %>%
    group_by(country, from_version) %>%
    tally()
## # A tibble: 3 × 3
## # Groups:   country [3]
##   country  from_version     n
##   <chr>    <chr>        <int>
## 1 Croatia  2016            24
## 2 Slovenia 2010            26
## 3 Sweden   2021           104

The grouping is stored and passed on to the conversion function:

# Converting grouped data (Time)
pat.converted <- convert_nuts_version(
  data = pat.classified,
  to_version = "2021",
  variables = c("values" = "relative")
)
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2010, 2016, 2021 to version 2021.
## => All NUTS codes can be converted.
## => Within groups defined by country x time.
## ==> No missing NUTS codes.

Conveniently, the group_vars argument can also be used to transform higher-dimensional data. Below, we include two indicators for patent applications to convert data that varies at the indicator-year-country-NUTS code level.

# Classifying and converting multi-group data
pat.n2.mhabmact.12.sesihr <- pat.n2 %>%
  filter(unit %in% c("P_MHAB", "P_MACT")) %>%
  filter(str_detect(geo, "^SE|^SI|^HR"))

pat.converted <- pat.n2.mhabmact.12.sesihr %>%
  classify_nuts(
    data = .,
    nuts_code = "geo",
    group_vars = c("time", "unit")
  ) %>%
  convert_nuts_version(
    data = .,
    to_version = "2021",
    variables = c("values" = "relative")
  )
## 
## Classifying version of NUTS codes
## ----------------------------------
## => Within groups defined by country x time x unit.
## ==> No missing NUTS codes.
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2010, 2016, 2021 to version 2021.
## => All NUTS codes can be converted.
## => Within groups defined by country x time x unit.
## ==> No missing NUTS codes.

Converting data between NUTS levels

The convert_nuts_level() function facilitates the aggregation of data from lower NUTS levels to higher ones using spatial weights. This enables users to summarize variables upward from the NUTS-3 level to NUTS-2 or NUTS-1 levels. It is important to note that this function does not support disaggregation since this comes with strong assumptions about the spatial distribution of a variable’s values.

In the following example, we illustrate how to aggregate the total number of patent applications in Sweden from NUTS-3 to higher levels. The functions below return a warning concerning non-identifiable NUTS codes. See Non-identified NUTS codes for further information.

data("patents", package = "nuts")
# Aggregating data from NUTS-3 to NUTS-2 and NUTS-1
pat.n3 <- patents %>% filter(nchar(geo) == 5)

pat.n3.nr.12.se <- pat.n3 %>%
  filter(unit %in% c("NR")) %>%
  filter(time == 2012) %>%
  filter(str_detect(geo, "^SE"))

pat.classified <- classify_nuts(data = pat.n3.nr.12.se,
                                nuts_code = "geo")
## 
## Classifying version of NUTS codes
## ----------------------------------
## => These NUTS codes cannot be identified or classified: SEXXX, SEZZZ.
## => Within groups defined by country.
## ==> No missing NUTS codes.
pat.level2 <- convert_nuts_level(
  data = pat.classified,
  to_level = 2,
  variables = c("values" = "absolute")
)
## 
## Converting level of NUTS codes
## ------------------------------
## => Aggregate from NUTS regional level 3 to 2.
## => These NUTS codes cannot be converted and are dropped from the dataset: SEXXX, SEZZZ.
## => Within groups defined by country.
## ==> No missing NUTS codes.
pat.level1 <- convert_nuts_level(
  data = pat.classified,
  to_level = 1,
  variables = c("values" = "absolute")
)
## 
## Converting level of NUTS codes
## ------------------------------
## => Aggregate from NUTS regional level 3 to 1.
## => These NUTS codes cannot be converted and are dropped from the dataset: SEXXX, SEZZZ.
## => Within groups defined by country.
## ==> No missing NUTS codes.

Aggregating patents from NUTS 3 to NUTS 2 and NUTS 1

Inconsistent versions and levels

Non-identified NUTS codes

If the input data contains NUTS codes that cannot be identified in any NUTS version, the output of classify_nuts() lists all of these codes. All conversion procedures (convert_nuts_version() and convert_nuts_level()) will work as expected while ignoring values for these regions.

The example below classifies 2012 patent data from Denmark. The original EUROSTAT data contains the codes DKZZZ and DKXXX, which are not part of the conversion matrices. Codes ending with the letter Z refer to “Extra-Regio” territories. These codes collect statistics for territories that cannot be attached to a certain region.3 Codes ending with the letter X refer to observations with unknown regions.

pat.n3.nr.12.dk <- pat.n3 %>%
  filter(unit %in% c("NR")) %>%
  filter(time == 2012) %>%
  filter(str_detect(geo, "^DK"))

pat.classified <- classify_nuts(data = pat.n3.nr.12.dk, nuts_code = "geo")
## 
## Classifying version of NUTS codes
## ----------------------------------
## => These NUTS codes cannot be identified or classified: DKXXX, DKZZZ.
## => Within groups defined by country.
## ==> No missing NUTS codes.

Missing NUTS codes

classify_nuts() also checks whether the NUTS codes provided are complete. Missing values in the input data will, by default, result in missing values for all affected transformed regions in the output data.

The example below illustrates this case.

pat.n3.nr.12.si <- pat.n3 %>%
  filter(unit %in% c("NR")) %>%
  filter(time == 2012) %>%
  filter(str_detect(geo, "^SI"))

pat.classified <- classify_nuts(data = pat.n3.nr.12.si, nuts_code = "geo")
## 
## Classifying version of NUTS codes
## ----------------------------------
## => These NUTS codes cannot be identified or classified: SIXXX, SIZZZ.
## => Within groups defined by country.
## ==> Missing NUTS codes detected. See the tibble 'Missing NUTS codes...' in the output.

classify_nuts() returns a warning that NUTS codes are missing in the input data. These codes can be inspected by calling pat.classified[3].

pat.classified[3]
## $`Missing NUTS codes within from_version x country groups`
## # A tibble: 2 × 4
## # Groups:   from_version, country [1]
##   from_code from_version from_level country 
##   <chr>     <chr>             <dbl> <chr>   
## 1 SI011     2010                  3 Slovenia
## 2 SI016     2010                  3 Slovenia

The resulting conversion returns three missing values because the source region SI011 was recoded to SI031 and the region SI016 was split into SI036 and SI037.

convert_nuts_version(data = pat.classified, 
                     to_version = "2021", 
                     variables = c("values" = "absolute")) %>% 
  filter(is.na(values))
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2010 to version 2021.
## => These NUTS codes cannot be converted and are dropped from the dataset: SIXXX, SIZZZ.
## => Within groups defined by country.
## ==> Missing NUTS codes in data. No values are calculated for regions associated with missing NUTS codes. Ensure that the input data is complete.
## # A tibble: 3 × 4
##   to_code to_version country  values
##   <chr>   <chr>      <chr>     <dbl>
## 1 SI031   2021       Slovenia     NA
## 2 SI036   2021       Slovenia     NA
## 3 SI037   2021       Slovenia     NA

Users have three options to overcome this problem.

  1. The warning can be ignored and the conversion proceeds while returning NAs (see above).

  2. Users check whether the missing values can be replaced by, e.g., using alternative sources or imputing missing values.

  3. The argument missing_rm can be set to TRUE. In this case, missing values will be removed from the input data. Effectively, the interpolation procedures assume that missing values can be replaced with 0, which may be a very strong assumption depending on the context.

convert_nuts_version(
  data = pat.classified, 
  to_version = "2021", 
  variables = c("values" = "absolute"),
  missing_rm = TRUE
  ) %>% 
  filter(to_code %in% c("SI031", "SI036", "SI037"))
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2010 to version 2021.
## => These NUTS codes cannot be converted and are dropped from the dataset: SIXXX, SIZZZ.
## => Within groups defined by country.
## ==> Missing NUTS codes in data. No values are calculated for regions associated with missing NUTS codes. Ensure that the input data is complete.
## # A tibble: 3 × 4
##   to_code to_version country  values
##   <chr>   <chr>      <chr>     <dbl>
## 1 SI031   2021       Slovenia  0    
## 2 SI036   2021       Slovenia  0.544
## 3 SI037   2021       Slovenia  4.02
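
The difference between the default behaviour and missing_rm = TRUE mirrors how missing values propagate through a sum in R. The sketch below uses hypothetical contributions, not values from the patent data, and only illustrates the principle rather than the package's internal code.

```r
# Hypothetical weighted contributions of two source regions to one target
# region; the first source region is missing from the input data.
contributions <- c(NA, 2.5)

default_total <- sum(contributions)                # NA propagates to the target
removed_total <- sum(contributions, na.rm = TRUE)  # missing value acts like 0
```

With the missing contribution removed, the target region is computed from the remaining source regions only, which is why the result can understate the true value.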

Multiple NUTS levels within groups

The package does not allow for the conversion of multiple NUTS levels at once. The classification function will throw an error in this case. The conversion needs to be conducted for every level separately.

patents %>% 
  filter(nchar(geo) %in% c(4, 5), grepl("^EL", geo)) %>% 
  distinct(geo, .keep_all = TRUE) %>% 
  classify_nuts(nuts_code = "geo", data = .)
## Error in classify_nuts(nuts_code = "geo", data = .): 
## Data contains NUTS codes from multiple levels (2 and 3). => Please classify different levels separately.
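
One way to prepare such data is to split it by code length before classification. The base R sketch below uses a small hypothetical data frame (the values are invented); each resulting element holds a single level and could then be classified on its own.

```r
# Hypothetical toy data mixing NUTS-2 (4 characters) and NUTS-3 (5 characters).
toy <- data.frame(
  geo    = c("EL30", "EL41", "EL301", "EL302"),
  values = c(10, 20, 3, 7)
)

# One data frame per level, keyed by the number of characters in the code.
by_level <- split(toy, nchar(toy$geo))

# Each element, e.g. by_level[["4"]], now contains codes of one level only
# and can be passed to classify_nuts(data = ..., nuts_code = "geo") separately.
names(by_level)
```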

Multiple NUTS versions within groups

Converting multiple NUTS versions within groups might lead to erroneous spatial interpolations since overlaps between regions of different versions are possible.

The example below illustrates this problem. We classify German and Italian manure storage facility data from EUROSTAT without specifying group_vars. Instead, we keep all unique NUTS codes to artificially create a data set containing different NUTS versions. classify_nuts() returns a warning and by inspecting the identified versions, we see that there are mixed versions within groups (the countries).

man.deit <- manure %>% 
  filter(grepl("^DE|^IT", geo)) %>%
  filter(nchar(geo) == 4) %>% 
  distinct(geo, .keep_all = TRUE) %>% 
  classify_nuts(nuts_code = "geo", data = .)
## 
## Classifying version of NUTS codes
## ----------------------------------
## => These NUTS codes cannot be identified or classified: DEZZ.
## => Within groups defined by country.
## ==> Classified multiple NUTS code versions. See the tibble 'Overlap of each NUTS version...' in the output.
## ==> Missing NUTS codes detected. See the tibble 'Missing NUTS codes...' in the output.
man.deit[[1]] %>% group_by(country, from_version) %>% tally()
## # A tibble: 5 × 3
## # Groups:   country [3]
##   country from_version     n
##   <chr>   <chr>        <int>
## 1 Germany 2006            38
## 2 Germany 2021             3
## 3 Italy   2006             9
## 4 Italy   2021            21
## 5 NA      NA               1

When proceeding to the conversion with either convert_nuts_version() or convert_nuts_level(), both functions will throw an error. For convenience, we added the option multiple_versions that subsets the supplied data to the dominant version within groups when specified with most_frequent. Hence, all codes from other, non-dominant versions are discarded.

Once we convert this data set, all NUTS regions that are not recognized according to the 2006 (Germany) and 2021 (Italy) versions are dropped automatically.

man.deit.converted <- convert_nuts_version(
  data = man.deit,
  to_version = 2021,
  variables = c("values" = "relative"),
  multiple_versions = "most_frequent"
)
## 
## Converting versions of NUTS codes
## ---------------------------------
## => Converting NUTS codes in version(s) 2006, 2021 to version 2021.
## => These NUTS codes cannot be converted and are dropped from the dataset: DEZZ.
## => Within groups defined by country.
## ==> Multiple NUTS code versions. Choosing most frequent version within group and dropping 12 row(s).
## ==> Missing NUTS codes in data. No values are calculated for regions associated with missing NUTS codes. Ensure that the input data is complete.
man.deit.converted %>% group_by(country, to_version) %>% tally()
## # A tibble: 2 × 3
## # Groups:   country [2]
##   country to_version     n
##   <chr>        <dbl> <int>
## 1 Germany       2021    38
## 2 Italy         2021    21
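
The number of dropped rows can be traced back to the tally shown earlier. The base R sketch below is a simplified illustration of choosing the most frequent version per country, not the package's implementation; the unclassified row (DEZZ) is left out because it is dropped separately.

```r
# Row counts per country and detected version, copied from the tally above.
counts <- data.frame(
  country      = c("Germany", "Germany", "Italy", "Italy"),
  from_version = c("2006", "2021", "2006", "2021"),
  n            = c(38, 3, 9, 21)
)

# The largest count within each country marks the dominant version.
counts$dominant <- counts$n == ave(counts$n, counts$country, FUN = max)

kept    <- counts[counts$dominant, ]
dropped <- sum(counts$n[!counts$dominant])  # rows from non-dominant versions
```

The 3 German rows classified as 2021 and the 9 Italian rows classified as 2006 add up to the 12 rows reported in the conversion message.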

Spatial interpolation in detail

This section describes the spatial interpolation procedure. We first cover the logic of conversion tables and then explain the methods used in the package for converting versions and levels.

Changes in administrative boundaries

Below, Norwegian NUTS-2 regions for the versions 2016 and 2021 are shown. All regions apart from Norway’s northernmost region have been reorganized in this period.

Norwegian NUTS-2 regions with boundary changes

The changes between the two versions can be summarized as follows:

  1. Boundary changes of regions with continued NUTS codes
    • NO02 cedes a small area to the new NO08
    • NO06 gains a small area from NO05
  2. Changes to regions with discontinued NUTS codes
    • NO01 is absorbed by NO08
    • NO03 is split up between NO08 and NO09
    • NO04 divides into NO0A and NO09
    • NO05 largely becomes the new NO0A, and gives a small area to NO06

Spatial interpolation and conversion tables

To keep track of these changes, the nuts package uses regional flows between different NUTS versions. The package ships with a conversion table that can be called with data(cross_walks) based on data provided by the JRC. It documents all boundary changes of NUTS regions.

For Norway going from version 2016 to 2021, the table looks as follows:

from_code  to_code  from_version  to_version  level  country    areaKm      pop18      pop11  artif_surf18  artif_surf12
NO01       NO08     2016          2021        2      Norway     5365.0  1268387.7  1131221.0         58104         55927
NO02       NO02     2016          2021        2      Norway    52072.3   370392.3   362070.7         60625         54887
NO02       NO08     2016          2021        2      Norway      517.6    15843.5    15019.0          1952          1813
NO03       NO08     2016          2021        2      Norway    19123.5   575350.1   535560.5         66876         62509
NO03       NO09     2016          2021        2      Norway    17414.4   403640.4   385648.8         50076         47799
NO04       NO09     2016          2021        2      Norway    16360.8   292218.9   272016.3         44779         42346
NO04       NO0A     2016          2021        2      Norway     9326.0   451949.5   416975.4         39112         36432
NO05       NO06     2016          2021        2      Norway      931.8     3510.2     3625.9           869           832
NO05       NO0A     2016          2021        2      Norway    47902.2   837246.8   790090.1         99951         94757
NO06       NO06     2016          2021        2      Norway    41029.0   447774.0   417827.7         47291         43630
NO07       NO07     2016          2021        2      Norway   112453.1   452720.0   437265.2         81907         79098

In addition to tracing the evolution of NUTS codes, the table contains flows of area, population and artificial surfaces between regions and versions. These flows were computed by the JRC with granular [100m x 100m] geographic data. The ggalluvial plot below visualizes the flows of area size between the NUTS-2 regions mapped above.

Alluvial plot illustrating area size flows

To illustrate the main idea, the map below showcases population densities across NUTS-2 regions. As population is not uniformly distributed across space, weighting regions by their area might come with strong assumptions. For instance, region NO01 in version 2016, which contains the city of Oslo, makes a relatively modest geographical contribution to the new region NO08 but accounts for a large share of its population. Assuming that the variable to be converted is correlated with population across space, the conversion can thus be refined using population weights to account for flows between different versions.
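
The contrast between area and population can be checked directly with the flows into NO08 from the conversion table above. The base R sketch below only re-uses those published figures; the object and column names for the shares are chosen for illustration.

```r
# Flows into the 2021 region NO08, copied from the conversion table above.
into_no08 <- data.frame(
  from_code = c("NO01", "NO02", "NO03"),
  areaKm    = c(5365.0, 517.6, 19123.5),
  pop18     = c(1268387.7, 15843.5, 575350.1)
)

# Contribution of each source region to NO08, by area and by population.
into_no08$area_share <- into_no08$areaKm / sum(into_no08$areaKm)
into_no08$pop_share  <- into_no08$pop18 / sum(into_no08$pop18)

round(into_no08[, c("area_share", "pop_share")], 2)
```

NO01 contributes roughly a fifth of NO08's area but about two thirds of its 2018 population.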

Spatial distribution of population and boundary changes

Conversion methods

The following subsections describe the method used to convert absolute and relative values between versions and levels.

Conversion of absolute values between versions

In this example, we transform absolute values, the number of patent applications (NR) in Norway, from version 2016 to 2021, utilizing spatial interpolation based on the population distribution in 2018.

The conversion employs the cross_walks table, which includes population flow data (expressed in thousands) between two NUTS-2 regions from the source version to the target version. The function joins our variable of interest, NR, which varies across the departing NUTS-2 codes (from_code). The function first calculates a weight (w) equal to the population flow’s share of the total population in the departing region in version 2016 (from_code):

from_code  to_code  from_version  to_version   NR  pop18  w
NO01       NO08     2016          2021        146   1268  1268/(1268) = 1
NO02       NO02     2016          2021          5    370  370/(370 + 15) = 0.96
NO02       NO08     2016          2021          5     15  15/(370 + 15) = 0.04
NO03       NO08     2016          2021         54    575  575/(575 + 403) = 0.59
NO03       NO09     2016          2021         54    403  403/(575 + 403) = 0.41
NO04       NO09     2016          2021         80    292  292/(292 + 451) = 0.39
NO04       NO0A     2016          2021         80    451  451/(292 + 451) = 0.61
NO05       NO06     2016          2021         41      3  3/(3 + 837) = 0
NO05       NO0A     2016          2021         41    837  837/(3 + 837) = 1
NO06       NO06     2016          2021         62    447  447/(447) = 1
NO07       NO07     2016          2021          7    452  452/(452) = 1

To obtain the number of patent applications at the desired 2021 version, the function summarizes the data for the new NUTS regions in version 2021 (to_code) by taking the population-weighted sum of all flows.

to_code  to_version  NR
NO02     2021        5*0.96 = 4.8
NO06     2021        41*0 + 62*1 = 62
NO07     2021        7*1 = 7
NO08     2021        146*1 + 5*0.04 + 54*0.59 = 178.06
NO09     2021        54*0.41 + 80*0.39 = 53.34
NO0A     2021        80*0.61 + 41*1 = 89.8
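
The two steps can be retraced in base R with the rounded figures from the tables above. This is a minimal sketch of the described logic, not the package's internal code; because the weights are not rounded to two decimals here, the results differ slightly from the hand calculation.

```r
# Flows, patent counts (NR), and 2018 population (in thousands) from the table.
flows <- data.frame(
  from_code = c("NO01", "NO02", "NO02", "NO03", "NO03", "NO04",
                "NO04", "NO05", "NO05", "NO06", "NO07"),
  to_code   = c("NO08", "NO02", "NO08", "NO08", "NO09", "NO09",
                "NO0A", "NO06", "NO0A", "NO06", "NO07"),
  NR        = c(146, 5, 5, 54, 54, 80, 80, 41, 41, 62, 7),
  pop18     = c(1268, 370, 15, 575, 403, 292, 451, 3, 837, 447, 452)
)

# Weight: share of the departing region's population carried by each flow.
flows$w <- flows$pop18 / ave(flows$pop18, flows$from_code, FUN = sum)

# Population-weighted sum of all flows arriving in each 2021 region.
nr_2021 <- tapply(flows$NR * flows$w, flows$to_code, sum)
round(nr_2021, 1)
```

Since the weights of each departing region sum to one, the national total of patent applications is unchanged by the conversion.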

Conversion of relative values between versions

To convert relative values, convert_nuts_version() again starts from the conversion table shown above. We focus on the variable P_MHAB, patent applications per one million inhabitants. The function summarizes these relative values by computing the weighted average with respect to 2018 population flows.

to_code  to_version  P_MHAB
NO02     2021        (370*13)/(370) = 13
NO06     2021        (3*48 + 447*145)/(3 + 447) = 144
NO07     2021        (452*16)/(452) = 16
NO08     2021        (1268*125 + 15*13 + 575*57)/(1268 + 15 + 575) = 103
NO09     2021        (403*57 + 292*110)/(403 + 292) = 79
NO0A     2021        (451*110 + 837*48)/(451 + 837) = 70
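
The same weighted averages can be obtained in base R with weighted.mean(). The sketch below re-uses the rounded population flows and P_MHAB values from the calculation above and is meant as an illustration of the method only.

```r
# Flows with the relative variable P_MHAB of each departing region and the
# 2018 population (in thousands) carried by each flow.
flows <- data.frame(
  from_code = c("NO01", "NO02", "NO02", "NO03", "NO03", "NO04",
                "NO04", "NO05", "NO05", "NO06", "NO07"),
  to_code   = c("NO08", "NO02", "NO08", "NO08", "NO09", "NO09",
                "NO0A", "NO06", "NO0A", "NO06", "NO07"),
  P_MHAB    = c(125, 13, 13, 57, 57, 110, 110, 48, 48, 145, 16),
  pop18     = c(1268, 370, 15, 575, 403, 292, 451, 3, 837, 447, 452)
)

# Population-weighted mean of all flows arriving in each 2021 region.
p_mhab_2021 <- sapply(
  split(flows, flows$to_code),
  function(g) weighted.mean(g$P_MHAB, g$pop18)
)
round(p_mhab_2021)
```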

Conversion of absolute values between NUTS levels

The function convert_nuts_level() aggregates from lower to higher-order levels, e.g. from NUTS-3 to NUTS-2. Since higher-order regions are perfectly split into lower-order regions in the NUTS system, the function simply takes the sum of the values in the case of absolute variables.
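
Because the parent code is a prefix of the child code, this aggregation can be illustrated in base R. The values below are hypothetical and the sketch is not the package's implementation.

```r
# Hypothetical NUTS-3 counts (illustrative values, not taken from the data).
n3 <- data.frame(
  geo    = c("SE110", "SE121", "SE122", "SE211"),
  values = c(100, 20, 30, 40)
)

# The parent NUTS-2 code is the first four characters of a NUTS-3 code.
n3$nuts_2 <- substr(n3$geo, 1, 4)

# Absolute values are summed within each NUTS-2 region.
n2 <- aggregate(values ~ nuts_2, data = n3, FUN = sum)
n2
```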

Conversion of relative values between NUTS levels

Relative values are aggregated in convert_nuts_level() by computing the weighted mean over all lower-order regions. To convert, for example, the number of patent applications per one million inhabitants from NUTS-3 to NUTS-2, the function adds the 2018 population size of each NUTS-3 region as a weight.

nuts_3 nuts_2 pop18 P_MHAB
NO011 NO01 662 145
NO012 NO01 606 102
NO021 NO02 196 7
NO022 NO02 188 18
NO031 NO03 289 34
NO032 NO03 279 45
NO033 NO03 239 106
NO034 NO03 169 45
NO041 NO04 113 43
NO042 NO04 178 50
NO043 NO04 451 150
NO051 NO05 495 24
NO052 NO05 102 58
NO053 NO05 241 91
NO061 NO06 307 208
NO062 NO06 139 3
NO071 NO07 225 10
NO072 NO07 154 33

The number of patent applications per one million inhabitants at the NUTS-2 level is computed as the weighted average using NUTS-3 population numbers.

nuts_2  P_MHAB
NO01    (662*145 + 606*102)/(662 + 606) = 124
NO02    (196*7 + 188*18)/(196 + 188) = 12
NO03    (289*34 + 279*45 + 239*106 + 169*45)/(289 + 279 + 239 + 169) = 56
NO04    (113*43 + 178*50 + 451*150)/(113 + 178 + 451) = 109
NO05    (495*24 + 102*58 + 241*91)/(495 + 102 + 241) = 47
NO06    (307*208 + 139*3)/(307 + 139) = 144
NO07    (225*10 + 154*33)/(225 + 154) = 19
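
This aggregation can also be retraced in base R from the NUTS-3 table above. The sketch is illustrative only; since the table inputs are rounded, the computed means agree with the listed NUTS-2 values up to their integer part.

```r
# NUTS-3 population (in thousands) and P_MHAB, copied from the table above.
n3 <- data.frame(
  nuts_3 = c("NO011", "NO012", "NO021", "NO022", "NO031", "NO032",
             "NO033", "NO034", "NO041", "NO042", "NO043", "NO051",
             "NO052", "NO053", "NO061", "NO062", "NO071", "NO072"),
  pop18  = c(662, 606, 196, 188, 289, 279, 239, 169, 113,
             178, 451, 495, 102, 241, 307, 139, 225, 154),
  P_MHAB = c(145, 102, 7, 18, 34, 45, 106, 45, 43,
             50, 150, 24, 58, 91, 208, 3, 10, 33)
)

# The parent NUTS-2 code is the first four characters of the NUTS-3 code.
n3$nuts_2 <- substr(n3$nuts_3, 1, 4)

# Population-weighted mean of P_MHAB within each NUTS-2 region.
p_mhab_n2 <- sapply(
  split(n3, n3$nuts_2),
  function(g) weighted.mean(g$P_MHAB, g$pop18)
)
round(p_mhab_n2, 1)
```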

Citation

Please support the development of open science by citing the JRC and us in your work.

Bibtex Users:

@Manual{,
  title = {NUTS converter},
  author = {Joint Research Centre},
  year = {2022},
  url = {https://urban.jrc.ec.europa.eu/nutsconverter},
}

@Manual{,
  title = {nuts: Convert European Regional Data},
  author = {Moritz Hennicke and Werner Krause},
  year = {2024},
  note = {R package version 0.0.0.9000},
  url = {https://AAoritz.github.io/nuts/},
}