hugo-ox-mpvd/#publicaciones.org#

#+STARTUP: noindent content inlineimages
#+HUGO_BASE_DIR: ~/Documentos/curro/master-uah/web/starter-academic
#+HUGO_SECTION: publication
#+HUGO_FRONT_MATTER_FORMAT: toml
#+SEQ_TODO: TODO BORRADOR REVISAR PUBLICADO
#+COLUMNS: %TODO %42ITEM %TAGS
#+OPTIONS:   H:3 num:nil toc:nil \n:nil @:t ::t |:t ^:nil -:t f:t *:t <:t
#+OPTIONS:   TeX:t LaTeX:nil skip:nil d:nil todo:t pri:nil tags:not-in-toc
#+OPTIONS: author:nil email:nil creator:nil timestamp:nil
#+OPTIONS: *:t <:nil ^:nil timestamp:nil
#+HUGO_AUTO_SET_LASTMOD: t
#+DATE: <2021-01-09 Sat>
#+Title: Artículos
#+author: admin

* PUBLICADO Artículos
  :PROPERTIES:
  :EXPORT_TITLE: Investigación reproducible
  :EXPORT_FILE_NAME: _index
  :END:

#+begin_src toml :front_matter_extra t
subtitle = "y programación literaria"
view = 2
slug = "publicacion"
pagecat = "publicaciones"
description = "Listado de publicaciones"
url = "/publicacion/"
layout = "publication"
[header]
focal_point = "Right"

#+end_src

** Conf                                                            :noexport:

#+begin_src toml
+++
title = "Publicaciones"

# View.
#   1 = List
#   2 = Compact
#   3 = Card
view = 2

# Optional featured image (relative to `static/img/` folder).
[header]
image = ""
caption = ""
+++

#+end_src

* PUBLICADO Streetmaps
  :PROPERTIES:
  :EXPORT_TITLE: Streetmaps
  :EXPORT_DATE: <2021-01-09 Sat>
  :EXPORT_FILE_NAME: index
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER: :subtitle "Cómo hacer bonitos callejeros"
  :EXPORT_HUGO_SECTION*: streetmaps-callejeros
  :EXPORT_AUTHOR: adolfo_anton
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :math true
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :diagram true
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :publication_types ["0"]
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :publication "MPVD"
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :featured true
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :projects '("periodismodatos")
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :abstract_short "Cómo hacer bonitos callejeros con R"
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :authors '("adolfo_anton")
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :abstract "Cómo hacer bonitos callejeros con R y ggplot2"
  :END:

#+begin_src toml :front_matter_extra t

  [image]
   caption = "[Orion aveugle cherchant le soleil](https://commons.wikimedia.org/wiki/File:Orion_aveugle_cherchant_le_soleil.jpg), Nicolas Poussin  / Public domain"
   focal_point = "BottomLeft"
   placement = 3
   preview_only = false
#+end_src

El título original es mucho más feo ya que incluye al final "y PowerPoint": "Create a streetmap of your favorite city with ggplot2 and powerpoint". En este caso vamos a intentar no usar esa herramienta por no ser software libre.

El propósito del autor era crear mapas/callejeros como el de este [[https://www.mapiful.com/de/editor/#!/location/?startFrom=332036][cartel]]:

#+CAPTION: Típica imagen de cartel

Para este tutorial se utilizan dos paquetes de R: [[https://github.com/ropensci/osmdata][osmdata]] y [[https://ggplot2.tidyverse.org/][ggplot2]]. Con =osmdata= se extraen las calles de [[https://www.openstreetmap.org/][OpenStreetMap]], una base de datos libre con licencia abierta. Ni siquiera requiere de una clave para trabajar con su API. Para crear las visualizaciones y trabajar con los datos se utiliza =tidyverse=.

El autor no había trabajado mucho con =ggplot2= y se inspiró especialmente en [[https://dominicroye.github.io/en/2018/accessing-openstreetmap-data-with-r/][Dominic Royé]] y su detallado trabajo de datos OpenStreetMap con R.

Se puede empezar por la ciudad en la que estás, en la que vives, en la que naciste o una que te gusta. En su caso empezó por Freiburg, en el sur de Alemania. Para elegir la ciudad tan solo hay que ajustar la longitud y latitud de los puntos de inicio.

*** Instalar librerías

Lo primero es instalar las librerías necesarias, =tidyverse= y =osmdata=:

#+begin_src R :session murciaosm
library(tidyverse)
library(osmdata)


#+end_src

#+RESULTS:

*** OpenStreetMap

Antes de trabajar con R hay que entender cómo almacena [[https://wiki.openstreetmap.org/wiki/Map_Features][OpenStreetMap]] los datos de los mapas y, por tanto, las calles.

OpenStreetMap describe las cosas físicas como "características" o "rasgos", del inglés *features*. En este caso son =map_features= Estas características se almacenan como *pares de clave-valor*.

Por ejemplo, hay [[https://wiki.openstreetmap.org/wiki/Map_Features#Highway][highways]] que son vías principales que pueden conectar tanto la ciudad consigo misma como una ciudad con otras.

Para obtener todas las etiquetas de una característica espacial vía osmdata introducimos la función =available_tags()=

#+begin_src R :session murciaosm :results output
available_tags("highway")
#+end_src

#+RESULTS:
#+begin_example
 [1] "bridleway"              "bus_guideway"           "bus_stop"
 [4] "construction"           "corridor"               "crossing"
 [7] "cycleway"               "elevator"               "emergency_access_point"
[10] "emergency_bay"          "escape"                 "footway"
[13] "give_way"               "living_street"          "milestone"
[16] "mini_roundabout"        "motorway"               "motorway_junction"
[19] "motorway_link"          "passing_place"          "path"
[22] "pedestrian"             "platform"               "primary"
[25] "primary_link"           "proposed"               "raceway"
[28] "residential"            "rest_area"              "road"
[31] "secondary"              "secondary_link"         "service"
[34] "services"               "speed_camera"           "steps"
[37] "stop"                   "street_lamp"            "tertiary"
[40] "tertiary_link"          "toll_gantry"            "track"
[43] "traffic_mirror"         "traffic_signals"        "trailhead"
[46] "trunk"                  "trunk_link"             "turning_circle"
[49] "turning_loop"           "unclassified"
#+end_example

También hay vías de agua =waterway=, ya sean ríos, canales, acequias o tuberías. Se hace la misma consulta:

#+begin_src R :session murciaosm :results output
available_tags("waterway")
#+end_src

#+RESULTS:
:  [1] "boatyard"      "canal"         "dam"           "ditch"
:  [5] "dock"          "drain"         "fairway"       "fuel"
:  [9] "lock_gate"     "pressurised"   "river"         "riverbank"
: [13] "stream"        "tidal_channel" "turning_point" "water_point"
: [17] "waterfall"     "weir"

**** Obtener taxonomía de =map_features=
Se pueden consultar las taxonomías en su [[https://wiki.openstreetmap.org/wiki/Map_features][página web de map_features]] o bien con la función =available_features()=:

#+begin_src R :session murciaosm :results output
available_features()

#+end_src

#+RESULTS:
#+begin_example
  [1] "4wd_only"                "abandoned"
  [3] "abutters"                "access"
  [5] "addr"                    "addr:city"
  [7] "addr:conscriptionnumber" "addr:country"
  [9] "addr:district"           "addr:flats"
 [11] "addr:full"               "addr:hamlet"
 [13] "addr:housename"          "addr:housenumber"
 [15] "addr:inclusion"          "addr:interpolation"
 [17] "addr:place"              "addr:postcode"
 [19] "addr:province"           "addr:state"
 [21] "addr:street"             "addr:subdistrict"
 [23] "addr:suburb"             "admin_level"
 [25] "aeroway"                 "agricultural"
 [27] "alt_name"                "amenity"
 [29] "area"                    "atv"
 [31] "backward"                "barrier"
 [33] "basin"                   "bdouble"
 [35] "bicycle"                 "bicycle_road"
 [37] "biergarten"              "boat"
 [39] "border_type"             "boundary"
 [41] "bridge"                  "building"
 [43] "building:fireproof"      "building:flats"
 [45] "building:levels"         "building:min_level"
 [47] "building:soft_storey"    "bus_bay"
 [49] "busway"                  "charge"
 [51] "construction"            "covered"
 [53] "craft"                   "crossing"
 [55] "crossing:island"         "cuisine"
 [57] "cutting"                 "cycleway"
 [59] "denomination"            "destination"
 [61] "diet"                    "direction"
 [63] "dispensing"              "disused"
 [65] "disused:shop"            "drink"
 [67] "drive_in"                "drive_through"
 [69] "ele"                     "electric_bicycle"
 [71] "electrified"             "embankment"
 [73] "embedded_rails"          "emergency"
 [75] "end_date"                "entrance"
 [77] "est_width"               "fee"
 [79] "fire_object:type"        "fire_operator"
 [81] "fire_rank"               "foot"
 [83] "footway"                 "ford"
 [85] "forestry"                "forward"
 [87] "frequency"               "fuel"
 [89] "gauge"                   "golf_cart"
 [91] "goods"                   "hazmat"
 [93] "healthcare"              "healthcare:counselling"
 [95] "healthcare:speciality"   "height"
 [97] "hgv"                     "highway"
 [99] "historic"                "horse"
[101] "ice_road"                "incline"
[103] "industrial"              "inline_skates"
[105] "inscription"             "internet_access"
[107] "junction"                "kerb"
[109] "landuse"                 "lanes"
[111] "lanes:bus"               "lanes:psv"
[113] "layer"                   "leaf_cycle"
[115] "leaf_type"               "leisure"
[117] "lhv"                     "lit"
[119] "location"                "man_made"
[121] "maxaxleload"             "maxheight"
[123] "maxlength"               "maxspeed"
[125] "maxstay"                 "maxweight"
[127] "maxwidth"                "military"
[129] "minspeed"                "mofa"
[131] "moped"                   "motor_vehicle"
[133] "motorboat"               "motorcar"
[135] "motorcycle"              "motorroad"
[137] "mountain_pass"           "mtb_scale"
[139] "mtb:description"         "mtb:scale:imba"
[141] "name"                    "narrow"
[143] "natural"                 "noexit"
[145] "non_existent_levels"     "note"
[147] "nudism"                  "office"
[149] "official_name"           "old_name"
[151] "oneway"                  "opening_hours"
[153] "operator"                "organic"
[155] "oven"                    "overtaking"
[157] "parking:condition"       "parking:lane"
[159] "passing_places"          "place"
[161] "power"                   "priority_road"
[163] "produce"                 "proposed"
[165] "protected_area"          "psv"
[167] "public_transport"        "railway"
[169] "railway:preserved"       "railway:track_ref"
[171] "recycling_type"          "ref"
[173] "religion"                "residential"
[175] "resource"                "roadtrain"
[177] "route"                   "sac_scale"
[179] "service"                 "service_times"
[181] "shelter_type"            "shop"
[183] "sidewalk"                "site"
[185] "ski"                     "smoothness"
[187] "social_facility"         "speed_pedelec"
[189] "start_date"              "step_count"
[191] "substation"              "surface"
[193] "tactile_paving"          "tank"
[195] "tidal"                   "toilets:wheelchair"
[197] "toll"                    "tourism"
[199] "tracks"                  "tracktype"
[201] "traffic_calming"         "traffic_sign"
[203] "trail_visibility"        "tunnel"
[205] "turn"                    "type"
[207] "usage"                   "vehicle"
[209] "vending"                 "voltage"
[211] "water"                   "wheelchair"
[213] "wholesale"               "width"
[215] "winter_road"             "wood"
#+end_example

**** Obtener coordenadas

Para obtener las coordenadas de la ciudad que elijamos se puede usar la función =getbb()=. Por ejemplo, con Murcia:

#+begin_src R :session murciaosm :results output
getbb("Murcia Spain")
#+end_src

#+RESULTS:
:         min        max
: x -1.384834 -0.8508899
: y 37.715872 38.1179185

Esto nos da las [[https://en.wikipedia.org/wiki/Geographic_coordinate_system][coordenadas]] de la ciudad. el valor de la =x= da la longitud mientras que la =y= da la altitud.
**** Exportar datos de OSM
Ahora queremos exportar las carreteras del sistema de coordenadas. Para ello primero pasamos la salida de la función =getbb()= a la función [[https://rdrr.io/cran/osmdata/man/opq.html][opq()]].

A continuación pasamos esta salida a la función [[https://github.com/ropensci/osmdata][add_osm_feature()]]. La función tiene dos argumentos. Con la clave especificamos la clave de la característica; con el valor especificamos la etiqueta de la característica.

En este caso primero extraemos las principales calles de la ciudad y las pasamos a la función [[https://www.rdocumentation.org/packages/osmdata/versions/0.1.1/topics/osmdata_sf][osmdata_sf]] para insertarlo luego en =ggplot2=.

#+begin_src R :session murciaosm :results output
  calles_principales <- getbb("Murcia Spain")%>%
    opq()%>%
    add_osm_feature(key = "highway",
			value = c("motorway", "primary",
				  "secondary", "tertiary")) %>%
    osmdata_sf()
  calles_principales
#+end_src

#+RESULTS:
: Object of class 'osmdata' with:
:                  $bbox : 37.7158719,-1.3848335,38.1179185,-0.8508899
:         $overpass_call : The call submitted to the overpass API
:                  $meta : metadata including timestamp and version numbers
:            $osm_points : 'sf' Simple Features Collection with 39961 points
:             $osm_lines : 'sf' Simple Features Collection with 4748 linestrings
:          $osm_polygons : 'sf' Simple Features Collection with 258 polygons
:        $osm_multilines : NULL
:     $osm_multipolygons : NULL

Los datos los hemos almacenado en las variables =calles_principales=, =peatonales= y =agua= Estos objeto tienen diferentes objetos hijxs. Los que nos interesan para este mapa son =osm_lines= ya que son líneas.
**** Completamos objetos

Ahora añadimos también las calles más pequeñas, peatonales y cursos de agua:

#+begin_src R :session murciaosm :results output
  calles_principales <- getbb("Murcia Spain")%>%
    opq()%>%
    add_osm_feature(key = "highway",
			  value = c("motorway", "primary", "secondary", "tertiary")) %>%
    osmdata_sf()

  agua <- getbb("Murcia Spain")%>%
    opq()%>%
    add_osm_feature(key = "waterway",
		      value = c("river", "canal", "drain", "stream", "dock", "lock_gate", "boatyard", "riverbank", "stream")) %>%
    osmdata_sf()

  bici <- getbb("Murcia Spain")%>%
    opq()%>%
    add_osm_feature(key = "highway",
			  value = "cycleway") %>%
    osmdata_sf()
  calles_secundarias <- getbb("Murcia Spain")%>%
    opq()%>%
    add_osm_feature(key = "highway",
		      value = c("residential", "living_street", "unclassified", "service")) %>%
    osmdata_sf()

#+end_src

#+RESULTS:

Daba error y he quitado las peatonales:
#+begin_src

  peatonales <- getbb("Murcia Spain")%>%
    opq()%>%
    add_osm_feature(key = "highway",
			  value = c("footway", "cycleway")) %>%
    osmdata_sf()

#+end_src

Y las he convertido en bici.

**** Primer mapa

Creamos nuestro primer mapa con la función =ggplot()= y nuestro conjunto de datos =agua=:

#+begin_src R :session murciaosm :results output graphics file :file murcia-agua-uno.pdf
ggplot() +
  geom_sf(data = agua$osm_lines,
          inherit.aes = FALSE,
          color = "blue",
          size = 1.4,
          alpha = .8) +
  coord_sf(xlim = c(-1.15, -1.10),
           ylim = c(37.96, 38.01),
           expand = FALSE)

#+end_src

#+RESULTS:
[[file:murcia-agua-uno.pdf]]

#+begin_src R :session streetmaps :results output graphics file :file murcia-uno.pdf
ggplot() +
  geom_sf(data = calles_secundarias$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .4,
          alpha = .8) +
  coord_sf(xlim = c(-1.15, -1.10),
           ylim = c(37.96, 38.01),
           expand = FALSE)

#+end_src

#+RESULTS:
[[file:murcia-uno.pdf]]

Primero se añade la geometría [[https://ggplot2.tidyverse.org/reference/ggsf.html][geom_sf]] a la función =ggplot=. Para los datos añadimos las calles almacenadas en la variable =streets$osm_lines=. Se puede determinar la anchura de las calles con =size=. Para que las calles no estén completamente negras se ha creado una pequeña transparencia con el nivel =alpha=.

Con la función =coord_sf= se puede determinar el eje de las X y de las Y exactamente. Es mejor jugar con los valores hasta que has definido los límites.

Con =expand = FALSE= nos aseguramos de que se muestren las coordenadas correctamente.

**** Segundo mapa

Ahora se añaden calles pequeñas y ríos:

#+begin_src R :session murciaosm :results output graphics file :file murcia-dos.pdf
  ggplot() +
    geom_sf(data = calles_principales$osm_lines,
	    inherit.aes = FALSE,
	    color = "yellow",
	    size = .4,
	    alpha = .8) +
    geom_sf(data = calles_secundarias$osm_lines,
	    inherit.aes = FALSE,
	    color = "grey",
	    size = .4,
	    alpha = .6) +
    geom_sf(data = bici$osm_lines,
	    inherit.aes = FALSE,
	    color = "green",
	    size = .5,
	    alpha = .6) +
    geom_sf(data = agua$osm_lines,
	    inherit.aes = FALSE,
	    color = "blue",
	    size = .2,
	    alpha = .5) +
    coord_sf(xlim = c(-1.15, -1.10),
	     ylim = c(37.96, 38.01),
	     expand = FALSE)

#+end_src

#+RESULTS:
[[file:murcia-dos.pdf]]
**** Colores (para borrar)

En vez de negro se pueden resaltar las calles en color:

#+begin_src R :session murcistreetmaps :results output graphics file :file boceto-tres.pdf
ggplot() +
  geom_sf(data = streets$osm_lines,
          inherit.aes = FALSE,
          color = "steelblue",
          size = .4,
          alpha = .8) +
  geom_sf(data = small_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .4,
          alpha = .6) +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .2,
          alpha = .5) +
  coord_sf(xlim = c(-1.38, -0.85),
           ylim = c(37.71, 38.11),
           expand = FALSE)

#+end_src

#+RESULTS:
[[file:boceto-tres.pdf]]

**** Sin rejilla

Con =theme_void()= borramos la rejilla de los ejes de coordenadas:

#+begin_src R :session murciaosm :results output graphics file :file murcia-sin-rejilla.pdf
  ggplot() +
    geom_sf(data = calles_principales$osm_lines,
	    inherit.aes = FALSE,
	    color = "yellow",
	    size = .4,
	    alpha = .8) +
    geom_sf(data = calles_secundarias$osm_lines,
	    inherit.aes = FALSE,
	    color = "grey",
	    size = .4,
	    alpha = .6) +
    geom_sf(data = bici$osm_lines,
	    inherit.aes = FALSE,
	    color = "green",
	    size = .5,
	    alpha = .6) +
    geom_sf(data = agua$osm_lines,
	    inherit.aes = FALSE,
	    color = "blue",
	    size = .2,
	    alpha = .5) +
    coord_sf(xlim = c(-1.15, -1.10),
	     ylim = c(37.96, 38.01),
	     expand = FALSE) +
    theme_void()

#+end_src

#+RESULTS:
[[file:murcia-sin-rejilla.pdf]]

#+begin_src R :session streetmaps :results output graphics file :file boceto-cuatro.pdf
ggplot() +
  geom_sf(data = streets$osm_lines,
          inherit.aes = FALSE,
          color = "steelblue",
          size = .4,
          alpha = .8) +
  geom_sf(data = small_streets$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .4,
          alpha = .6) +
  geom_sf(data = river$osm_lines,
          inherit.aes = FALSE,
          color = "black",
          size = .2,
          alpha = .5) +
  coord_sf(xlim = c(-1.38, -0.85),
           ylim = c(37.71, 38.11),
           expand = FALSE) +
  theme_void()
#+end_src

#+RESULTS:
[[file:boceto-cuatro.pdf]]
**** Colores invertidos

También se pueden ajustar los colores haciendo un =invert=, es decir, un fondo negro sobre:
#+begin_src R :session murciaosm :results output graphics file :file murcia-invertido.pdf
    ggplot() +
      geom_sf(data = calles_principales$osm_lines,
	      inherit.aes = FALSE,
	      color = "#ffbe7f",
	      size = .4,
	      alpha = .8) +
      geom_sf(data = calles_secundarias$osm_lines,
	      inherit.aes = FALSE,
	      color = "grey",
	      size = .4,
	      alpha = .6) +
      geom_sf(data = bici$osm_lines,
	      inherit.aes = FALSE,
	      color = "green",
	      size = .5,
	      alpha = .2) +
      geom_sf(data = agua$osm_lines,
	      inherit.aes = FALSE,
	      color = "#7fc0ff",
	      size = .4,
	      alpha = .5) +
      coord_sf(xlim = c(-1.15, -1.10),
	       ylim = c(37.96, 38.01),
	       expand = FALSE) +
      theme_void() +
    theme(
      plot.background = element_rect(fill = "black")
      )


#+end_src

#+RESULTS:
[[file:murcia-invertido.pdf]]
***** Grande
#+begin_src R :session murciaosm :results output graphics file :file murcia-medio-invertido.pdf
    ggplot() +
      geom_sf(data = calles_principales$osm_lines,
	      inherit.aes = FALSE,
	      color = "#ffbe7f",
	      size = .4,
	      alpha = .8) +
      geom_sf(data = calles_secundarias$osm_lines,
	      inherit.aes = FALSE,
	      color = "grey",
	      size = .4,
	      alpha = .6) +
      geom_sf(data = bici$osm_lines,
	      inherit.aes = FALSE,
	      color = "green",
	      size = .5,
	      alpha = .2) +
      geom_sf(data = agua$osm_lines,
	      inherit.aes = FALSE,
	      color = "#7fc0ff",
	      size = .4,
	      alpha = .5) +
      coord_sf(xlim = c(-1.21, -0.96),
	       ylim = c(37.90, 38.03),
	       expand = FALSE) +
      theme_void() +
    theme(
      plot.background = element_rect(fill = "black")
      )

#+end_src

#+RESULTS:
[[file:murcia-medio-invertido.pdf]]
***** Mayor
#+begin_src R :session murciaosm :results output graphics file :file murcia-mayor-invertido.pdf
    ggplot() +
      geom_sf(data = calles_principales$osm_lines,
	      inherit.aes = FALSE,
	      color = "#ffbe7f",
	      size = .4,
	      alpha = .8) +
      geom_sf(data = calles_secundarias$osm_lines,
	      inherit.aes = FALSE,
	      color = "grey",
	      size = .4,
	      alpha = .6) +
      geom_sf(data = bici$osm_lines,
	      inherit.aes = FALSE,
	      color = "green",
	      size = .5,
	      alpha = .2) +
      geom_sf(data = agua$osm_lines,
	      inherit.aes = FALSE,
	      color = "#7fc0ff",
	      size = .4,
	      alpha = .5) +
      coord_sf(xlim = c(-1.30, -1.10),
	       ylim = c(37.81, 38.08),
	       expand = FALSE) +
      theme_void() +
    theme(
      plot.background = element_rect(fill = "black")
      )

#+end_src

#+RESULTS:
[[file:murcia-mayor-invertido.pdf]]

***** Más grande
#+begin_src R :session murciaosm :results output graphics file :file murcia-biggest-invertido.pdf
    ggplot() +
      geom_sf(data = calles_principales$osm_lines,
	      inherit.aes = FALSE,
	      color = "#ffbe7f",
	      size = .4,
	      alpha = .8) +
      geom_sf(data = calles_secundarias$osm_lines,
	      inherit.aes = FALSE,
	      color = "grey",
	      size = .4,
	      alpha = .6) +
      geom_sf(data = bici$osm_lines,
	      inherit.aes = FALSE,
	      color = "green",
	      size = .5,
	      alpha = .2) +
      geom_sf(data = agua$osm_lines,
	      inherit.aes = FALSE,
	      color = "#7fc0ff",
	      size = .4,
	      alpha = .5) +
      coord_sf(xlim = c(-1.38, -0.85),
	       ylim = c(37.71, 38.11),
	       expand = FALSE) +
      theme_void() +
    theme(
      plot.background = element_rect(fill = "black")
      )

#+end_src

#+RESULTS:
[[file:murcia-biggest-invertido.pdf]]

: x -1.384834 -0.8508899
: y 37.715872 38.1179185

***** Antiguo
#+begin_src R :session streetmaps :results output graphics file :file boceto-cinco.pdf
  ggplot() +
    geom_sf(data = streets$osm_lines,
	    inherit.aes = FALSE,
	    color = "#ffbe7f",
	    size = .4,
	    alpha = .8) +
    geom_sf(data = small_streets$osm_lines,
	    inherit.aes = FALSE,
	    color = "grey",
	    size = .4,
	    alpha = .6) +
    geom_sf(data = river$osm_lines,
	    inherit.aes = FALSE,
	    color = "#7fc0ff",
	    size = 1.2,
	    alpha = .5) +
    coord_sf(xlim = c(-1.15, -1.10),
	     ylim = c(37.96, 38.00),
	     expand = FALSE)  +
    theme_void() +
  theme(
    plot.background = element_rect(fill = "#f9f9f9")
  )

#+end_src

#+RESULTS:
[[file:boceto-cinco.pdf]]

- Esquina superior izqda.: 38.0001/-1.1576
- Esquina superior dcha.: 38.0026/-1.1059
- Esquina inferior izqda.: 37.9657/-1.1541
- Esquina inferior dcha.: 37.9675/-1.1058
** Postproducción

Una vez que están hechas las visualizaciones toca editarlo. En el caso del manual en el que se basa este manual la imagen no se guardaba sino que lo generaba R al vuelo.

En nuestro caso he ido generando los distintos bocetos con Emacs + Babel + R y guardándolos en una carpeta en PDF.

Para generar la imagen finalmente usaba la función =ggmap()=:

#+begin_src R
ggsave("map.png", width = 6, height = 6)
#+end_src

Nosotrxs, en vez de crear un documento en Powerpoint y redimensionar el mapa hemos abierto un documento en Inkscape donde se edita gráficamente con el estilo de los otros.

* Datos de OpenStreetMap con R
  :PROPERTIES:
  :EXPORT_TITLE: Acceso a datos de OpenStreetMap con R
  :EXPORT_DATE: <2021-01-09 Sat>
  :EXPORT_FILE_NAME: index
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER: :subtitle "Cómo acceder a datos de OpenStreetMap"
  :EXPORT_HUGO_SECTION*: acceso-datos-openstreetmap-r
  :EXPORT_AUTHOR: adolfo_anton
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :math true
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :diagram true
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :publication_types ["0"]
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :publication "MPVD"
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :featured true
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :projects '("periodismodatos")
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :abstract_short "Cómo acceder a datos de OpenStreetMap con R"
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :authors '("adolfo_anton")
  :EXPORT_HUGO_CUSTOM_FRONT_MATTER+: :abstract "Cómo acceder a datos de OpenStreetMap con R"
  :END:

#+begin_src toml :front_matter_extra t

  [image]
   caption = "[Orion aveugle cherchant le soleil](https://commons.wikimedia.org/wiki/File:Orion_aveugle_cherchant_le_soleil.jpg), Nicolas Poussin  / Public domain"
   focal_point = "BottomLeft"
   placement = 3
   preview_only = false
#+end_src

Cuando Christian Burkhart creó su mapa del callejero con R se fijó mucho en el trabajo de Dominique Royé con las librerías que trabajan con datos de OpenStreetMap como el de las [[https://dominicroye.github.io/en/2018/accessing-openstreetmap-data-with-r/][estaciones de servicio de Europa]]. Para ello utilizó =POI= (/Point of Interest/, puntos de interés en la terminología de las representaciones espaciales de datos, puntos geográficos al fin y al cabo).

Para obtener los datos se utiliza una pasarela de la API.

** Instalación de paquetes

Lo primer paso es instalar las librerías necesarias como las librerías =tidyverse=, que es una colección de distintas librerías entre las que se incluye =dplyr= para la manipulación de datos o =ggplot2= para la visualización; =sf= que es el nuevo estándar para trabajar con datos espaciales y es compatible con =ggplot= y =dplyr=. Finalmente =ggmap= nos ayuda a crear mapas.

#+begin_src R :session osm :results output
  if(!require("osmdata")) install.packages("osmdata")
  if(!require("tidyverse")) install.packages("tidyverse")
  if(!require("ggmap")) install.packages("ggmap")
  if(!require("sf")) install.packages("sf")
#+end_src

#+RESULTS:
: Loading required package: sf
: Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

** Se activan las librerías

Y se activan las librerías:

#+begin_src R :session osm :results output
library(tidyverse)
library(osmdata)
library(sf)
library(ggmap)
#+end_src

#+RESULTS:
** Hacer una consulta

Antes de hacer una consulta hay que saber lo que se puede filtrar. Para ello se puede probar con la función =available_features()= que devuelve las características espaciales o =features= de OSM, algo común en el idioma de la representación espacial geográfica.

Para no cargar mucho la consulta incluimos la función en la función =head()=:

#+begin_src R :session osm :results output
head(available_features())
#+end_src

#+RESULTS:
: [1] "4wd_only"  "abandoned" "abutters"  "access"    "addr"      "addr:city"

Estas son las cinco primeras =features= que devueve la consulta.

Se pueden ver las etiquetas de cualquiera de ellas a través de la función =available_tags()=:

#+begin_src R :session osm :results output
head(available_tags("tourism"))
#+end_src

#+RESULTS:
: [1] "alpine_hut" "apartment"  "aquarium"   "artwork"    "attraction"
: [6] "camp_pitch"

** La primera consulta: dónde están las peluquerías en Murcia

Para construir una consulta se usa el operador =%>%= que ayuda a encadenar varias funciones sin asignar el resultado a un nuevo objeto. Su uso está muy extendido entre la colección de librerías de =tidyverse=. Este [[https://www.datacamp.com/community/tutorials/pipe-r-tutorial][tutorial de datacamp]] lo explica más.

En la primera parte de la consulta tenemos que indicar el lugar para el que queremos extraer información. Eso se consigue con la función =getbb()= que crea unos límites para un lugar según un nombre dado.

La función principal es =opq()= que crea la consulta final.

Se filtra la información que se busca con la función =add_osm_feature()=.

En esta primera consulta se buscan los cines de Madrid, con =amenity= como =feature= y =cinema= como =tag=. Se pueden obtener los resultados espaciales en varios formatos. la función =osmdata_*()= envía la consulta al servidor y según si el sufijo es =sf=, =sp= o =xml= devuelve una característica simple (de =single feature=), espacial (de =spatial=) o en formato =XML=.

#+begin_src R :session osm :results output
consulta <- getbb("Murcia Spain") %>%
  opq() %>%
   add_osm_feature("shop", "hairdresser")
str(consulta)
#+end_src

#+RESULTS:
: List of 4
:  $ bbox    : chr "37.7158719,-1.3848335,38.1179185,-0.8508899"
:  $ prefix  : chr "[out:xml][timeout:25];\n(\n"
:  $ suffix  : chr ");\n(._;>;);\nout body;"
:  $ features: chr " [\"shop\"=\"hairdresser\"]"
:  - attr(*, "class")= chr [1:2] "list" "overpass_query"
:  - attr(*, "nodes_only")= logi FALSE

#+begin_src R :session osm :results output
pelus <- osmdata_sf(consulta)
pelus
#+end_src

#+RESULTS:
: Object of class 'osmdata' with:
:                  $bbox : 37.7158719,-1.3848335,38.1179185,-0.8508899
:         $overpass_call : The call submitted to the overpass API
:                  $meta : metadata including timestamp and version numbers
:            $osm_points : 'sf' Simple Features Collection with 74 points
:             $osm_lines : NULL
:          $osm_polygons : 'sf' Simple Features Collection with 1 polygons
:        $osm_multilines : NULL
:     $osm_multipolygons : NULL

El resultado es una lista de diferentes objetos espaciales. En nuestro caso solo nos interesan los puntos =osm_points=.

** Visualizar

Lo bueno de los objetos =sf= es que para ggplot ya existe una función geométrica llamada =geom_sf()=. De fondo se puede añadir un mapa con la librería =ggmap= y la función =get_map()= que descarga un mapa para un lugar dado. También podría ser una dirección, latitud/longitud o =bounding box=. El argumento del tipo de mapa permite indicar el estilo o tipo de mapa. Más información sobre esto se puede encontrar en la ayuda =?get_map=

Cuando se hace un gráfico con =ggplot= se suele empezar con la función =ggplot()=. En este caso se empieza con =ggmap()= que incluye el objeto con el mapa de fondo. Entonces se añaden los puntos de las peluquerías de Murcia con =geom_sf()=. Es importante indicar con el argumento =inherit.aes = FALSE= que use los =aesthetic mappings= del objeto espacial =osm_points=. Además, se cambia el color, relleno, transparencia tipo y tamaño de los círculos.

#+begin_src R :session osm :results output graphics file :file murcia_se_peina.pdf
    murcia_se_peina <- get_map(c(left = -1.18, bottom = 37.91, right = -1.05, top = 38.04), maptype = "watercolor")

    ggmap(murcia_se_peina) +
    geom_sf(data = pelus$osm_points, inherit.aes = FALSE, colour = "#238443", fill = "#004529", alpha = .5, size = 4, shape = 21) +
    labs(x = "", y = "")
#+end_src

#+RESULTS:
[[file:murcia_se_peina.pdf]]

** Dónde hay una zapatería

#+begin_src R :session osm :results output
q_murcia_shoes <- getbb("Murcia Spain") %>%
  opq() %>%
   add_osm_feature("shop", "shoes")
str(q_murcia_shoes)
#+end_src

#+RESULTS:
: List of 4
:  $ bbox    : chr "37.7158719,-1.3848335,38.1179185,-0.8508899"
:  $ prefix  : chr "[out:xml][timeout:25];\n(\n"
:  $ suffix  : chr ");\n(._;>;);\nout body;"
:  $ features: chr " [\"shop\"=\"shoes\"]"
:  - attr(*, "class")= chr [1:2] "list" "overpass_query"
:  - attr(*, "nodes_only")= logi FALSE

#+begin_src R :session osm :results output
murcia_shoes <- osmdata_sf(q_murcia_shoes)
murcia_shoes
#+end_src

#+RESULTS:
: Object of class 'osmdata' with:
:                  $bbox : 37.7158719,-1.3848335,38.1179185,-0.8508899
:         $overpass_call : The call submitted to the overpass API
:                  $meta : metadata including timestamp and version numbers
:            $osm_points : 'sf' Simple Features Collection with 16 points
:             $osm_lines : NULL
:          $osm_polygons : 'sf' Simple Features Collection with 0 polygons
:        $osm_multilines : NULL
:     $osm_multipolygons : NULL

   #+begin_src R :session osm :results output graphics file :file murcia_shoes.pdf
     murcia_shoes <- get_map(getbb("Murcia Spain"), maptype = "watercolor", source = "stamen")

     ggmap(murcia_shoes) +
     geom_sf(data = q_murcia_shoes$osm_points, inherit.aes = FALSE, colour = "#238443", fill = "#004529", alpha = .5, size = 4, shape = 21) +
     labs(x = "", y = "")


#+end_src

#+RESULTS:
[[file:murcia_shoes.pdf]]

** Mercadona

En vez de obtener un =bounding box= con la función =getbb()= se puede crear sabiendo los puntos oeste, sur, este y norte. En la consulta hay dos características: el nombre y la tienda para filtrar supermercados de esta marca en particular. Según el area o volumen de la consulta se puede alargar el tiempo de espera. Por defecto el límite se establece en 25 segundos antes del =timeout=.

Lo que nos interesa son los puntos donde hay supermercados por lo que se usa la geometría a través de =geom_sf()=. La función =theme_void()= borra todo menos los puntos.

Se crean los límites:
#+begin_src R :session osm :results output
m <- c(-10, 30, 5, 46)
#+end_src

#+RESULTS:

Se hace la consulta:
#+begin_src R :session osm :results output
    q <- m %>%
    opq (timeout = 25*100) %>%
    add_osm_feature("name", "Mercadona") %>%
    add_osm_feature("shop", "supermarket")
  mercadona <- osmdata_sf(q)
mercadona
#+end_src

#+RESULTS:
: Object of class 'osmdata' with:
:                  $bbox : 30,-10,46,5
:         $overpass_call : The call submitted to the overpass API
:                  $meta : metadata including timestamp and version numbers
:            $osm_points : 'sf' Simple Features Collection with 6478 points
:             $osm_lines : 'sf' Simple Features Collection with 25 linestrings
:          $osm_polygons : 'sf' Simple Features Collection with 631 polygons
:        $osm_multilines : NULL
:     $osm_multipolygons : 'sf' Simple Features Collection with 11 multipolygons

Y el mapa final:

#+begin_src R :session osm :results output graphics file :file mercadonas.pdf
  ggplot(mercadona$osm_points)+
  geom_sf(colour = "#08519c",
	fill = "#08306b",
      alpha = 5,
    size = 1,
  shape = 21) +
  theme_void()
#+end_src

#+RESULTS:
[[file:mercadonas.pdf]]

* Visualizar el crecimiento urbano
La Dirección General del Catastro de España dispone de información espacial de toda la edificación a excepción del País Vasco y Navarra. Este conjunto de datos forma parte de la implantación de INSPIRE, la Infraestructura de Información Espacial en Europa. Más información podemos encontrar aquí. Utilizaremos los enlaces (urls) en formato ATOM, que es un formato de redifusión de tipo RSS, permitiéndonos obtener el enlace de descarga para cada municipio.

Esta entrada de blog es una versión reducida del caso práctico que podéis encontrar en nuestra reciente publicación - Introducción a los SIG con R - publicado por Dominic Royé y Roberto Serrano-Notivoli.

Paquetes
Paquete	Descripción
tidyverse	Conjunto de paquetes (visualización y manipulación de datos): ggplot2, dplyr, purrr,etc.
sf	Simple Feature: importar, exportar y manipular datos vectoriales
fs	Proporciona una interfaz uniforme y multiplataforma para las operaciones del sistema de archivos
lubridate	Fácil manipulación de fechas y tiempos
feedeR	Importar formatos de redifusión RSS
tmap	Fácil creación de mapas temáticos
classInt	Para crear intervalos de clase univariantes
sysfonts	Carga familias tipográficas del sistema y de Google
showtext	Usar familias tipográficas más fácilmente en gráficos R
# instalamos los paquetes necesarios
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("feedeR")) install.packages("feedeR")
if(!require("fs")) install.packages("fs")
if(!require("lubridate")) install.packages("lubridate")
if(!require("fs")) install.packages("fs")
if(!require("tmap")) install.packages("tmap")
if(!require("classInt")) install.packages("classInt")
if(!require("showtext")) install.packages("showtext")
if(!require("sysfonts")) install.packages("sysfonts")
if(!require("rvest")) install.packages("rvest")

# cargamos los paquetes
library(feedeR)
library(sf)
library(fs)
library(tidyverse)
library(lubridate)
library(classInt)
library(tmap)
library(rvest)
Enlaces de descarga
La primera url nos dará acceso a un listado de provincias, sedes territoriales (no siempre coinciden con la provincia), con nuevos enlaces RSS los cuales incluyen los enlaces finales de descarga para cada municipio. En este caso, descargaremos el edificado de Valencia. Los datos del Catastro se actualizan cada seis meses.

url <- "http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.bu.atom.xml"

# importamos los RSS con enlaces de provincias
prov_enlaces <- feed.extract(url)
str(prov_enlaces) #estructura es lista
## List of 4
##  $ title  : chr "Download service of Buildings. Territorial Office"
##  $ link   : chr "http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.BU.atom.xml"
##  $ updated: POSIXct[1:1], format: "2019-10-26"
##  $ items  :'data.frame': 52 obs. of  4 variables:
##   ..$ title: chr [1:52] "Territorial office 02 Albacete" "Territorial office 03 Alicante" "Territorial office 04 AlmerÃa" "Territorial office 05 Avila" ...
##   ..$ date : POSIXct[1:52], format: "2019-10-26" "2019-10-26" ...
##   ..$ link : chr [1:52] "http://www.catastro.minhap.es/INSPIRE/buildings/02/ES.SDGC.bu.atom_02.xml" "http://www.catastro.minhap.es/INSPIRE/buildings/03/ES.SDGC.bu.atom_03.xml" "http://www.catastro.minhap.es/INSPIRE/buildings/04/ES.SDGC.bu.atom_04.xml" "http://www.catastro.minhap.es/INSPIRE/buildings/05/ES.SDGC.bu.atom_05.xml" ...
##   ..$ hash : chr [1:52] "d21ebb7975e59937" "bdba5e149f09e9d8" "03bcbcc7c5be2e17" "8a154202dd778143" ...
# extraemos la tabla con los enlaces
prov_enlaces_tab <- as_tibble(prov_enlaces$items) %>%
                       mutate(title = repair_encoding(title))
## Best guess: UTF-8 (100% confident)
prov_enlaces_tab
## # A tibble: 52 x 4
##    title           date                link                            hash
##    <chr>           <dttm>              <chr>                           <chr>
##  1 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ d21ebb79~
##  2 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ bdba5e14~
##  3 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ 03bcbcc7~
##  4 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ 8a154202~
##  5 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ 7d3fd376~
##  6 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ 9c08741f~
##  7 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ ff722b15~
##  8 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ b431aa61~
##  9 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ f79c6562~
## 10 Territorial of~ 2019-10-26 00:00:00 http://www.catastro.minhap.es/~ d702a6a8~
## # ... with 42 more rows
Accedemos y descargamos los datos de Valencia. Para encontrar el enlace final de descarga usamos la función filter() del paquete dplyr buscando el nombre de la sede territorial y posteriormente el nombre del municipio en mayúsculas con la función str_detect() de stringr. La función pull() nos permite extraer una columna de un data.frame.

Actualmente la función feed.extract() no importa correctamente en el encoding UTF-8 (Windows). Por eso, en algunas ciudades pueden aparecer una mala codificación de caracteres especiales “CÃ¡diz”. Para subsanar este problema aplicamos la función repair_encoding() del paquete rvest.

# filtramos la provincia y obtenemos la url RSS
val_atom <- filter(prov_enlaces_tab, str_detect(title, "Valencia")) %>% pull(link)

# importamos la RSS
val_enlaces <- feed.extract(val_atom)

# obtenemos la tabla con los enlaces de descarga
val_enlaces_tab <- val_enlaces$items
val_enlaces_tab <- mutate(val_enlaces_tab, title = repair_encoding(title),
                          link = repair_encoding(link))
## Best guess: UTF-8 (100% confident)
## Best guess: UTF-8 (100% confident)
# filtramos la tabla con el nombre de la ciudad
val_link <- filter(val_enlaces_tab, str_detect(title, "VALENCIA")) %>% pull(link)
val_link
## [1] "http://www.catastro.minhap.es/INSPIRE/Buildings/46/46900-VALENCIA/A.ES.SDGC.BU.46900.zip"
Descarga de datos
La descarga se realiza con la función download.file() que únicamente tiene dos argumentos principales, el enlace de descarga y la ruta con el nombre del archivo. En este caso hacemos uso de la función tempfile(), que nos es útil para crear archivos temporales, es decir, archivos que únicamente existen en la memoría RAM por un tiempo determinado. El archivo que descargamos tiene extensión *.zip, por lo que debemos descomprimirlo con otra función (unzip()), que requiere el nombre del archivo y el nombre de la carpeta donde lo queremos descomprimir. Por último, la función URLencode() codifica una dirección URL que contiene caracteres especiales.

# creamos un archivo temporal
temp <- tempfile()

# descargamos los datos
download.file(URLencode(val_link), temp)

# descomprimimos a una carpeta llamda buildings
unzip(temp, exdir = "buildings")
Importar los datos
Para importar los datos utilizamos la función dir_ls() del paquete fs, que nos permite obtener los archivos y carpetas de una ruta concreta al mismo tiempo que filtramos por un patrón de texto (regexp: expresión regular). Aplicamos la función st_read() del paquete sf al archivo espacial de formato Geography Markup Language (GML).

# obtenemos la ruta con el archivo
file_val <- dir_ls("buildings", regexp = "building.gml")

# importamos los datos
buildings_val <- st_read(file_val)
## Reading layer `Building' from data source `C:\Users\xeo19\Documents\GitHub\blogR_update\content\post\es\2019-11-01-visualizar-crecimiento-urbano\buildings\A.ES.SDGC.BU.46900.building.gml' using driver `GML'
## Simple feature collection with 36296 features and 24 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: 720608 ymin: 4351287 xmax: 734982.5 ymax: 4382906
## CRS:            25830
Preparación de los datos
Únicamente convertimos la columna de la edad del edificio (beginning) en clase Date. La columna de la fecha contiene algunas fechas en formato --01-01 lo que no corresponde a ninguna fecha reconocible. Por eso, reemplazamos el primer - por 0000.

#
buildings_val <- mutate(buildings_val,
               beginning = str_replace(beginning, "^-", "0000") %>%
                            ymd_hms() %>% as_date()
               )
## Warning: 4 failed to parse.
Gráfico de distribución
Antes de crear el mapa de la edad del edificado, lo que reflejará el crecimiento urbano, haremos un gráfico de distribución de la fecha de construcción de los edificios. Podremos identificar claramente períodos de expansión urbana. Usaremos el paquete ggplot2 con la geometría de geom_density() para este objetivo. La función font_add_google() del paquete sysfonts nos permite descargar e incluir familias tipográficas de Google.

#descarga de familia tipográfica
sysfonts::font_add_google("Montserrat", "Montserrat")

#usar showtext para familias tipográficas
showtext::showtext_auto()
#limitamos al periodo posterior a 1750
filter(buildings_val, beginning >= "1750-01-01") %>%
 ggplot(aes(beginning)) +
    geom_density(fill = "#2166ac", alpha = 0.7) +
  scale_x_date(date_breaks = "20 year",
               date_labels = "%Y") +
  theme_minimal() +
  theme(title = element_text(family = "Montserrat"),
        axis.text = element_text(family = "Montserrat")) +
  labs(y = "",x = "", title = "Evolución del desarrollo urbano")


Buffer de 2,5 km de Valencia
Para poder visualizar bien la distribución del crecimiento, limitamos el mapa a un radio de 2,5 km desde el centro de la ciudad. Usamos la función geocode_OSM() del paquete tmaptools para obtener las coordenadas de Valencia en clase sf. Después proyectamos los puntos al sistema que usamos para el edificado (EPSG:25830). Como último paso creamos con la función st_buffer() un buffer con 2500 m y la intersección con nuestros datos de los edificios. También es posible crear un buffer en forma de un rectángulo indicando el tipo de estilo con el argumento endCapStyle = "SQUARE".

# obtenemos las coordinadas de Valencia
ciudad_point <- tmaptools::geocode_OSM("Valencia",
                                      as.sf = TRUE)

# proyectamos los datos
ciudad_point <- st_transform(ciudad_point, 25830)

# creamos un buffer
point_bf <- st_buffer(ciudad_point, 2500)


# obtenemos la intersección entre el buffer y la edificación
buildings_val25 <- st_intersection(buildings_val, point_bf)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
Preparar los datos para el mapas
Para poder visualizar bien las diferentes épocas de crecimiento, categorizamos el año en 15 grupos empleando cuartiles.

#encontrar 15 clases
br <- classIntervals(year(buildings_val25$beginning), 15, "quantile")
## Warning in classIntervals(year(buildings_val25$beginning), 15, "quantile"): var
## has missing values, omitted in finding classes
#crear etiquetas
lab <- names(print(br, under = "<", over = ">", cutlabels = FALSE))
## style: quantile
##      < 1890 1890 - 1912 1912 - 1925 1925 - 1930 1930 - 1940 1940 - 1950
##         940        1369         971         596        1719        1080
## 1950 - 1957 1957 - 1962 1962 - 1966 1966 - 1970 1970 - 1973 1973 - 1977
##        1227        1266        1233        1165        1161         932
## 1977 - 1987 1987 - 1999      > 1999
##        1337        1197        1190
#categorizar el año
buildings_val25 <- mutate(buildings_val25,
               yr_cl = cut(year(beginning), br$brks, labels = lab, include.lowest = TRUE))
Mapa de Valencia
El mapa creamos con el paquete tmap. Es una interesante alternativa a ggplot2. Se trata de un paquete de funciones especializadas en crear mapas temáticos. La filosofía del paquete sigue a la de ggplot2, creando multiples capas con diferentes funciones, que siempre empiezan con tm_* y se combinan con +. La construcción de un mapa con tmap siempre comienza con tm_shape(), donde se definen los datos que queremos dibujar. Luego agregamos la geometría correspondiente al tipo de datos (tm_polygon(), tm_border(), tm_dots() o incluso tm_raster()). La función tm_layout() ayuda a configurar el estilo del mapa.

Cuando necesitamos más colores del máximo permitido por RColorBrewer podemos pasar los colores a la función colorRampPalette(). Esta función interpola para un mayor número más colores de la gama.

#colores
col_spec <- RColorBrewer::brewer.pal(11, "Spectral")

#función de una gama de colores
col_spec_fun <- colorRampPalette(col_spec)


#crear los mapas
tm_shape(buildings_val25) +
  tm_polygons("yr_cl",
              border.col = "transparent",
              palette = col_spec_fun(15),
              textNA = "Sin dato",
              title = "") +
 tm_layout(bg.color = "black",
           outer.bg.color = "black",
           legend.outside = TRUE,
           legend.text.color = "white",
           legend.text.fontfamily = "Montserrat",
            panel.label.fontfamily = "Montserrat",
            panel.label.color = "white",
            panel.label.bg.color = "black",
            panel.label.size = 5,
            panel.label.fontface = "bold")


Mapa dinámico en leaflet
Una ventaja muy interesante es la función tmap_leaflet() del paquete tmap para pasar de forma sencilla un mapa creado en el mismo marco a leaflet.

#mapa tmap de Santiago
m <-   tm_shape(buildings_val25) +
          tm_polygons("yr_cl",
              border.col = "transparent",
              palette = col_spec_fun(15),
              textNA = "Without data",
              title = "")


#mapa dinámico
tmap_leaflet(m)

* Visualizar las anomalías climáticas
When we visualize precipitation and temperature anomalies, we simply use time series as bar graph indicating negative and positive values in red and blue. However, in order to have a better overview we need both anomalies in a single graph. In this way we could more easly answer the question of whether a particular season or month was dry-warm or wet-cold, and even compare these anomalies in the context of previous years.

Packages
In this post we will use the following packages:

Package	Description
tidyverse	Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
lubridate	Easy manipulation of dates and times
ggrepel	Repel overlapping text labels in ggplot2
#we install the packages if necessary
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("ggrepel")) install.packages("ggrepel")
if(!require("lubridate")) install.packages("lubridate")

#packages
library(tidyverse)
library(lubridate)
library(ggrepel)
Preparing the data
First we import the daily precipitation and temperature data from the selected weather station (download). We will use the data from Tenerife South (Spain) [1981-2020] accessible through Open Data AEMET. In R there is a package called meteoland that facilitates the download with specific functions to access data from AEMET (Spanish State Meteorological Agency), Meteogalicia (Galician Meteorological Service) and Meteocat (Catalan Meteorological Service).

Step 1: import the data
We import the data in csv format, the first column is the date, the second column the precipitation (pr) and the last column the average daily temperature (ta).

data <- read_csv("meteo_tenerife.csv")
## Parsed with column specification:
## cols(
##   date = col_date(format = ""),
##   pr = col_double(),
##   ta = col_double()
## )
data
## # A tibble: 14,303 x 3
##    date          pr    ta
##    <date>     <dbl> <dbl>
##  1 1981-01-02   0    17.6
##  2 1981-01-03   0    16.8
##  3 1981-01-04   0    17.4
##  4 1981-01-05   0    17.6
##  5 1981-01-06   0    17
##  6 1981-01-07   0    17.6
##  7 1981-01-08   0    18.6
##  8 1981-01-09   0    19.8
##  9 1981-01-10   0    21.5
## 10 1981-01-11   3.8  17.6
## # ... with 14,293 more rows
Step 2: preparing the data
In the second step we prepare the data to calculate the anomalies. To do this, we create three new columns: the month, the year, and the season of the year. Since our objective is to analyse winter anomalies, we cannot use the calendar year, because winter includes the month of December of one year and the months of January and February of the following. The custom function meteo_yr() extracts the year from a date indicating the starting month. The concept is similar to the hydrological year in which it starts on October 1.

meteo_yr <- function(dates, start_month = NULL) {
  # convert to POSIXlt
  dates.posix <- as.POSIXlt(dates)
  # year offset
  offset <- ifelse(dates.posix$mon >= start_month - 1, 1, 0)
  # new year
  adj.year = dates.posix$year + 1900 + offset
  return(adj.year)
}
We will use many functions of the package collection tidyverse (https://www.tidyverse.org/). The mutate() function helps to add new columns or change existing ones. To define the seasons, we use the case_when() function from the dplyr package, which has many advantages compared to a chain of ifelse(). In case_when() we use two-side formulas, on the one hand the condition and on the other the action when that condition is met. A two-sided formula in R consists of the operator ~. The binary operator %in% allows us to filter several values in a greater set.

data <- mutate(data,
               winter_yr = meteo_yr(date, 12),
               month = month(date),
               season = case_when(month %in% c(12,1:2) ~ "Winter",
                                  month %in% 3:5 ~ "Spring",
                                  month %in% 6:8 ~ "Summer",
                                  month %in% 9:11 ~ "Autum"))

data
## # A tibble: 14,303 x 6
##    date          pr    ta winter_yr month season
##    <date>     <dbl> <dbl>     <dbl> <dbl> <chr>
##  1 1981-01-02   0    17.6      1981     1 Winter
##  2 1981-01-03   0    16.8      1981     1 Winter
##  3 1981-01-04   0    17.4      1981     1 Winter
##  4 1981-01-05   0    17.6      1981     1 Winter
##  5 1981-01-06   0    17        1981     1 Winter
##  6 1981-01-07   0    17.6      1981     1 Winter
##  7 1981-01-08   0    18.6      1981     1 Winter
##  8 1981-01-09   0    19.8      1981     1 Winter
##  9 1981-01-10   0    21.5      1981     1 Winter
## 10 1981-01-11   3.8  17.6      1981     1 Winter
## # ... with 14,293 more rows
Step 3: estimate winter anomalies
In the next step we create a subset of the winter months. Then we group by the defined meteorological year and calculate the sum and average for precipitation and temperature, respectively. To facilitate the work, the magrittr package introduces the operator called pipe in the form %>% with the aim of combining several functions without the need to assign the result to a new object. The pipe operator passes the output of a function applied to the first argument of the next function. This way of combining functions allows you to chain several steps simultaneously. The %>% must be understood and pronounced as then.

data_inv <- filter(data,
                   season == "Winter") %>%
              group_by(winter_yr) %>%
                  summarise(pr = sum(pr, na.rm = TRUE),
                            ta = mean(ta, na.rm = TRUE))
Now we only have to calculate the anomalies of precipitation and temperature. The columns pr_mean and ta_mean will contain the climate average, the reference for the anomalies with respect to the normal period 1981-2010. Therefore, we need to filter the values to the period before 2010, which we will do in the usual way of filtering vectors in R. Once we have the references we estimate the anomalies pr_anom and ta_anom. To facilitate the interpretation, in the case of precipitation we express the anomalies as percentage, with the average set at 0% instead of 100%.

In addition, we add three required columns with information for the creation of the graph: 1) labyr contains the year of each anomaly as long as it has been greater/less than -+10% or -+0.5ºC, respectively (this is for reducing the number of labels), 2) symb_point is a dummy variable in order to be able to create different symbols between the cases of (1), and 3) lab_font for highlighting in bold the year 2020.

data_inv <-  mutate(data_inv, pr_mean = mean(pr[winter_yr <= 2010]),
                              ta_mean = mean(ta[winter_yr <= 2010]),
                              pr_anom = (pr*100/pr_mean)-100,
                              ta_anom = ta-ta_mean,

                              labyr = case_when(pr_anom < -10 & ta_anom < -.5 ~ winter_yr,
                                                pr_anom < -10 & ta_anom > .5 ~ winter_yr,
                                                pr_anom > 10 & ta_anom < -.5 ~ winter_yr,
                                                pr_anom > 10 & ta_anom > .5 ~ winter_yr),
                              symb_point = ifelse(!is.na(labyr), "yes", "no"),
                              lab_font = ifelse(labyr == 2020, "bold", "plain")
                    )
Creating the graph
We will build the chart adding layer by layer the distinctive elements: 1) the background with the different grids (Dry-Warm, Dry-Cold, etc.), 2) the points and labels, and 3) the style adjustments.

Part 1
The idea is that the points with dry-warm anomalies are located in quadrant I (top-right) and those with wet-cold in quadrant III (bottom-left). Therefore, we must invert the sign in the precipitation anomalies. Then we create a data.frame with the label positions of the four quadrants. For the positions in x and y Inf and -Inf are used, which is equivalent to the maximum panel sides with respect to the data. However, it is necessary to adjust the position towards the extreme points within the panel with the known arguments of ggplot2: hjust and vjust.

data_inv_p <- mutate(data_inv, pr_anom = pr_anom * -1)

bglab <- data.frame(x = c(-Inf, Inf, -Inf, Inf),
                    y = c(Inf, Inf, -Inf, -Inf),
                    hjust = c(1, 1, 0, 0),
                    vjust = c(1, 0, 1, 0),
                    lab = c("Wet-Warm", "Dry-Warm",
                              "Wet-Cold", "Dry-Cold"))


bglab
##      x    y hjust vjust      lab
## 1 -Inf  Inf     1     1 Wet-Warm
## 2  Inf  Inf     1     0 Dry-Warm
## 3 -Inf -Inf     0     1 Wet-Cold
## 4  Inf -Inf     0     0 Dry-Cold
Part 2
In the second part we can start building the chart by adding all graphical elements. First we create the background with different colors of each quadrant. The function annotate() allows adding geometry layers without the use of variables within data.frames. With the geom_hline() and geom_vline() function we mark the quadrants horizontally and vertically using a dashed line. Finally, we draw the labels of each quadrant, using the function geom_text(). When we use other data sources than the main one used in ggplot(), we must indicate it with the argument data in the corresponding geometry function.

g1 <- ggplot(data_inv_p,
             aes(pr_anom, ta_anom)) +
         annotate("rect", xmin = -Inf, xmax = 0, ymin = 0, ymax = Inf, fill = "#fc9272", alpha = .6) + #wet-warm
         annotate("rect", xmin = 0, xmax = Inf, ymin = 0, ymax = Inf, fill = "#cb181d", alpha = .6) + #dry-warm
         annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 0, fill = "#2171b5", alpha = .6) + #wet-cold
         annotate("rect", xmin = 0, xmax = Inf, ymin = -Inf, ymax = 0, fill = "#c6dbef", alpha = .6) + #dry-cold
       geom_hline(yintercept = 0,
                  linetype = "dashed") +
       geom_vline(xintercept = 0,
                  linetype = "dashed") +
       geom_text(data = bglab,
                     aes(x, y, label = lab, hjust = hjust, vjust = vjust),
                     fontface = "italic", size = 5,
                     angle = 90, colour = "white")

g1


Part 3
In the third part we simply add the points of the anomalies and the labels of the years. The geom_text_repel() function is similar to the one known by default in ggplot2, geom_text(), but it repels overlapping text labels away from each other.

g2 <- g1 + geom_point(aes(fill = symb_point, colour = symb_point),
                      size = 2.8, shape = 21, show.legend = FALSE) +
           geom_text_repel(aes(label = labyr, fontface = lab_font),
                           max.iter = 5000,
                           size = 3.5)
g2
## Warning: Removed 25 rows containing missing values (geom_text_repel).


Part 4
In the last part we adjust, in addition to the general style, the axes, the color type and the (sub)title. Remember that we changed the sign on precipitation anomalies. Hence, we must use the arguments breaks and labels in the function scale_x_continouous() to reverse the sign in the labels corresponding to the breaks.

g3 <- g2 + scale_x_continuous("Precipitation anomaly in %",
                              breaks = seq(-100, 250, 10) * -1,
                              labels = seq(-100, 250, 10),
                              limits = c(min(data_inv_p$pr_anom), 100)) +
           scale_y_continuous("Mean temperature anomaly in ºC",
                              breaks = seq(-2, 2, 0.5)) +
           scale_fill_manual(values = c("black", "white")) +
           scale_colour_manual(values = rev(c("black", "white"))) +
           labs(title = "Winter anomalies in Tenerife South",
                caption = "Data: AEMET\nNormal period 1981-2010") +
           theme_bw()

g3
## Warning: Removed 25 rows containing missing values (geom_text_repel).

* Visualizar el crecimiento urbano
Cuenta [[https://dominicroye.github.io/es/2019/visualizar-el-crecimiento-urbano/][Dominic Royé]] que la Dirección General del Catastro de España dispone de [[http://www.catastro.meh.es/webinspire/index.html][información espacial de toda la edificación]] a excepción del País Vasco y Navarra. Este conjunto de datos forma parte de la implantación de [[https://inspire.ec.europa.eu/][INSPIRE]], la /Infraestructura de Información Espacial/ en Europa.

Se pretenden utilizar las URL de cada municipio en formato =ATOM=, un estándar de redifusión del contenido web.

** Instalación y carga de paquetes

Se requieresn los siguientes paquetes de =R=:
- tidyverse :: metapaquete para visualización y manipulación de datos con =ggplot2=, =dplyr= o =purrr=.
- sf :: de /Simple Feature/, sirve para importar, exportar y manipular datos vectoriales.
- fs :: proporciona una interfaz uniforme y multiplataforma para las operaciones del sistema de archivos.
- lubridate :: para manipulación de fechas y tiempos.
- feedeR :: para importar formatos de redifusión =RSS=.
- tmap :: para creación de mapas temáticos.
- classInt :: para crear intervalos de clase univariantes.
- sysfonts :: carga familias tipográficas del sistema y de Google.
- showtext :: para usar familias tipográficas en gráficos de R.

#+begin_src R :session urban :results output
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("feedeR")) install.packages("feedeR")
if(!require("fs")) install.packages("fs")
if(!require("lubridate")) install.packages("lubridate")
if(!require("fs")) install.packages("fs")
if(!require("tmap")) install.packages("tmap")
if(!require("classInt")) install.packages("classInt")
if(!require("showtext")) install.packages("showtext")
if(!require("sysfonts")) install.packages("sysfonts")
if(!require("rvest")) install.packages("rvest")

library(feedeR)
library(sf)
library(fs)
library(tidyverse)
library(lubridate)
library(classInt)
library(tmap)
library(rvest)
#+end_src

#+RESULTS:
: Loading required package: tmap
: Loading required package: showtext
: Loading required package: showtextdb

** Enlaces de descarga

Se descarga una URL con un listado de provincias y sedes territorales que se inserta en el objeto =url=:

#+begin_src R :session urban :results output
url <- "http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.bu.atom.xml"
#+end_src

#+RESULTS:
** Extracción de redifusores
Luego se extraen los redifusores con la función =feed.extract()=

#+begin_src R :session urban :results output
prov_enlaces <- feed.extract(url)
str(prov_enlaces)
#+end_src

#+RESULTS:
#+begin_example
List of 4
 $ title  : chr "Download service of Buildings. Territorial Office"
 $ link   : chr "http://www.catastro.minhap.es/INSPIRE/buildings/ES.SDGC.BU.atom.xml"
 $ updated: POSIXct[1:1], format: "2020-10-27"
 $ items  : tibble [52 × 5] (S3: tbl_df/tbl/data.frame)
  ..$ title      : chr [1:52] "Territorial office 02 Albacete" "Territorial office 03 Alicante" "Territorial office 04 Almería" "Territorial office 05 Avila" ...
  ..$ date       : POSIXct[1:52], format: "2020-10-27" "2020-10-27" ...
  ..$ link       : chr [1:52] "http://www.catastro.minhap.es/INSPIRE/buildings/02/ES.SDGC.bu.atom_02.xml" "http://www.catastro.minhap.es/INSPIRE/buildings/03/ES.SDGC.bu.atom_03.xml" "http://www.catastro.minhap.es/INSPIRE/buildings/04/ES.SDGC.bu.atom_04.xml" "http://www.catastro.minhap.es/INSPIRE/buildings/05/ES.SDGC.bu.atom_05.xml" ...
  ..$ description: chr [1:52] "\n\t\t  " "\n\t\t  " "\n\t\t  " "\n\t\t  " ...
  ..$ hash       : chr [1:52] "d21ebb7975e59937" "bdba5e149f09e9d8" "03bcbcc7c5be2e17" "8a154202dd778143" ...
#+end_example
** TODO Extracción de enlaces

Se extrae la tabla con los enlaces:

#+begin_src R :session urban :results output
prov_enlaces_tab <- as_tibble(prov_enlaces$items) %>%
    mutate(title = repair_encoding(title))
prov_enlaces_tab
#+end_src

#+RESULTS:
#+begin_example
Best guess: UTF-8 (100% confident)
[90m# A tibble: 52 x 5[39m
   title         date                link                    description hash
   [3m[90m<chr>[39m[23m         [3m[90m<dttm>[39m[23m              [3m[90m<chr>[39m[23m                   [3m[90m<chr>[39m[23m       [3m[90m<chr>[39m[23m
[90m 1[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  d21ebb…
[90m 2[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  bdba5e…
[90m 3[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  03bcbc…
[90m 4[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  8a1542…
[90m 5[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  7d3fd3…
[90m 6[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  9c0874…
[90m 7[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  ff722b…
[90m 8[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  b431aa…
[90m 9[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  f79c65…
[90m10[39m [90m"[39mTerritorial… 2020-10-27 [90m00:00:00[39m http://www.catastro.mi… [90m"[39m\n\t\t  [90m"[39m  d702a6…
[90m# … with 42 more rows[39m
#+end_example
** Extracción de datos
Para encontrar el enlace se utiliza la función =filter()= del paquete =dplyr=. Luego se extrae el nombre del municipio en mayúsculas con la función =str_detect()= del paquete =stringr=. La función =pull()= permite extraer una columna de un =data.frame=.

Si la función =feed.extract()= no importa correctamente la cofificación =UTF-8= se solventa con la función =repair_encoding()= del paquete =rvest=.

#+begin_src R :session urban :results output
mur_atom <- filter(prov_enlaces_tab, str_detect(title, "Murcia")) %>% pull(link)
mur_enlaces <- feed.extract(mur_atom)
mur_enlaces_tab <- mur_enlaces$items
mur_enlaces_tab <- mutate(mur_enlaces_tab, title = repair_encoding(title), link = repair_encoding(link))
mur_link <- filter(mur_enlaces_tab, str_detect(title, "MURCIA")) %>% pull(link)
mur_link

#+end_src

#+RESULTS:
: Error: Problem with `mutate()` input `title`.
: [31m✖[39m No guess has more than 50% confidence
: [34mℹ[39m Input `title` is `repair_encoding(title)`.
: [90mRun `rlang::last_error()` to see where the error occurred.[39m
: [1] "http://www.catastro.minhap.es/INSPIRE/Buildings/30/30008-ALHAMA DE MURCIA/A.ES.SDGC.BU.30008.zip"
: [2] "http://www.catastro.minhap.es/INSPIRE/Buildings/30/30030-MURCIA/A.ES.SDGC.BU.30030.zip"

** Descarga de datos


La descarga se realiza con la función =download.file()= que únicamente tiene dos argumentos principales, el enlace de descarga y la ruta con el nombre del archivo.

Se utiliza la función =tempfile()= para crear un archivo temporal.

Como el archivo es un =ZIP= se utiliza la función =unzip()= para acceder a su contenido que requiere el nombre del archivo y el nombre de la carpeta donde se ponen los archivos que contiene el =ZIP=.

Por último la función =URLencode= codifica una dirección =URL= que tiene contiene caracteres especiales.
#+begin_src R :session urban :results output
  temp <- tempfile()

  download.file(URLencode(mur_link), temp)

unzip(temp, exdir = "edificios_murcia")

#+end_src

#+RESULTS:
: probando la URL 'http://www.catastro.minhap.es/INSPIRE/Buildings/30/30008-ALHAMA%20DE%20MURCIA/A.ES.SDGC.BU.30008.zip'
: Content type 'application/x-zip-compressed' length 4143676 bytes (4.0 MB)
: ==================================================
: downloaded 4.0 MB

** Importar los datos

Para importar los datos se utiliza la función =dir_ls()= del paquete =fs= que permite obtener los archivos y carpetas de una ruta concreta al mismo tiempo que filtramos por un patrón de texto, =regexp=, una expresión regular.

Se aplica la función =st_read()= del paquete =sf= al archivo espacial de formato =GML= (/Geography Markup Language/).

#+begin_src R :session urban :results output
library("sf")
file_mur <- dir_ls("edificios_murcia", regexp = "building.gml")
buildings_mur <- st_read(file_mur)
#+end_src

#+RESULTS:
: Reading layer `Building' from data source `/home/flow/Documentos/curro/master-uah/web/web-org/org/edificios_murcia/A.ES.SDGC.BU.30008.building.gml' using driver `GML'
: Simple feature collection with 8523 features and 24 fields
: geometry type:  MULTIPOLYGON
: dimension:      XY
: bbox:           xmin: 629971.4 ymin: 4176298 xmax: 652142.6 ymax: 4196243
: projected CRS:  ETRS89 / UTM zone 30N

# importamos los datos

** Preparación de los datos

Se convierte la columna de la edad del edificio =beginning= en clase =Date= con la función =as_date()=. La columna de de la fecha contiene algunas fechas en formato =--01-01=, lo cual no corresponde a ninguna fecha reconocible, por lo que se reemplaza el primer =-= con =0000=.

#+begin_src R :session urban :results output
buildings_mur <- mutate(buildings_mur,
               beginning = str_replace(beginning, "^-", "0000") %>%
                            ymd_hms() %>% as_date()
               )

#+end_src

#+RESULTS:
: Warning messages:
: 1: Problem with `mutate()` input `beginning`.
: [34mℹ[39m  1 failed to parse.
: [34mℹ[39m Input `beginning` is `str_replace(beginning, "^-", "0000") %>% ymd_hms() %>% as_date()`.
: 2:  1 failed to parse.

#
## Warning: 4 failed to parse.
** Gráfico de distribución
Antes de crear el mapa de la edad del edificado, lo que reflejará el crecimiento urbano, se realiza un gráfico de distribución de la fecha de construcción de los edificios. De esta manera se pueden identificar claramente períodos de expansión urbana. Para ello se utiliza el paquete =ggplot2= con la geometría de =geom_density()= para este objetivo. La función =font_add_google()= del paquete sysfonts permite descargar e incluir familias tipográficas de /Google/.
usar showtext para familias tipográficas
limitamos al periodo posterior a 1750

#+begin_src R :session urban :results output graphics file :file buildings_murcia.pdf
sysfonts::font_add_google("Montserrat", "Montserrat")
showtext::showtext_auto()
filter(buildings_mur, beginning >= "1751-01-01") %>%
 ggplot(aes(beginning)) +
    geom_density(fill = "#2166ac", alpha = 0.7) +
  scale_x_date(date_breaks = "20 year",
               date_labels = "%Y") +
  theme_minimal() +
  theme(title = element_text(family = "Montserrat"),
        axis.text = element_text(family = "Montserrat")) +
  labs(y = "",x = "", title = "Evolución del desarrollo urbano de Murcia desde 1750")

#+end_src

#+RESULTS:
[[file:buildings_murcia.pdf]]

** Radio de actuación
Se limita la visualización a un radio de dos kilómetros y medio desde el centro de la ciudad con la función =geocode_OSM()= del paquete =tmaptools= para obtener las coordenadas de Murcia en clase =sf=.
Luego se proyectan los puntos al sistema que se utiliza para el edificado, el =EPSG:25830=.
Por último se crea un buffer de dos mil quinientos metros con la función =st_buffer()= y la intersección de los edificios. También se puede crear en forma de rectángulo indicando el tipo de estilo con el argumento =endCapStyle = "SQUARE"=.

#+begin_src R :session urban :results output
ciudad_point <- tmaptools::geocode_OSM("MURCIA", as.sf = TRUE)
ciudad_point <- st_transform(ciudad_point, 25830)
point_bf <- st_buffer(ciudad_point, 2500)
buildings_mur25 <- st_intersection(buildings_mur, point_bf)
#+end_src

#+RESULTS:
: Warning message:
: attribute variables are assumed to be spatially constant throughout all geometries

** TODO Preparar los datos para el mapas
Para poder visualizar bien las diferentes épocas de crecimiento se categoriza el año en 15 grupos empleando cuartiles.

#+begin_src R :session urban :results output
br <- classIntervals(year(buildings_mur25$beginning), 15, "quantile")
lab <- names(print(br, under = "<", over = ">", cutlabels = FALSE))
buildings_mur25 <- mutate(buildings_mur25,
               yr_cl = cut(year(beginning), br$brks, labels = lab, include.lowest = TRUE))

#+end_src

#+RESULTS:
#+begin_example
Error in sVar[1:(length(sVar) - 1)] :
  solamente 0's pueden ser mezclados con subscritos negativos
Además: Warning messages:
1: In classIntervals(year(buildings_mur25$beginning), 15, "quantile") :
  n greater than number of different finite values\nn reset to number of different finite values
2: In classIntervals(year(buildings_mur25$beginning), 15, "quantile") :
  n same as number of different finite values\neach different finite value is a separate class
Error in print(br, under = "<", over = ">", cutlabels = FALSE) :
  objeto 'br' no encontrado
Error: Problem with `mutate()` input `yr_cl`.
[31m✖[39m objeto 'br' no encontrado
[34mℹ[39m Input `yr_cl` is `cut(year(beginning), br$brks, labels = lab, include.lowest = TRUE)`.
[90mRun `rlang::last_error()` to see where the error occurred.[39m
#+end_example

## style: quantile
##      < 1890 1890 - 1912 1912 - 1925 1925 - 1930 1930 - 1940 1940 - 1950
##         940        1369         971         596        1719        1080
## 1950 - 1957 1957 - 1962 1962 - 1966 1966 - 1970 1970 - 1973 1973 - 1977
##        1227        1266        1233        1165        1161         932
## 1977 - 1987 1987 - 1999      > 1999
##        1337        1197        1190
#categorizar el año

** Mapa de Murcia
El mapa creamos con el paquete tmap. Es una interesante alternativa a ggplot2. Se trata de un paquete de funciones especializadas en crear mapas temáticos. La filosofía del paquete sigue a la de ggplot2, creando multiples capas con diferentes funciones, que siempre empiezan con tm_* y se combinan con +. La construcción de un mapa con tmap siempre comienza con tm_shape(), donde se definen los datos que queremos dibujar. Luego agregamos la geometría correspondiente al tipo de datos (tm_polygon(), tm_border(), tm_dots() o incluso tm_raster()). La función tm_layout() ayuda a configurar el estilo del mapa.

Cuando necesitamos más colores del máximo permitido por RColorBrewer podemos pasar los colores a la función colorRampPalette(). Esta función interpola para un mayor número más colores de la gama.

#colores
col_spec <- RColorBrewer::brewer.pal(11, "Spectral")

#función de una gama de colores
col_spec_fun <- colorRampPalette(col_spec)


#crear los mapas
tm_shape(buildings_val25) +
  tm_polygons("yr_cl",
              border.col = "transparent",
              palette = col_spec_fun(15),
              textNA = "Sin dato",
              title = "") +
 tm_layout(bg.color = "black",
           outer.bg.color = "black",
           legend.outside = TRUE,
           legend.text.color = "white",
           legend.text.fontfamily = "Montserrat",
            panel.label.fontfamily = "Montserrat",
            panel.label.color = "white",
            panel.label.bg.color = "black",
            panel.label.size = 5,
            panel.label.fontface = "bold")


Mapa dinámico en leaflet
Una ventaja muy interesante es la función tmap_leaflet() del paquete tmap para pasar de forma sencilla un mapa creado en el mismo marco a leaflet.

#mapa tmap de Santiago
m <-   tm_shape(buildings_val25) +
          tm_polygons("yr_cl",
              border.col = "transparent",
              palette = col_spec_fun(15),
              textNA = "Without data",
              title = "")


#mapa dinámico
tmap_leaflet(m)

* Un mapa de calor como calendario
Otro [[https://dominicroye.github.io/en/2020/a-heatmap-as-calendar/][artículo de Dominic Royé]] para convertir un calendario en un mapa de calor con las temperaturas de un determinado lugar.

Utiliza tres paquetes: =tidyverse=, =lubridate= y =ragg=.

** Instalación
#+begin_src R :session calendar :results output
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("ragg")) install.packages("ragg")
if(!require("lubridate")) install.packages("lubridate")

#+end_src

#+RESULTS:
#+begin_example
Loading required package: tidyverse
── [1mAttaching packages[22m ──────────────────────────────────────────── tidyverse 1.3.0 ──
[32m✔[39m [34mggplot2[39m 3.3.3     [32m✔[39m [34mpurrr  [39m 0.3.4
[32m✔[39m [34mtibble [39m 3.0.4     [32m✔[39m [34mdplyr  [39m 1.0.2
[32m✔[39m [34mtidyr  [39m 1.1.2     [32m✔[39m [34mstringr[39m 1.4.0
[32m✔[39m [34mreadr  [39m 1.4.0     [32m✔[39m [34mforcats[39m 0.5.0
── [1mConflicts[22m ─────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()
Loading required package: ragg
Loading required package: lubridate

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union
#+end_example

# paquetes
** Carga
Por lo que se cargan:

#+begin_src R :session calendar :results output
library(tidyverse)
library(lubridate)
library(ragg)

#+end_src

#+RESULTS:

Data
In this example we will use the daily precipitation of Santiago de Compostela for this year 2020 (until December 20).

# import the data
dat_pr <- read_csv("precipitation_santiago.csv")
dat_pr
## # A tibble: 355 x 2
##    date          pr
##    <date>     <dbl>
##  1 2020-01-01   0
##  2 2020-01-02   0
##  3 2020-01-03   5.4
##  4 2020-01-04   0
##  5 2020-01-05   0
##  6 2020-01-06   0
##  7 2020-01-07   0
##  8 2020-01-08   1
##  9 2020-01-09   3.8
## 10 2020-01-10   0
## # ... with 345 more rows
Preparation
In the first step we must 1) complement the time series from December 21 to December 31 with NA, 2) add the day of the week, the month, the week number and the day. Depending on whether we want each week to start on Sunday or Monday, we indicate it in the wday() function.

dat_pr <- dat_pr %>%
          complete(date = seq(ymd("2020-01-01"),
                              ymd("2020-12-31"),
                              "day")) %>%
          mutate(weekday = wday(date, label = T, week_start = 1),
                 month = month(date, label = T, abbr = F),
                 week = isoweek(date),
                 day = day(date))
In the next step we need to make a change in the week of the year, which is because in certain years there may be, for example, a few days at the end of the year as the first week of the following year. We also create two new columns. On the one hand, we categorize precipitation into 14 classes and on the other, we define a white text color for darker tones in the heatmap.

dat_pr <- mutate(dat_pr,
                 week = case_when(month == "December" & week == 1 ~ 53,
                                  month == "January" & week %in% 52:53 ~ 0,
                                  TRUE ~ week),
                 pcat = cut(pr, c(-1, 0, .5, 1:5, 7, 9, 15, 20, 25, 30, 300)),
                 text_col = ifelse(pcat %in% c("(15,20]", "(20,25]", "(25,30]", "(30,300]"),
                                   "white", "black"))

dat_pr
## # A tibble: 366 x 8
##    date          pr weekday month    week   day pcat    text_col
##    <date>     <dbl> <ord>   <ord>   <dbl> <int> <fct>   <chr>
##  1 2020-01-01   0   Wed     January     1     1 (-1,0]  black
##  2 2020-01-02   0   Thu     January     1     2 (-1,0]  black
##  3 2020-01-03   5.4 Fri     January     1     3 (5,7]   black
##  4 2020-01-04   0   Sat     January     1     4 (-1,0]  black
##  5 2020-01-05   0   Sun     January     1     5 (-1,0]  black
##  6 2020-01-06   0   Mon     January     2     6 (-1,0]  black
##  7 2020-01-07   0   Tue     January     2     7 (-1,0]  black
##  8 2020-01-08   1   Wed     January     2     8 (0.5,1] black
##  9 2020-01-09   3.8 Thu     January     2     9 (3,4]   black
## 10 2020-01-10   0   Fri     January     2    10 (-1,0]  black
## # ... with 356 more rows
Visualization
First we create a color ramp from Brewer colors.

# color ramp
pubu <- RColorBrewer::brewer.pal(9, "PuBu")
col_p <- colorRampPalette(pubu)
Second, before building the chart, we define a custom theme as a function. To do this, we specify all the elements and their modifications with the help of the theme() function.

theme_calendar <- function(){

 theme(aspect.ratio = 1/2,

       axis.title = element_blank(),
       axis.ticks = element_blank(),
       axis.text.y = element_blank(),
       axis.text = element_text(family = "Montserrat"),

       panel.grid = element_blank(),
       panel.background = element_blank(),

       strip.background = element_blank(),
       strip.text = element_text(family = "Montserrat", face = "bold", size = 15),

       legend.position = "top",
       legend.text = element_text(family = "Montserrat", hjust = .5),
       legend.title = element_text(family = "Montserrat", size = 9, hjust = 1),

       plot.caption =  element_text(family = "Montserrat", hjust = 1, size = 8),
       panel.border = element_rect(colour = "grey", fill=NA, size=1),
       plot.title = element_text(family = "Montserrat", hjust = .5, size = 26,
                                 face = "bold",
                                 margin = margin(0,0,0.5,0, unit = "cm")),
       plot.subtitle = element_text(family = "Montserrat", hjust = .5, size = 16)
  )
}
Finally, we build the final chart using geom_tile() and specify the day of the week as the X axis and the week number as the Y axis. As you can see in the variable of the week number (-week), I change the sign so that the first day of each month is in the first row. With geom_text() we add the number of each day with its color according to what we defined previously. In guides we make the adjustments of the colorbar and in scale_fill/colour_manual() we define the corresponding colors. An important step is found in facet_wrap() where we specify the facets composition of each month. The facets should have free scales and the ideal would be a 4 x 3 facet distribution. It is possible to modify the position of the day number to another using the arguments nudge_* in geom_text() (eg bottom-right corner: nudge_x = .35, nudge_y = -.25).

    ggplot(dat_pr,
           aes(weekday, -week, fill = pcat)) +
      geom_tile(colour = "white", size = .4)  +
      geom_text(aes(label = day, colour = text_col), size = 2.5) +
      guides(fill = guide_colorsteps(barwidth = 25,
                                     barheight = .4,
                                    title.position = "top")) +
       scale_fill_manual(values = c("white", col_p(13)),
                         na.value = "grey90", drop = FALSE) +
       scale_colour_manual(values = c("black", "white"), guide = FALSE) +
       facet_wrap(~ month, nrow = 4, ncol = 3, scales = "free") +
       labs(title = "How is 2020 being in Santiago?",
             subtitle = "Precipitation",
             caption = "Data: Meteogalicia",
             fill = "mm") +
       theme_calendar()


To export we will use the ragg package, which provides higher performance and quality than the standard raster devices provided by grDevices.

ggsave("pr_calendar.png", height = 10, width = 8, device = agg_png())
In other heatmap calendars I have added the predominant wind direction of each day as an arrow using geom_arrow() from the metR package (it can be seen in the aforementioned application).
* Introducción a Tidyverse
=tidyverse= es una colección de paquetes para la ciencia de datos.

El primer argumento es el objeto y a continuación viene el resto de argumentos.
Se proporciona un conjunto de verbos que facilitan el uso de las funciones.
La filosofía de las funciones también se refleja en otros paquetes que hacen compatible su uso con la colección de tidyverse. Por ejemplo, el paquete =sf= (simple feature) para el tratamiento de datos vectoriales, permite el uso de múltiples funciones que se encuentran en el paquete =dplyr=.
** Paquetes
El núcleo de la colección lo constituyen los siguientes paquetes:
- =ggplot2= :: gramática para la creación de gráficos.
- =purrr= :: programación funcional de R.
- tibble :: sistema moderno y efectivo de tablas.
- dplyr :: gramática para la manipulación de datos.
- tidyr :: conjunto de funciones para ordenar datos.
- stringr :: conjunto de funciones para trabajar con caracteres.
- readr :: importar datos.
- forcats :: trabajar con factores.

Además, también se usa muy frecuentemente =lubridate= para trabajar con fechas y horas o =readxl= para importar archivos en formato Excel.

Para conocer todos los paquetes disponibles podemos emplear la función tidyverse_packages().

#+begin_src R :session tidyverse :results output
library(tidyverse)
tidyverse_packages()
#+end_src

#+RESULTS:
#+begin_example
── [1mAttaching packages[22m ──────────────────────────────────────────── tidyverse 1.3.0 ──
[32m✔[39m [34mggplot2[39m 3.3.3     [32m✔[39m [34mpurrr  [39m 0.3.4
[32m✔[39m [34mtibble [39m 3.0.4     [32m✔[39m [34mdplyr  [39m 1.0.2
[32m✔[39m [34mtidyr  [39m 1.1.2     [32m✔[39m [34mstringr[39m 1.4.0
[32m✔[39m [34mreadr  [39m 1.4.0     [32m✔[39m [34mforcats[39m 0.5.0
── [1mConflicts[22m ─────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()
 [1] "broom"      "cli"        "crayon"     "dbplyr"     "dplyr"
 [6] "forcats"    "ggplot2"    "haven"      "hms"        "httr"
[11] "jsonlite"   "lubridate"  "magrittr"   "modelr"     "pillar"
[16] "purrr"      "readr"      "readxl"     "reprex"     "rlang"
[21] "rstudioapi" "rvest"      "stringr"    "tibble"     "tidyr"
[26] "xml2"       "tidyverse"
#+end_example

Los =conflicts= o conflictos ocurren cuando el nombre de una función se encuentra en más de un paquete. Para evitarlo se escribe el nombre del paquete delante de la función y a continuación dos símbolos de dos puntos:

#+begin_example
nombre_paquete::nombre_funcion()
#+end_example
** Guía de estilo
Aunque se puede trabajar como se hace normalmente con =R=, =tidyverse= tiene su propia [[https://style.tidyverse.org][guía de estilo]].

Lo fundamental es:
- Evitar usar más de 80 caracteres por línea para permitir leer el código completo.
- Usar siempre un espacio después de una coma, nunca antes.
- Los operadores (=\=\==, =+=, =-=, =<-=, =%>%=, etc.) deben tener un espacio antes y después.
- No hay espacio entre el nombre de una función y el primer paréntesis, ni entre el último argumento y el paréntesis final de una función.
- Evitar reutilizar nombres de funciones y variables comunes (=c <- 5= vs. =c()=).
- Ordenar el script separando las partes con la forma de comentario =# Importar datos -----=
- Se deben evitar tildes o símbolos especiales en nombres, archivos, rutas, etc.
- Los nombres de los objetos deben seguir una estructura constante: =dia_uno=, =dia_1=, etc.
- Es aconsejable usar una correcta indentación para multiples argumentos de una función o funciones encadenadas por el operador pipe (=%>%=).
** Tuberías o =pipe %>%=
Para facilitar el trabajo en la gestión, manipulación y visualización de datos, el paquete =magrittr= introduce el operador tubería o /pipe/ en la forma =%>%= con el objetivo de combinar varias funciones sin la necesidad de asignar el resultado a un nuevo objeto. El operador =pipe= pasa a la salida de una función aplicada al primer argumento de la siguiente función. Esta forma de combinar funciones permite encadenar varios pasos de forma simultánea. En el siguiente ejemplo, muy sencillo, pasamos el vector =1:5= a la función =mean()= para calcular el promedio.

#+begin_src R :session tidyverse :results output
1:5 %>% mean()
#+end_src

#+RESULTS:
: [1] 3
** Paquetes de Tidyverse

*** Lectura y escritura
El paquete =readr= facilita la lectura o escritura de múltiples formatos de archivo usando funciones que comienzan por =read_*= o =write_*=. En comparación con =R Base=, las funciones son más rápidas, ayudan a limpiar los nombres de las columnas y las fechas son convertidas automáticamente. Las tablas importadas son de clase =tibble= (=tbl_df=), una versión moderna de =data.frame= del paquete =tibble=. En el mismo sentido se puede usar la función =read_excel()= del paquete =readxl= para [[https://dominicroye.github.io/es/2019/importar-varias-hojas-excel-en-r/][importar datos de hojas de Excel]]. En el siguiente ejemplo importamos los [[https://www.google.com/covid19/mobility/][datos de la movilidad registrada por Google]] [[https://dominicroye.github.io/files/Global_Mobility_Report.csv][durante los últimos meses]] a causa de la pandemia COVID-19.

Funciones que se utilizan:
- =read_csv()= o =read_csv2()= para archivos separados por comas o punto y coma.
- =read_delim()=, si el archivo tiene un separador general.
- =read_table()=, si el archivo tiene un espacio en blanco.

** Cargar el paquete

#+begin_src R :session tidyverse :results output
library(tidyverse)
#+end_src

#+RESULTS:

google_mobility <- read_csv("Global_Mobility_Report.csv")
## Parsed with column specification:
## cols(
##   country_region_code = col_character(),
##   country_region = col_character(),
##   sub_region_1 = col_character(),
##   sub_region_2 = col_logical(),
##   iso_3166_2_code = col_character(),
##   census_fips_code = col_logical(),
##   date = col_date(format = ""),
##   retail_and_recreation_percent_change_from_baseline = col_double(),
##   grocery_and_pharmacy_percent_change_from_baseline = col_double(),
##   parks_percent_change_from_baseline = col_double(),
##   transit_stations_percent_change_from_baseline = col_double(),
##   workplaces_percent_change_from_baseline = col_double(),
##   residential_percent_change_from_baseline = col_double()
## )
## Warning: 597554 parsing failures.
##    row              col           expected         actual                         file
## 200119 sub_region_2     1/0/T/F/TRUE/FALSE Autauga County 'Global_Mobility_Report.csv'
## 200119 census_fips_code 1/0/T/F/TRUE/FALSE 01001          'Global_Mobility_Report.csv'
## 200120 sub_region_2     1/0/T/F/TRUE/FALSE Autauga County 'Global_Mobility_Report.csv'
## 200120 census_fips_code 1/0/T/F/TRUE/FALSE 01001          'Global_Mobility_Report.csv'
## 200121 sub_region_2     1/0/T/F/TRUE/FALSE Autauga County 'Global_Mobility_Report.csv'
## ...... ................ .................. .............. ............................
## See problems(...) for more details.
google_mobility
## # A tibble: 516,697 x 13
##    country_region_~ country_region sub_region_1 sub_region_2 iso_3166_2_code
##    <chr>            <chr>          <chr>        <lgl>        <chr>
##  1 AE               United Arab E~ <NA>         NA           <NA>
##  2 AE               United Arab E~ <NA>         NA           <NA>
##  3 AE               United Arab E~ <NA>         NA           <NA>
##  4 AE               United Arab E~ <NA>         NA           <NA>
##  5 AE               United Arab E~ <NA>         NA           <NA>
##  6 AE               United Arab E~ <NA>         NA           <NA>
##  7 AE               United Arab E~ <NA>         NA           <NA>
##  8 AE               United Arab E~ <NA>         NA           <NA>
##  9 AE               United Arab E~ <NA>         NA           <NA>
## 10 AE               United Arab E~ <NA>         NA           <NA>
## # ... with 516,687 more rows, and 8 more variables: census_fips_code <lgl>,
## #   date <date>, retail_and_recreation_percent_change_from_baseline <dbl>,
## #   grocery_and_pharmacy_percent_change_from_baseline <dbl>,
## #   parks_percent_change_from_baseline <dbl>,
## #   transit_stations_percent_change_from_baseline <dbl>,
## #   workplaces_percent_change_from_baseline <dbl>,
## #   residential_percent_change_from_baseline <dbl>
Debemos prestar atención a los nombres de los argumentos, ya que cambian en las funciones de readr. Por ejemplo, el argumento conocido header = TRUE de read.csv() es en este caso col_names = TRUE. Podemos encontrar más detalles en el Cheat-Sheet de readr .

4.2 Manipulación de caracteres
Cuando se requiere manipular cadenas de texto usamos el paquete stringr, cuyas funciones siempre empiezan por str_* seguidas por un verbo y el primer argumento.

Algunas de estas funciones son las siguientes:

Función	Descripción
str_replace()	reemplazar patrones
str_c()	combinar characteres
str_detect()	detectar patrones
str_extract()	extraer patrones
str_sub()	extraer por posición
str_length()	longitud de la cadena de caracteres
Se suelen usar expresiones regulares para patrones de caracteres. Por ejemplo, la expresión regular [aeiou] coincide con cualquier caracter único que sea una vocal. El uso de corchetes [] corresponde a clases de caracteres. Por ejemplo, [abc] corresponde a cada letra independientemente de la posición. [a-z] o [A-Z] o [0-9] cada uno entre a y z ó 0 y 9. Y por último, [:punct:] puntuación, etc. Con llaves “{}” podemos indicar el número del elemento anterior {2} sería dos veces, {1,2} entre una y dos, etc. Además con $o ^ podemos indicar si el patrón empieza al principio o termina al final. Podemos encontrar más detalles y patrones en el Cheat-Sheet de stringr.

# reemplazamos 'er' al final por vacío

str_replace(month.name, "er$", "")
##  [1] "January"  "February" "March"    "April"    "May"      "June"
##  [7] "July"     "August"   "Septemb"  "Octob"    "Novemb"   "Decemb"
str_replace(month.name, "^Ma", "")
##  [1] "January"   "February"  "rch"       "April"     "y"         "June"
##  [7] "July"      "August"    "September" "October"   "November"  "December"
# combinar caracteres

a <- str_c(month.name, 1:12, sep = "_")
a
##  [1] "January_1"   "February_2"  "March_3"     "April_4"     "May_5"
##  [6] "June_6"      "July_7"      "August_8"    "September_9" "October_10"
## [11] "November_11" "December_12"
# colapsar combinación

str_c(month.name, collapse = ", ")
## [1] "January, February, March, April, May, June, July, August, September, October, November, December"
# dedectamos patrones

str_detect(a, "_[1-5]{1}")
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
# extraemos patrones

str_extract(a, "_[1-9]{1,2}")
##  [1] "_1"  "_2"  "_3"  "_4"  "_5"  "_6"  "_7"  "_8"  "_9"  "_1"  "_11" "_12"
# extraermos los caracteres en las posiciones entre 1 y 2

str_sub(month.name, 1, 2)
##  [1] "Ja" "Fe" "Ma" "Ap" "Ma" "Ju" "Ju" "Au" "Se" "Oc" "No" "De"
# longitud de cada mes

str_length(month.name)
##  [1] 7 8 5 5 3 4 4 6 9 7 8 8
# con pipe, el '.' representa al objeto que pasa el operador %>%
str_length(month.name) %>%
   str_c(month.name, ., sep = ".")
##  [1] "January.7"   "February.8"  "March.5"     "April.5"     "May.3"
##  [6] "June.4"      "July.4"      "August.6"    "September.9" "October.7"
## [11] "November.8"  "December.8"
Una función muy útil es str_glue() para interpolar caracteres.

name <- c("Juan", "Michael")
age <- c(50, 80)
date_today <- Sys.Date()

str_glue(
  "My name is {name}, ",
  "I'am {age}, ",
  "and my birth year is {format(date_today-age*365, '%Y')}."
)
## My name is Juan, I'am 50, and my birth year is 1970.
## My name is Michael, I'am 80, and my birth year is 1940.
4.3 Manejo de fechas y horas
El paquete lubridate ayuda en el manejo de fechas y horas. Nos permite crear los objetos reconocidos por R con funciones (como ymd() ó ymd_hms()) y hacer cálculos.

Debemos conocer las siguientes abreviaturas:

ymd: representa y:year, m:month, d:day
hms: representa h:hour, m:minutes, s:seconds
# paquete
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
##     date, intersect, setdiff, union
# vector de fechas
dat <- c("1999/12/31", "2000/01/07", "2005/05/20","2010/03/25")

# vector de fechas y horas
dat_time <- c("1988-08-01 05:00", "2000-02-01 22:00")

# convertir a clase date
dat <- ymd(dat)
dat
## [1] "1999-12-31" "2000-01-07" "2005-05-20" "2010-03-25"
# otras formatos
dmy("05-02-2000")
## [1] "2000-02-05"
ymd("20000506")
## [1] "2000-05-06"
# convertir a POSIXct
dat_time <- ymd_hm(dat_time)
dat_time
## [1] "1988-08-01 05:00:00 UTC" "2000-02-01 22:00:00 UTC"
# diferentes formatos en un vector
dat_mix <- c("1999/12/05", "05-09-2008", "2000/08/09", "25-10-2019")

# indicar formato con la convención conocida en ?strptime
parse_date_time(dat_mix, order = c("%Y/%m/%d", "%d-%m-%Y"))
## [1] "1999-12-05 UTC" "2008-09-05 UTC" "2000-08-09 UTC" "2019-10-25 UTC"
Más funciones útiles:

# extraer el año
year(dat)
## [1] 1999 2000 2005 2010
# el mes
month(dat)
## [1] 12  1  5  3
month(dat, label = TRUE) # como etiqueta
## [1] dic ene may mar
## 12 Levels: ene < feb < mar < abr < may < jun < jul < ago < sep < ... < dic
# el día de la semana
wday(dat)
## [1] 6 6 6 5
wday(dat, label = TRUE) # como etiqueta
## [1] vi\\. vi\\. vi\\. ju\\.
## Levels: do\\. < lu\\. < ma\\. < mi\\. < ju\\. < vi\\. < sá\\.
# la hora
hour(dat_time)
## [1]  5 22
# sumar 10 días
dat + days(10)
## [1] "2000-01-10" "2000-01-17" "2005-05-30" "2010-04-04"
# sumar 1 mes
dat + months(1)
## [1] "2000-01-31" "2000-02-07" "2005-06-20" "2010-04-25"
Por último, la función make_date() es muy útil en crear fechas a partir de diferentes partes de las mismas como puede ser el año, mes, etc.

# crear fecha a partir de sus elementos, aquí con año y mes
make_date(2000, 5)
## [1] "2000-05-01"
# crear fecha con hora
make_datetime(2005, 5, 23, 5)
## [1] "2005-05-23 05:00:00 UTC"
Podemos encontrar más detalles en el Cheat-Sheet de lubridate.

4.4 Manipulación de tablas y vectores
Los paquetes dplyr y tidyr nos proporciona una gramática de manipulación de datos con un conjunto de verbos útiles para resolver los problemas más comunes. Las funciones más importantes son:

Función	Descripción
mutate()	añadir nuevas variables o modificar existentes
select()	seleccionar variables
filter()	filtrar
summarise()	resumir/reducir
arrange()	ordenar
group_by()	agrupar
rename()	renombrar columnas
En caso de que no lo hayas hecho antes, importamos los datos de movilidad.

google_mobility <- read_csv("Global_Mobility_Report.csv")
## Parsed with column specification:
## cols(
##   country_region_code = col_character(),
##   country_region = col_character(),
##   sub_region_1 = col_character(),
##   sub_region_2 = col_logical(),
##   iso_3166_2_code = col_character(),
##   census_fips_code = col_logical(),
##   date = col_date(format = ""),
##   retail_and_recreation_percent_change_from_baseline = col_double(),
##   grocery_and_pharmacy_percent_change_from_baseline = col_double(),
##   parks_percent_change_from_baseline = col_double(),
##   transit_stations_percent_change_from_baseline = col_double(),
##   workplaces_percent_change_from_baseline = col_double(),
##   residential_percent_change_from_baseline = col_double()
## )
## Warning: 597554 parsing failures.
##    row              col           expected         actual                         file
## 200119 sub_region_2     1/0/T/F/TRUE/FALSE Autauga County 'Global_Mobility_Report.csv'
## 200119 census_fips_code 1/0/T/F/TRUE/FALSE 01001          'Global_Mobility_Report.csv'
## 200120 sub_region_2     1/0/T/F/TRUE/FALSE Autauga County 'Global_Mobility_Report.csv'
## 200120 census_fips_code 1/0/T/F/TRUE/FALSE 01001          'Global_Mobility_Report.csv'
## 200121 sub_region_2     1/0/T/F/TRUE/FALSE Autauga County 'Global_Mobility_Report.csv'
## ...... ................ .................. .............. ............................
## See problems(...) for more details.
4.4.1 Selecionar y renombrar
Podemos selecionar o eliminar columnas con la función select(), usando el nombre o índice de la(s) columna(s). Para suprimir columnas hacemos uso del signo negativo. La función rename ayuda en renombrar columnas o bien con el mismo nombre o con su índice.

residential_mobility <- select(google_mobility,
                               country_region_code:sub_region_1,
                               date,
                               residential_percent_change_from_baseline) %>%
                        rename(resi = 5)
4.4.2 Filtrar y ordenar
Para filtrar datos, empleamos filter() con operadores lógicos (|, ==, >, etc) o funciones que devuelven un valor lógico (str_detect(), is.na(), etc.). La función arrange() ordena de menor a mayor por una o múltiples variables (con el signo negativo - se invierte el orden de mayor a menor).

filter(residential_mobility,
       country_region_code == "US")
## # A tibble: 304,648 x 5
##    country_region_code country_region sub_region_1 date        resi
##    <chr>               <chr>          <chr>        <date>     <dbl>
##  1 US                  United States  <NA>         2020-02-15    -1
##  2 US                  United States  <NA>         2020-02-16    -1
##  3 US                  United States  <NA>         2020-02-17     5
##  4 US                  United States  <NA>         2020-02-18     1
##  5 US                  United States  <NA>         2020-02-19     0
##  6 US                  United States  <NA>         2020-02-20     1
##  7 US                  United States  <NA>         2020-02-21     0
##  8 US                  United States  <NA>         2020-02-22    -1
##  9 US                  United States  <NA>         2020-02-23    -1
## 10 US                  United States  <NA>         2020-02-24     0
## # ... with 304,638 more rows
filter(residential_mobility,
       country_region_code == "US",
       sub_region_1 == "New York")
## # A tibble: 7,068 x 5
##    country_region_code country_region sub_region_1 date        resi
##    <chr>               <chr>          <chr>        <date>     <dbl>
##  1 US                  United States  New York     2020-02-15     0
##  2 US                  United States  New York     2020-02-16    -1
##  3 US                  United States  New York     2020-02-17     9
##  4 US                  United States  New York     2020-02-18     3
##  5 US                  United States  New York     2020-02-19     2
##  6 US                  United States  New York     2020-02-20     2
##  7 US                  United States  New York     2020-02-21     3
##  8 US                  United States  New York     2020-02-22    -1
##  9 US                  United States  New York     2020-02-23    -1
## 10 US                  United States  New York     2020-02-24     0
## # ... with 7,058 more rows
filter(residential_mobility,
       resi > 50) %>%
          arrange(-resi)
## # A tibble: 32 x 5
##    country_region_co~ country_region sub_region_1               date        resi
##    <chr>              <chr>          <chr>                      <date>     <dbl>
##  1 KW                 Kuwait         Al Farwaniyah Governorate  2020-05-14    56
##  2 KW                 Kuwait         Al Farwaniyah Governorate  2020-05-21    55
##  3 SG                 Singapore      <NA>                       2020-05-01    55
##  4 KW                 Kuwait         Al Farwaniyah Governorate  2020-05-28    54
##  5 PE                 Peru           Metropolitan Municipality~ 2020-04-10    54
##  6 EC                 Ecuador        Pichincha                  2020-03-27    53
##  7 KW                 Kuwait         Al Farwaniyah Governorate  2020-05-11    53
##  8 KW                 Kuwait         Al Farwaniyah Governorate  2020-05-13    53
##  9 KW                 Kuwait         Al Farwaniyah Governorate  2020-05-20    53
## 10 SG                 Singapore      <NA>                       2020-04-10    53
## # ... with 22 more rows
4.4.3 Agrupar y resumir
¿Dónde encontramos mayor variabilidad entre regiones en cada país el día 1 de abril de 2020?

Para responder a esta pregunta, primero filtramos los datos y después agrupamos por la columna de país. Cuando empleamos la función summarise() posterior a la agrupación, nos permite resumir por estos grupos. Incluso, la combinación del group_by() con la función mutate() permite modificar columnas por grupos. En summarise() calculamos el valor máximo, mínimo y la diferencia entre ambos extremos creando nuevas columnas.

resi_variability <- residential_mobility %>%
                        filter(date == ymd("2020-04-01"),
                               !is.na(sub_region_1)) %>%
                          group_by(country_region) %>%
                       summarise(mx = max(resi, na.rm = TRUE),
                                 min = min(resi, na.rm = TRUE),
                                 range = abs(mx)-abs(min))
## Warning in max(resi, na.rm = TRUE): ningun argumento finito para max; retornando
## -Inf

## Warning in max(resi, na.rm = TRUE): ningun argumento finito para max; retornando
## -Inf

## Warning in max(resi, na.rm = TRUE): ningun argumento finito para max; retornando
## -Inf

## Warning in max(resi, na.rm = TRUE): ningun argumento finito para max; retornando
## -Inf

## Warning in max(resi, na.rm = TRUE): ningun argumento finito para max; retornando
## -Inf

## Warning in max(resi, na.rm = TRUE): ningun argumento finito para max; retornando
## -Inf
## Warning in min(resi, na.rm = TRUE): ningún argumento finito para min; retornando
## Inf

## Warning in min(resi, na.rm = TRUE): ningún argumento finito para min; retornando
## Inf

## Warning in min(resi, na.rm = TRUE): ningún argumento finito para min; retornando
## Inf

## Warning in min(resi, na.rm = TRUE): ningún argumento finito para min; retornando
## Inf

## Warning in min(resi, na.rm = TRUE): ningún argumento finito para min; retornando
## Inf

## Warning in min(resi, na.rm = TRUE): ningún argumento finito para min; retornando
## Inf
## `summarise()` ungrouping output (override with `.groups` argument)
arrange(resi_variability, -range)
## # A tibble: 94 x 4
##    country_region    mx   min range
##    <chr>          <dbl> <dbl> <dbl>
##  1 Nigeria           43     6    37
##  2 United States     35     6    29
##  3 India             36    15    21
##  4 Malaysia          45    26    19
##  5 Philippines       40    21    19
##  6 Vietnam           28     9    19
##  7 Colombia          41    24    17
##  8 Ecuador           44    27    17
##  9 Argentina         35    19    16
## 10 Chile             30    14    16
## # ... with 84 more rows
4.4.4 Unir tablas
¿Cómo podemos filtrar los datos para obtener un subconjunto de Europa?

Para ello, importamos datos espaciales con el código de país y una columna de las regiones. Explicaciones detalladas sobre el paquete sf (simple feature) para trabajar con datos vectoriales, lo dejaremos para otro post.

library(rnaturalearth) # paquete de datos vectoriales

# datos de países
wld <- ne_countries(returnclass = "sf")

# filtramos los países con código y seleccionamos las dos columnas de interés
wld <- filter(wld, !is.na(iso_a2)) %>% select(iso_a2, subregion)

# plot
plot(wld)


Otras funciones de dplyr nos permiten unir tablas: *_join(). Según hacia qué tabla (izquierda o derecha) se quiere unir, cambia la función : left_join(), right_join() o incluso full_join(). El argumento by no es necesario siempre y cuando ambas tablas tienen una columna en común. No obstante, en este caso la columna de fusión es diferente, por eso, usamos el modo c("country_region_code"="iso_a2"). El paquete forcats de tidyverse tiene muchas funciones útiles para manejar variables categóricas (factors), variables que tienen un conjunto fijo y conocido de valores posibles. Todas las funciones de forcats tienen el prefijo fct_*. Por ejemplo, en este caso usamos fct_reorder() para reordenar las etiquetas de los países en orden de la máxima basada en los registros de movibilidad residencial. Finalmente, creamos una nueva columna ‘resi_real’ para cambiar el valor de referencia, el promedio (baseline), fijado en 0 a 100.

subset_europe <- filter(residential_mobility,
                        is.na(sub_region_1),
                        !is.na(resi)) %>%
                 left_join(wld, by = c("country_region_code"="iso_a2")) %>%
                 filter(subregion %in% c("Northern Europe",
                                         "Southern Europe",
                                          "Western Europe",
                                          "Eastern Europe")) %>%
                 mutate(resi_real = resi + 100,
                        region = fct_reorder(country_region,
                                             resi,
                                            .fun = "max",
                                            .desc = FALSE)) %>%
                select(-geometry, -sub_region_1)

str(subset_europe)
## tibble [3,988 x 7] (S3: tbl_df/tbl/data.frame)
##  $ country_region_code: chr [1:3988] "AT" "AT" "AT" "AT" ...
##  $ country_region     : chr [1:3988] "Austria" "Austria" "Austria" "Austria" ...
##  $ date               : Date[1:3988], format: "2020-02-15" "2020-02-16" ...
##  $ resi               : num [1:3988] -2 -2 0 0 1 0 1 -2 0 -1 ...
##  $ subregion          : chr [1:3988] "Western Europe" "Western Europe" "Western Europe" "Western Europe" ...
##  $ resi_real          : num [1:3988] 98 98 100 100 101 100 101 98 100 99 ...
##  $ region             : Factor w/ 35 levels "Belarus","Ukraine",..: 18 18 18 18 18 18 18 18 18 18 ...
##  - attr(*, "problems")= tibble [597,554 x 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:597554] 200119 200119 200120 200120 200121 200121 200122 200122 200123 200123 ...
##   ..$ col     : chr [1:597554] "sub_region_2" "census_fips_code" "sub_region_2" "census_fips_code" ...
##   ..$ expected: chr [1:597554] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
##   ..$ actual  : chr [1:597554] "Autauga County" "01001" "Autauga County" "01001" ...
##   ..$ file    : chr [1:597554] "'Global_Mobility_Report.csv'" "'Global_Mobility_Report.csv'" "'Global_Mobility_Report.csv'" "'Global_Mobility_Report.csv'" ...
4.4.5 Tablas largas y anchas
Antes de pasar a la visualización con ggplot2. Es muy habitual modificar la tabla entre dos formatos principales. Una tabla es tidy cuando 1) cada variable es una columna 2) cada observación/caso es una fila y 3) cada tipo de unidad observacional forma una tabla.

# subconjunto
mobility_selection <- select(subset_europe, country_region_code, date:resi)
mobility_selection
## # A tibble: 3,988 x 3
##    country_region_code date        resi
##    <chr>               <date>     <dbl>
##  1 AT                  2020-02-15    -2
##  2 AT                  2020-02-16    -2
##  3 AT                  2020-02-17     0
##  4 AT                  2020-02-18     0
##  5 AT                  2020-02-19     1
##  6 AT                  2020-02-20     0
##  7 AT                  2020-02-21     1
##  8 AT                  2020-02-22    -2
##  9 AT                  2020-02-23     0
## 10 AT                  2020-02-24    -1
## # ... with 3,978 more rows
# tabla ancha
mobi_wide <- pivot_wider(mobility_selection,
                         names_from = country_region_code,
                         values_from = resi)
mobi_wide
## # A tibble: 114 x 36
##    date          AT    BA    BE    BG    BY    CH    CZ    DE    DK    EE    ES
##    <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 2020-02-15    -2    -1    -1     0    -1    -1    -2    -1     0     0    -2
##  2 2020-02-16    -2    -1     1    -3     0    -1    -1     0     1     0    -2
##  3 2020-02-17     0    -1     0    -2     0     1     0     0     1     1    -1
##  4 2020-02-18     0    -1     0    -2     0     1     0     1     1     1     0
##  5 2020-02-19     1    -1     0    -1    -1     1     0     1     1     0    -1
##  6 2020-02-20     0    -1     0     0    -1     0     0     1     1     0    -1
##  7 2020-02-21     1    -2     0    -1    -1     1     0     2     1     1    -2
##  8 2020-02-22    -2    -1     0     0    -2    -2    -3     0     1     0    -2
##  9 2020-02-23     0    -1     0    -3    -1    -1     0     0     0    -2    -3
## 10 2020-02-24    -1    -1     4    -1     0     0     0     4     0    16     0
## # ... with 104 more rows, and 24 more variables: FI <dbl>, FR <dbl>, GB <dbl>,
## #   GR <dbl>, HR <dbl>, HU <dbl>, IE <dbl>, IT <dbl>, LT <dbl>, LU <dbl>,
## #   LV <dbl>, MD <dbl>, MK <dbl>, NL <dbl>, NO <dbl>, PL <dbl>, PT <dbl>,
## #   RO <dbl>, RS <dbl>, RU <dbl>, SE <dbl>, SI <dbl>, SK <dbl>, UA <dbl>
# tabla larga
pivot_longer(mobi_wide,
             2:36,
             names_to = "country_code",
             values_to = "resi")
## # A tibble: 3,990 x 3
##    date       country_code  resi
##    <date>     <chr>        <dbl>
##  1 2020-02-15 AT              -2
##  2 2020-02-15 BA              -1
##  3 2020-02-15 BE              -1
##  4 2020-02-15 BG               0
##  5 2020-02-15 BY              -1
##  6 2020-02-15 CH              -1
##  7 2020-02-15 CZ              -2
##  8 2020-02-15 DE              -1
##  9 2020-02-15 DK               0
## 10 2020-02-15 EE               0
## # ... with 3,980 more rows
Otro grupo de funciones a las que deberías echar un vistazo son: separate(), case_when(), complete(). Podemos encontrar más detalles en el Cheat-Sheet de dplyr

4.5 Visualizar datos
ggplot2 es un sistema moderno, y con una enorme variedad de opciones, para visualización de datos. A diferencia del sistema gráfico de R Base se utiliza una gramática diferente. La gramática de los gráficos (grammar of graphics, de allí “gg”) consiste en la suma de varias capas u objetos independientes que se combinan usando + para construir el gráfico final. ggplot diferencia entre los datos, lo que se visualiza y la forma en que se visualiza.

data: nuestro conjunto de datos (data.frame o tibble)

aesthetics: con la función aes() indicamos las variables que corresponden a los ejes x, y, z,… o, cuando se pretende aplicar parámetros gráficos (color, size, shape) según una variable. Es posible incluir aes() en ggplot() o en la función correspondiente a una geometría geom_*.

geometries: son objetos geom_* que indican la geometría a usar, (p. ej.: geom_point(), geom_line(), geom_boxplot(), etc.).

scales: son objetos de tipo scales_* (p. ej.: scale_x_continous(), scale_colour_manual()) para manipular las ejes, definir colores, etc.

statistics: son objetos stat_* (p.ej.: stat_density()) que permiten aplicar transformaciones estadísticas.

Podemos encontrar más detalles en el Cheat-Sheet de ggplot2. ggplot es complementado constantemente con extensiones para geometrías u otras opciones gráficas (https://exts.ggplot2.tidyverse.org/ggiraph.html), para obtener ideas gráficas, debes echarle un vistazo a la Galería de Gráficos R (https://www.r-graph-gallery.com/).

4.5.1 Gráfico de linea y puntos
Creamos un subconjunto de nuestros datos de movilidad para residencias y parques, filtrando los registros de regiones italianas. Además, dividimos los valores de movilidad en porcentaje por 100 para obtener la fracción, ya que ggplot2 nos permite indicar la unidad de porcentaje en el argumento de las etiquetas (último gráfico de esta sección).

# creamos el subconjunto
it <- filter(google_mobility,
             country_region == "Italy",
             is.na(sub_region_1)) %>%
      mutate(resi = residential_percent_change_from_baseline/100,
             parks = parks_percent_change_from_baseline/100)


# gráfico de línea
ggplot(it,
       aes(date, resi)) +
  geom_line()


# gráfico de dispersión con línea de correlación
ggplot(it,
       aes(parks, resi)) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'


Para modificar los ejes, empleamos las diferentes funciones de scale_* que debemos adpatar a las escalas de medición (date, discrete, continuous, etc.). La función labs() nos ayuda en definir los títulos de ejes, del gráfico y de la leyenda. Por último, añadimos con theme_light() el estilo del gráfico (otros son theme_bw(), theme_minimal(), etc.). También podríamos hacer cambios de todos los elementos gráficos a través de theme().

# time serie plot
ggplot(it,
       aes(date, resi)) +
  geom_line(colour = "#560A86", size = 0.8) +
  scale_x_date(date_breaks = "10 days",
               date_labels = "%d %b") +
  scale_y_continuous(breaks = seq(-0.1, 1, 0.1),
                     labels = scales::percent) +
  labs(x = "",
       y = "Residential mobility",
       title = "Mobility during COVID-19") +
  theme_light()


# scatter plot
ggplot(it,
       aes(parks, resi)) +
  geom_point(alpha = .4, size = 2) +
  geom_smooth(method = "lm") +
  scale_x_continuous(breaks = seq(-1, 1.4, 0.2),
                     labels = scales::percent) +
  scale_y_continuous(breaks = seq(-1, 1, 0.1),
                     labels = scales::percent) +
  labs(x = "Park mobility",
       y = "Residential mobility",
       title = "Mobility during COVID-19") +
  theme_light()
## `geom_smooth()` using formula 'y ~ x'


4.5.2 Boxplot
Podemos visualizar diferentes aspectos de los datos de movilidad con otras geometrías. Aquí creamos boxplots por cada país europeo representando la variabilidad de movilidad entre y en los países durante la pandemia del COVID-19.

# subconjunto
subset_europe_reg <- filter(residential_mobility,
                           !is.na(sub_region_1),
                           !is.na(resi)) %>%
                     left_join(wld, by = c("country_region_code"="iso_a2")) %>%
                     filter(subregion %in% c("Northern Europe",
                                         "Southern Europe",
                                          "Western Europe",
                                          "Eastern Europe")) %>%
                     mutate(resi = resi/100,
                            country_region = fct_reorder(country_region, resi))

# boxplot
ggplot(subset_europe_reg,
       aes(country_region, resi, fill = subregion)) +
  geom_boxplot() +
  scale_y_continuous(breaks = seq(-0.1, 1, 0.1), labels = scales::percent) +
  scale_fill_brewer(palette = "Set1") +
  coord_flip() +
   labs(x = "",
       y = "Residential mobility",
       title = "Mobility during COVID-19",
       fill = "") +
  theme_minimal()


4.5.3 Heatmap
Para visualizar la tendencia de todos los países europeos es recomendable usar un heatmap en lugar de un bulto de líneas. Antes de constuir el gráfico, creamos un vector de fechas para las etiquetas con los domingos en el período de registros.

# secuencia de fechas
df <- data.frame(d = seq(ymd("2020-02-15"), ymd("2020-06-07"), "day"))

# filtramos los domingos creando el día de la semana
sundays <- df %>%
            mutate(wd = wday(d, week_start = 1)) %>%
             filter(wd == 7) %>%
              pull(d)
Si queremos usar etiquetas en otras lenguas, es necesario cambiar la configuración regional del sistema.

Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
El relleno de color para los boxplots lo dibujamos por cada región de los países europeos. Podemos fijar el tipo de color con scale_fill_*, en este caso, de las gamas viridis.
Además, la función guides() nos permite modificar la barra de color de la leyenda. Por último, aquí vemos el uso de theme() con cambios adicionales a theme_minimal().

# headmap
ggplot(subset_europe,
       aes(date, region, fill = resi_real)) +
  geom_tile() +
  scale_x_date(breaks = sundays,
               date_labels = "%d %b") +
  scale_fill_viridis_c(option = "A",
                       breaks = c(91, 146),
                       labels = c("Less", "More"),
                       direction = -1) +
  theme_minimal() +
  theme(legend.position = "top",
        title = element_text(size = 14),
        panel.grid.major.x = element_line(colour = "white", linetype = "dashed"),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.ontop = TRUE,
        plot.margin = margin(r = 1, unit = "cm")) +
  labs(y = "",
       x = "",
       fill = "",
       title = "Mobility trends for places of residence",
       caption = "Data: google.com/covid19/mobility/") +
  guides(fill = guide_colorbar(barwidth = 10,
                               barheight = .5,
                               label.position = "top",
                               ticks = FALSE)) +
  coord_cartesian(expand = FALSE)


4.6 Aplicar funciones sobre vectores o listas
El paquete purrr contiene un conjunto de funciones avanzadas de programación funcional para trabajar con funciones y vectores. La familia de funciones lapply() conocido de R Basecorresponde a las funciones de map() en este paquete. Una de las mayores ventajas es poder reducir el uso de bucles (for, etc.).

# lista con dos vectores
vec_list <- list(x = 1:10, y = 50:70)

# calculamos el promedio para cada uno
map(vec_list, mean)
## $x
## [1] 5.5
##
## $y
## [1] 60
# podemos cambiar tipo de salida map_* (dbl, chr, lgl, etc.)
map_dbl(vec_list, mean)
##    x    y
##  5.5 60.0
Un ejemplo más complejo. Calculamos el coeficiente de correlación entre la movilidad residencial y la de los parques en todos los países europeos. Para obtener un resumen tidy de un modelo o un test usamos la función tidy() del paquete broom.

library(broom) # tidy outputs

# función adaptada
cor_test <- function(x, formula) {

df <- cor.test(as.formula(formula), data = x) %>% tidy()

return(df)

}

# preparamos los datos
europe_reg <- filter(google_mobility,
                           !is.na(sub_region_1),
                           !is.na(residential_percent_change_from_baseline)) %>%
                     left_join(wld, by = c("country_region_code"="iso_a2")) %>%
                     filter(subregion %in% c("Northern Europe",
                                         "Southern Europe",
                                          "Western Europe",
                                          "Eastern Europe"))
# aplicamos la función a cada país creando una lista
europe_reg %>%
  split(.$country_region_code) %>%
  map(cor_test, formula = "~ residential_percent_change_from_baseline + parks_percent_change_from_baseline")
## $AT
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.360     -12.3 2.68e-32      1009   -0.413    -0.305 Pearson'~ two.sided
##
## $BE
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.312     -6.06  3.67e-9       340   -0.405    -0.213 Pearson'~ two.sided
##
## $BG
## # A tibble: 1 x 8
##   estimate statistic   p.value parameter conf.low conf.high method   alternative
##      <dbl>     <dbl>     <dbl>     <int>    <dbl>     <dbl> <chr>    <chr>
## 1   -0.677     -37.8 1.47e-227      1694   -0.702    -0.650 Pearson~ two.sided
##
## $CH
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1  -0.0786     -2.91 0.00370      1360   -0.131   -0.0256 Pearson's~ two.sided
##
## $CZ
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1  -0.0837     -3.35 0.000824      1593   -0.132   -0.0347 Pearson'~ two.sided
##
## $DE
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1  0.00239     0.102   0.919      1814  -0.0436    0.0484 Pearson's~ two.sided
##
## $DK
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1    0.237      5.81  1.04e-8       567    0.158     0.313 Pearson'~ two.sided
##
## $EE
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1   -0.235     -2.88 0.00462       142   -0.384   -0.0740 Pearson's~ two.sided
##
## $ES
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1   -0.825     -65.4       0      2005   -0.839    -0.811 Pearson's~ two.sided
##
## $FI
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1   0.0427      1.42   0.155      1106  -0.0162     0.101 Pearson's~ two.sided
##
## $FR
## # A tibble: 1 x 8
##   estimate statistic   p.value parameter conf.low conf.high method   alternative
##      <dbl>     <dbl>     <dbl>     <int>    <dbl>     <dbl> <chr>    <chr>
## 1   -0.698     -37.4 3.29e-216      1474   -0.723    -0.671 Pearson~ two.sided
##
## $GB
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.105     -11.0 9.19e-28     10712   -0.124   -0.0865 Pearson'~ two.sided
##
## $GR
## # A tibble: 1 x 8
##   estimate statistic   p.value parameter conf.low conf.high method   alternative
##      <dbl>     <dbl>     <dbl>     <int>    <dbl>     <dbl> <chr>    <chr>
## 1   -0.692     -27.0 1.03e-114       796   -0.726    -0.654 Pearson~ two.sided
##
## $HR
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.579     -21.9 9.32e-87       954   -0.620    -0.536 Pearson'~ two.sided
##
## $HU
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.342     -15.6 6.71e-52      1843   -0.382    -0.301 Pearson'~ two.sided
##
## $IE
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.222     -8.45 7.49e-17      1378   -0.271    -0.171 Pearson'~ two.sided
##
## $IT
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1   -0.831     -71.0       0      2250   -0.844    -0.818 Pearson's~ two.sided
##
## $LT
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.204     -5.45  7.17e-8       686   -0.274    -0.131 Pearson'~ two.sided
##
## $LV
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.544     -6.87 3.84e-10       112   -0.662    -0.401 Pearson'~ two.sided
##
## $NL
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1    0.143      5.31  1.25e-7      1356   0.0903     0.195 Pearson'~ two.sided
##
## $NO
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1   0.0483      1.69  0.0911      1221 -0.00774     0.104 Pearson's~ two.sided
##
## $PL
## # A tibble: 1 x 8
##   estimate statistic   p.value parameter conf.low conf.high method   alternative
##      <dbl>     <dbl>     <dbl>     <int>    <dbl>     <dbl> <chr>    <chr>
## 1   -0.531     -26.7 6.08e-133      1815   -0.564    -0.498 Pearson~ two.sided
##
## $PT
## # A tibble: 1 x 8
##   estimate statistic   p.value parameter conf.low conf.high method   alternative
##      <dbl>     <dbl>     <dbl>     <int>    <dbl>     <dbl> <chr>    <chr>
## 1   -0.729     -46.9 2.12e-321      1938   -0.749    -0.707 Pearson~ two.sided
##
## $RO
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl> <chr>      <chr>
## 1   -0.640     -56.0       0      4517   -0.657    -0.623 Pearson's~ two.sided
##
## $SE
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1    0.106      3.93  9.09e-5      1367   0.0529     0.158 Pearson'~ two.sided
##
## $SI
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.627     -11.4 1.98e-23       200   -0.704    -0.535 Pearson'~ two.sided
##
## $SK
## # A tibble: 1 x 8
##   estimate statistic  p.value parameter conf.low conf.high method    alternative
##      <dbl>     <dbl>    <dbl>     <int>    <dbl>     <dbl> <chr>     <chr>
## 1   -0.196     -5.70  1.65e-8       810   -0.262    -0.129 Pearson'~ two.sided
Como ya hemos visto anteriormente, existen subfunciones de map_* para obtener en lugar de una lista un objeto de otra clase, aquí de data.frame.

cor_mobility <- europe_reg %>%
                  split(.$country_region_code) %>%
                     map_df(cor_test,
                            formula = "~ residential_percent_change_from_baseline + parks_percent_change_from_baseline",
                            .id = "country_code")

arrange(cor_mobility, estimate)
## # A tibble: 27 x 9
##    country_code estimate statistic   p.value parameter conf.low conf.high method
##    <chr>           <dbl>     <dbl>     <dbl>     <int>    <dbl>     <dbl> <chr>
##  1 IT             -0.831    -71.0  0.             2250   -0.844    -0.818 Pears~
##  2 ES             -0.825    -65.4  0.             2005   -0.839    -0.811 Pears~
##  3 PT             -0.729    -46.9  2.12e-321      1938   -0.749    -0.707 Pears~
##  4 FR             -0.698    -37.4  3.29e-216      1474   -0.723    -0.671 Pears~
##  5 GR             -0.692    -27.0  1.03e-114       796   -0.726    -0.654 Pears~
##  6 BG             -0.677    -37.8  1.47e-227      1694   -0.702    -0.650 Pears~
##  7 RO             -0.640    -56.0  0.             4517   -0.657    -0.623 Pears~
##  8 SI             -0.627    -11.4  1.98e- 23       200   -0.704    -0.535 Pears~
##  9 HR             -0.579    -21.9  9.32e- 87       954   -0.620    -0.536 Pears~
## 10 LV             -0.544     -6.87 3.84e- 10       112   -0.662    -0.401 Pears~
## # ... with 17 more rows, and 1 more variable: alternative <chr>
Otros ejemplos prácticos aquí en este post or este otro. Podemos encontrar más detalles en el Cheat-Sheet de purrr.
* COMMENT Local Variables                          :ARCHIVE:

* Footnotes

[fn:1]
# Local Variables:
# eval: (org-hugo-auto-export-mode)
# org-hugo-auto-export-on-save: t
# End: