layout: true border-top: 0px solid #23373B; background-image: url("pictures/wvu_background.png") background-position: 50% 50%; background-size: 100% 100% .white[ .right[ # Introduction to <br> Spatial Data Analysis .large[Casey Jelsema | BIOS 604 | 28 Aug, 2019] ] ] --- --- layout: false # What is Spatial Data Analysis? -- What is Data Analysis? > Data analysis is a process of inspecting, cleansing, transforming and modeling data with the goal of discovering useful information, informing conclusions and supporting decision-making. > - Wikipedia contributors [Wik19] -- What is *data*? ??? Prompt for feedback / responses --- # What is Spatial Data? .pull-left[ ] .pull-right[ What makes data spatial? ] --- # What is Spatial Data? .pull-left[ ![](intro_spatial_files/figure-html/maps_01a-1.png)<!-- --> ] .pull-right[ What makes data spatial? - Recording the location? - Plotting it on a map? - Something else? ] ??? - Ask these questions honestly. - Solicit feedback. - Comment that the top graph shows some spatial dependence, but bottom graph does not. --- # What is Spatial Data? .pull-left[ ![](intro_spatial_files/figure-html/maps_01b-1.png)<!-- --> ] .pull-right[ What makes data spatial? - Recording the location? - Plotting it on a map? - Something else? Key idea of spatial data: > The closer two locations are in space, the more highly correlated observations taken at those locations will be. ] ??? - We can relate this to another concept we all understand. - Family trees! --- # Family Trees .center[ <img src="pictures/brit_family_tree01.jpeg" width="960" /> ] ??? - The closer you are in generations, the more related you will be to another individual - Queen to her kids, to their kids, to THEIR kids - Prince whats-his-face to his siblings, to his uncle, cousin, second cousins, etc. - Similarly, with spatial correlation we mean that --- # Spatial Correlation -- .pull-left[ Mason and Kanawha are more highly correlated ... ![](intro_spatial_files/figure-html/map_spat_cor02-1.png)<!-- --> ] -- .pull-right[ <br> ... compared to Mason and Monongalia ![](intro_spatial_files/figure-html/map_spat_cor01-1.png)<!-- --> ] ??? - END: So a spatial data analysis is --- # Spatial Data Analysis The process of: - Inspecting, cleaning, and modeling geo-located data - With the goal of discovering useful information or making conclusions - While accounting for the fact that closer locations are more highly correlated. More than just modeling data that *could* be mapped. ??? - For example, we could run a regression on the relationship between gun homicide and gun ownership rates --- layout: true border-top: 0px solid #23373B; background-image: url("pictures/gun_regression.png") background-position: 50% 50%; background-size: 100% 100% ??? - That's not a **spatial** analysis, because we didn't actually use the location of the states. - They just plotted gun deaths by gun ownership and ran a simple regression. - Doesn't recognize that West Virginia and Boring Virginia are right next to each other. - And there may not be anything wrong with that! Maybe there isn't much spatial correlation in these data. - How do we assess spatial correlation? --- --- layout: false # Spatial Data Association .pull-left[ ![](intro_spatial_files/figure-html/maps_01c-1.png)<!-- --> ] -- .pull-right[ Two statistics Moran's I: `$$I = \dfrac{ n \sum_{i}\sum_{j} h_{ij}\left( y_{i} - \bar{y}\right)\left( y_{j} - \bar{y}\right) }{ \left( \sum_{i\ne j}h_{ij}\right)\sum_{i}\left( y_{i} - \bar{y}\right)^{2} }$$` Geary's C: `$$C = \dfrac{ (n-1) \sum_{i}\sum_{j} h_{ij}\left( y_{i} - y_{j}\right)^{2} }{ 2\left( \sum_{i\ne j}h_{ij}\right)\sum_{i}\left( y_{i} - \bar{y}\right)^{2} }$$` ] ??? How to measure spatial correlation? --- # Areal Data Association .pull-left[ ![](intro_spatial_files/figure-html/maps_01d-1.png)<!-- --> ] .pull-right[ Average `\(PM_{2.5}\)` - I = 0.7347 - C = 0.1596 Flu vaccination: - I = 0.2337 - C = 0.7369 Complete Spatial Randomness (CSR) ] ??? - So what about beyond association? - Model spatial data: spatial smoothing or spatial prediction --- layout:false # The General Linear Model Recall the linear regression model `$$y = \beta_{0} + \beta_{1}x + \varepsilon$$` -- What are the typical assumptions for a regression analysis? -- The random errors ( `\(\varepsilon\)` ) are: - Normally distributed - Have constant variance - Are independent of each other -- `\(\mbox{var}(\varepsilon_{i})=\sigma^{2}\)` and `\(\mbox{cov}(\varepsilon_{i}, \varepsilon_{j})=0\)` -- No so for a spatial analysis! ??? - You don't have to know the details - But getting the big picture here is important to understand what's going on. --- # The General Linear Model Alternative expression for the GLM: `$$y \sim \mathcal{N}\left( X\beta , \sigma^{2}\mathbb{I} \right)$$` -- > The outcome `\(y\)` follows a Normal distribution with a mean of `\(X\beta\)` and a covariance matrix that is diagonal multiplied by a constant. -- `\(X\beta\)`? Just a compact way to write `\(\beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \ldots + \beta_{p}x_{p}\)` Covariance matrix? A big array that holds the variance for each point, and the covariance between points. Simple example: Three counties ??? - Linear regression is a special case of what's called a general linear model - One way to express the GLM is like this --- # A Covariance Matrix Consider the example of the flu vaccination rate for all WV counties .pull-left[ The outcome is: ![](intro_spatial_files/figure-html/make_yvec -1.png)<!-- --> ] -- .pull-right[ And the covariance matrix is: ![](intro_spatial_files/figure-html/make_cov1 -1.png)<!-- --> ] --- # A Covariance Matrix .pull-left[ We can rewrite this as: ![](intro_spatial_files/figure-html/make_cov2a -1.png)<!-- --> ] .pull-right[ Where ... ] --- # A Covariance Matrix .pull-left[ We can rewrite this as: ![](intro_spatial_files/figure-html/make_cov2b -1.png)<!-- --> ] .pull-right[ Where ... - .red0[This] is the variance for Monongalia county ] --- # A Covariance Matrix .pull-left[ We can rewrite this as: ![](intro_spatial_files/figure-html/make_cov2c -1.png)<!-- --> ] .pull-right[ Where ... - .red0[This] is the variance for Monongalia county - And .blue0[this] is the *covariance* between Monongalia and Logan counties ] --- # A Covariance Matrix .pull-left[ We can rewrite this as: ![](intro_spatial_files/figure-html/make_cov2d -1.png)<!-- --> ] .pull-right[ Where ... - .red0[This] is the variance for Monongalia county - And .blue0[this] is the *covariance* between Monongalia and Logan counties In a spatial model this becomes ... ![](intro_spatial_files/figure-html/make_cov4a -1.png)<!-- --> ] --- # A Covariance Matrix .pull-left[ We can rewrite this as: ![](intro_spatial_files/figure-html/make_cov2e -1.png)<!-- --> So the question becomes: > How do we put these spatial covariances into a model? ] .pull-right[ Where ... - .red0[This] is the variance for Monongalia county - And .blue0[this] is the *covariance* between Monongalia and Logan counties In a spatial model this becomes ... ![](intro_spatial_files/figure-html/make_cov4b -1.png)<!-- --> ] ??? The answer depends on what type of spatial data we have. - How we go about incorporating spatial dependence depends on the type of data we're collecting. - What are some types of data? Numeric, Categorical, etc. - What type of analysis would we use if ... - Response is numeric, predictor is numeric - Response is numeric, predictor is categorical - Response is categorical, predictor is numeric - Response is categorical, predictor is categorical - There are also different ways in which data can be spatial --- layout:false # Three Types of Spatial Data .pull-left[ - Areal - Geostatistical - Spatial Point Process ] --- # Three Types of Spatial Data .pull-left[ - **Areal** - Geostatistical - Spatial Point Process ![](intro_spatial_files/figure-html/map_02-1.png)<!-- --> ] ??? - Is like the maps I've been showing so far - WV has 55 counties, we're not able to get another one between, say, Mon and Preston counties. - Sometimes called **polygon** data in one program I use to do analysis and mapping. -- .pull-right[ Areal data: - Finite number of spatial locations - Discrete spatial locations - States, Counties, ZIP codes - Usually an aggregate measure over each location ] --- # Three Types of Spatial Data .pull-left[ - Areal - **Geostatistical** - Spatial Point Process ![](intro_spatial_files/figure-html/map_03-1.png)<!-- --> ] .pull-right[ Geostatistical data: - *Point*-referenced locations - Spatial location is continuous (not random) - Usually a single value at each location ] ??? - The underlying *geometry* is continuous - We could have infinitely many points here - We envision a smooth function that we only measure at certain places, but the thing we're measuring has values everywhere - We just don't have a complete picture of the process --- # Three Types of Spatial Data .pull-left[ - Areal - Geostatistical - **Spatial Point Process** ![](intro_spatial_files/figure-html/map_04b-1.png)<!-- --> ] .pull-right[ Point Process data: - Interested in some *event* - Goal is to identify the rate of occurance - The location is the *outcome* ] ??? - We're not measuring a value - Well, we *could*, but we don't have to - Our outcome is "There is a THING/EVENT at XYZ location" - This was one of the earliest known examples of a spatial analysis, and also an early Epidemiological analysis - Who know's what I'm talking about? --- layout: true border-top: 0px solid #23373B; background-image: url("pictures/john_snow_cholera.jpg") background-position: 50% 50%; background-size: 100% 100% ??? - John Snow mapping Cholera outbreak in 1854 London - So how do we go about incorporating spatial correlation into a statistical model? --- --- layout: false # Modeling Areal Data Much Epidimiological data is areal in nature -- May be something that only makes sense at an "area" level - Number of cases of influenza per state - State expenditures on Medicare or Medicaid - Rate of preventable hospital stays by county -- Sometimes an aggregation of spatial point process data - Number of opiod overdoses for each county - Average distance to grocery store - Rate of uninsured adults -- Key idea: > Spatial dependence for areal data is modeled through the neighborhood matrix ??? - Incidence of flu. What would be the location of interest? => Where fly was contracted. - May be reported at hospital, or home address. Is that really "the" location? - Most sensible 'location' is probably ZIP code, county, state, ... - So how does spatial dependence come into it? --- # Neighborhood Matrix A matrix that has a value of 1 when two locations are neighbors, and a 0 otherwise. -- Consider north-central West Virginia ("Mountaineer Country") .center[ <img src="intro_spatial_files/figure-html/ncwv01-1.png" width="350" height="350" /> ] --- # Neighborhood Matrix: NCWV .pull-left[ A map of the *spatial domain*: ![](intro_spatial_files/figure-html/map_ncwv01-1.png)<!-- --> ] -- .pull-right[ And the neighborhood matrix is: <table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> Mon </th> <th style="text-align:right;"> Pre </th> <th style="text-align:right;"> Tay </th> <th style="text-align:right;"> Mar </th> <th style="text-align:right;"> Har </th> <th style="text-align:right;"> Dod </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Mon </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Pre </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Tay </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Mar </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> <tr> <td style="text-align:left;"> Har </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:left;"> Dod </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> </tr> </tbody> </table> ] ??? - So what do these values mean? --- # Neighborhood Matrix: Monongalia Monongalia county ... ![](intro_spatial_files/figure-html/map_ncwv02a-1.png)<!-- --> --- # Neighborhood Matrix: Monongalia Monongalia county ... and its neighbors ![](intro_spatial_files/figure-html/map_ncwv02b-1.png)<!-- --> --- # Neighborhood Matrix: Preston Preston county ... ![](intro_spatial_files/figure-html/map_ncwv03a-1.png)<!-- --> --- # Neighborhood Matrix: Preston Preston county ... and its neighbors ![](intro_spatial_files/figure-html/map_ncwv03b-1.png)<!-- --> --- # Neighborhood Matrix: Doddridge Doddridge county ... ![](intro_spatial_files/figure-html/map_ncwv04a-1.png)<!-- --> --- # Neighborhood Matrix: Doddridge Doddridge county ... and its neighbors ![](intro_spatial_files/figure-html/map_ncwv04b-1.png)<!-- --> --- # Neighborhood Matrix: Variants -- Instead of "Neighbor = 1" and "Non-neighbor = 0", we could ... - Standardize the values (spatial weights) of the neighborhood matrix for **each** county -- - Standardize the spatial weights across **all** counties -- - Use alternative values for Neighbors - Inverse distance between county centroid (geographical center) or population mean centroid (population center) ??? - So how do we bring this into a spatial analysis? --- # Areal Spatial Model Recall the linear regression model (again!) `$$Y \sim \mathcal{N}\left( X\beta , \sigma^{2}\mathbb{I} \right)$$` If the data are areal spatial data, we update this model using the neighborhood matrix -- Spatial CAR model: `$$Y_{i} \sim \mathcal{N}\left( \gamma_{i}, \sigma^{2} \right)$$` where `$$\gamma_{i}|\mathbb{\gamma}_{-i} \sim \mathcal{N}\left( \mu + \sum h_{ij}\left( \gamma_{j}-\mu \right), \tau^{2} \right)$$` -- That looks complicated -- It is complicated ??? - That looks complicated ... but it's just how we express the relatively intuitive idea in math - Just like you may not have known all of the mathematical expressions when running regression, you don't need to know them all for running a spatial model. - Maybe a picture will make more sense. --- # Visualizing the Spatial CAR .center[ ![](intro_spatial_files/figure-html/map_ncwv05a-1.png)<!-- --> ] --- # Visualizing the Spatial CAR .center[ ![](intro_spatial_files/figure-html/map_ncwv05b-1.png)<!-- --> ] -- `$$\gamma_{i}|\mathbb{\gamma}_{-i} \sim \mathcal{N}\left( \mu + \sum h_{ij}\left( \gamma_{j}-\mu \right), \tau^{2} \right)$$` --- # Visualizing the Spatial CAR .center[ ![](intro_spatial_files/figure-html/map_ncwv05d-1.png)<!-- --> ] `$$\gamma_{i}|\mathbb{\gamma}_{-i} \sim \mathcal{N}\left( \mu + \sum h_{ij}\left( \gamma_{j}-\mu \right), \tau^{2} \right)$$` ??? - This doesn't mean that Mon county is not AT ALL related to Harrison or Doddridge - There's just a degree of separation - Again, think back to the family trees --- # Family Trees .center[ <img src="pictures/brit_family_tree01.jpeg" width="960" /> ] ??? - The closer you are in generations, the more related you will be to another individual - Queen to her kids, to their kids, to THEIR kids - Prince whats-his-face to his siblings, to his uncle, cousin, second cousins, etc. - Let's say that neighbors are either siblings, or up/down a generation, one "step" can put them all at the same person. --- # Family Trees .center[ <img src="pictures/brit_family_tree02.jpeg" width="960" /> ] ??? - So the Queen's grandchildren - Each pair of siblings is neighbors, and they're *related* to each other but are not neighbors with their cousins, need to go up TWO generations to get to the same person. --- # Family Trees .center[ <img src="pictures/brit_family_tree03.jpeg" width="960" /> ] ??? - Then the kids of Princes William and Harry. William's kids are neighbors, and are related to Harry's kid to the same degree that William and Harry are related to their cousins. - The four kids of William and Harry are also related to --- # Family Trees .center[ <img src="pictures/brit_family_tree04.jpeg" width="960" /> ] ??? - The four kids of William and Harry are also related to their second cousins --- # Family Trees .center[ <img src="pictures/brit_family_tree05.jpeg" width="960" /> ] ??? - But the relationship is more distant, they need to go up THREE generations to get to the same person. --- # Areal Spatial Model Another (still complicated) way to express this is: `$$Y \sim \mathcal{N}\left( X\beta , \tau^{2}\left(\mathbb{I} - \phi\textbf{H}\right)^{-1} \right)$$` -- Here: - `\(\tau^{2}\)` is a non-saptial variance - `\(\phi\)` is measure of spatial dependence - `\(\textbf{H}\)` is the neighborhood matrix Different approach to fitting the same model. --- # Example Analysis Outcome: Percent of population with flu vaccination .pull-left[ ![](intro_spatial_files/figure-html/areal01-1.png)<!-- --> ] .pull-right[ ![](intro_spatial_files/figure-html/areal02-1.png)<!-- --> ] --- # Modeling Geostatistical Data More difficult because there are infinitely many (possible) locations. - No good version of the neighborhood matrix. -- Solution: Impose a more simplified covariance model `$$\mbox{cov}\left( Y(s_{i}), Y(s_{j}) \right) = C\left( d_{ij} \right)$$` Covariance is a function of only the *distance* between two points. --- # Modeling Geostatistical Data Examples of `\(C(d_{ij})\)` include ... - Exponential Covariance Model `$$C(d_{ij}) = \left\{ \begin{array}{ll} \sigma^{2} e^{-\phi d_{ij}} & \mbox{if } d_{ij} > 0\\ \tau^{2} + \sigma^{2} & \mbox{if } d_{ij} = 0\end{array}\right.$$` -- - Matérn Covariance Model `$$C(d_{ij}) = \left\{ \begin{array}{ll} \dfrac{\sigma^{2}}{2^{\nu-1}\Gamma(\nu)}\left(\phi d_{ij}\right)^{\nu}K_{\nu}\left(\phi d_{ij}\right) & \mbox{if } d_{ij} > 0\\ \tau^{2} + \sigma^{2} & \mbox{if } d_{ij} = 0\end{array}\right.$$` Often a related concept, the *semivariogram* is what is actually modeled. ??? - The Exponential is a simpler model - The Matern is more flexible, but is more complicated to use - Both have several parameters, but there are three main concepts that get considered with geostatistical data --- # Spatial Covariance Functions .left-column[ - Sill - Nugget - Range ] .right-column[ ![](intro_spatial_files/figure-html/spatial_covar_plot01-1.png)<!-- --> ] --- # Spatial Covariance Functions One approach is to estimate the covariance at a sequence of distances ... .center[ ![](intro_spatial_files/figure-html/spatial_covar_plot02a-1.png)<!-- --> ] --- # Spatial Covariance Functions ... and then fit the covariance model to these data .center[ ![](intro_spatial_files/figure-html/spatial_covar_plot02b-1.png)<!-- --> ] --- # Modeling Geostatistical Data What happens next? -- Define locations where you want to make spatial predictions. - Observed locations: Data - Unobserved locations: Predictions -- Use known data to make predictions based on distance to the observed data. --- # Modeling Geostatistical Data .pull-left[ Use these data ... ![](intro_spatial_files/figure-html/model_geostat01a-1.png)<!-- --> ] .pull-right[ ... to make predictions at these locations. ![](intro_spatial_files/figure-html/model_geostat01b-1.png)<!-- --> ] --- # Modeling Geostatistical Data When we do so, we obtain this map .pull-left[ ![](intro_spatial_files/figure-html/map_geos01 -1.png)<!-- --> ] .pull-right[ ![](intro_spatial_files/figure-html/map_geos02 -1.png)<!-- --> ] --- # Too Much Data To produce that map, I didn't use the method I described Why not? - Problem: Number of observed locations is n = 12102 -- - Reason: Need to "invert" covariance matrix ... computationally expensive! -- - Solution: Use a *reduced rank* spatial model - Model through a smaller set of spatial locations (called *knots*) across the domain --- # Example: Too Much Data .center[ ![](intro_spatial_files/figure-html/map_geos03 -1.png)<!-- --> ] ??? - Total number of knot locations: r = 87 --- # Too Few Data In some contexts, to few data is a problem. Recall our our WV maps -- .pull-left[ ![](intro_spatial_files/figure-html/few_data_map-1.png)<!-- --> ] -- .pull-right[ Some concerns: - What if the outcome is rare? - What if there is only once instance in a county? - What if the outcome has *social* or *legal* implications? ] ??? - There may be times when it is irresponsible, inappropriate, or unethical to report the data in certain ways. - Want to avoid being able to individually identify people. - If rates of outcome are low ... - Might have unstable rates (is the reported rate even worth reporting?). - Like a 51-49 "lead" in presidential race - Or a 50% increase in risk of cancer due to certain behavior. Raw rate being: 0.5% to 0.75%. - Aggregating to larger areas (zip code to county, county to state, etc) may give more reliable estimates. - But there may be too coarse, or have "edge effects". - "Geographical epidemiology, spatial analysis and geographical information systems: a multidisciplinary glossary" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2465628/ - Maybe you have more data, but it's on a different spatial grid, how to resolve? --- # Different Areal Grids Current research project: GAIL -- Want: Produce maps and estimates of rate of some outcomes in WV, on a ZIP code level. Have: - Data on ZIP code level - Data on PO Box ZIP code level Problem: - Outcome is rare - Some regions in WV are very rural - No inherrent match between PO Box and Standard ZIP codes. -- One Approach: - Randomly assign cases from one areal grid to the other - Introduce some noise, but utilize more data ??? - GAIL = Geo-assignment of Irregular Locations - Sometimes people combine by aggregating to the county level. Not desirable: Loss of granularity. - Other times people throw out the PO Box ZIP codes. Not desirable: Loss of data, could be critical. --- # Fundamentals of Cartography Let's talk maps -- We all know the Earth is round ??? - Like this picture --- layout: true border-top: 0px solid #23373B; background-image: url(projections/flat_earth.png) background-position: 50% 50%; background-size: 100% 100% ??? - We all know the Earth isn't flat, right? --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/flat_earth_dinos.png) background-position: 50% 50%; background-size: 100% 100% ??? - Though that would explain the extinction of dionsaurs ... --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/old_globe.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - But really, we know it's a globe - At least I hope you know it's a globe ... --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/ptolemy01.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - We've known that for quite some time --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/ptolemy01.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - We've known that for quite some time --- layout: false # Fundamentals of Cartography Let's talk maps We all know the Earth is ~~round~~ globular -- But **maps** are flat -- Need to *project* round object onto flat surface -- Creates some sort of distortion --- layout: true border-top: 0px solid #23373B; background-image: url(projections/ancient_map_01.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - One retro map projection, not actually intended to be accurate --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/ancient_map_02.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - Another old map projection, circa 200 BC --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/ancient_map_02b.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - Ptolemy's map, circa 150 AD --- --- layout: true border-top: 0px solid #23373B; background-image: url(projections/ancient_map_03.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - Portion of Roman road network map, with Rome as center of the world - circa 300 AD - Emphasizes particular features. Not *wrong* --- layout: true border-top: 0px solid #23373B; background-image: url(projections/anglo_saxon_map.jpg) background-position: 50% 50%; background-size: 100% 100% ??? - This one is interesting - Do you recognize any features? --- --- layout: false # Map Projections Modern maps attempt to accurately represent the world Need to *project* the globe onto flat surface -- What features to preserve? - Shape? - Area? - Distance? - Direction / Angles? -- Huh? ??? - Shape: Conformity, what a place would look like from directly above it - Area: Size, how much of the map something take up - Distance: How far apart two points on the surface are - Direction/Angles: Which way is the shortest distance between two points. - For instance, an airplane flying from New York to Beijing will go near the North Pole, not so much over the Pacific Ocean. - So a "straight line" might not look straight on a map --- # Map Projections Consider two sets of locations: - Set 1: Latitude 0 `\(^{\circ}\)` , Longitudes 50 `\(^{\circ}\)` E and 50 `\(^{\circ}\)` W - Set 2: Latitude 42 `\(^{\circ}\)` , Longitudes 50 `\(^{\circ}\)` E and 50 `\(^{\circ}\)` W -- .pull-left[ ![](intro_spatial_files/figure-html/move_coords -1.png)<!-- --> ] -- .pull-right[ Same distance! ] --- # But on a globe ... .center[ ![](intro_spatial_files/figure-html/move_coords02 -1.png)<!-- --> ] ??? - We can see here the distances aren't the same! --- # Map Projections Choice of projection can retain various attributes - Should shapes be the same? - Should areas be the same? - Should distances be the same? - Should angles be the same? > Some attributes **will** be distorted, the map-maker must decide which attributes to portray accurately. ??? - For example, this next map shows two different projections - Both have been relatively common (still are in some respects) --- layout: true border-top: 0px solid #23373B; background-image: url("projections/mercator_vs_peters.jpg") background-position: 50% 50%; background-size: 100% 100% ??? - The green is the Gall-Peters projection. The black is the Mercator projection - Both are distorted, but in different ways - Size of Greenland: 836,300 sq mi - Size of Africa: 11,730,000 sq mi (14x the size!) - Size of South America: 6,890,000 sq mi ( 8.24x the size!) - Size of Brazil: 3,287,956 sq mi ( 3.9x the size!) --- --- layout: true border-top: 0px solid #23373B; background-image: url("projections/mercator_move_usa.png") background-position: 50% 50%; background-size: 100% 100% ??? - This illustrates the problem with the Mercator projection --- --- layout: true border-top: 0px solid #23373B; background-image: url("projections/xkcd_projections.png") background-position: 50% 50%; background-size: 100% 100% ??? - There are a lot of choices --- --- layout: true border-top: 0px solid #23373B; background-image: url("projections/projection_rpg.png") background-position: 50% 50%; background-size: 100% 100% ??? - And there is some debate among cartographers! --- layout: false --- # Calculating Distances How to calculate distance between these two points? -- Euclidean plane - Use the Pythagorean theorem! --- # Calculating Distances .center[ ![](intro_spatial_files/figure-html/calc_dist_01a -1.png)<!-- --> ] --- # Calculating Distances .center[ ![](intro_spatial_files/figure-html/calc_dist_01b -1.png)<!-- --> ] ??? - That only works because we can *project* vertically or horizontally without distorting distances - Not so on a globe! We already saw an example of that --- # Calculating Distances .center[ ![](intro_spatial_files/figure-html/move_coords02b -1.png)<!-- --> ] ??? - Both lines have the same x-coordinates, only different y-coordinates. - Lower line is longer distance. - Can think of an extreme example: Distance to walk around the Earth - On equator it's ... I don't know. Long. - If you walk in a circle that's 2 feet away from the south pole, though ... - Submarines sometimes do this as a kind of joke. - So, calculating distances on a globe, how do we do it? --- # Calculating Distances How to calculate distance between these two points? Euclidean plane - Use the Pythagorean theorem! On a globe? - Compute *geodesic* distance - Need to use *spherical geometry* - Polar coordinates (using radius and angle) ??? - Uses something called "great circles" --- # Calculating Distances <img src="pictures/gc_pic.png" width="576" /> ??? - Most software that works with spatial data should autoamtically handle this. - But, like plotting, it's something to be aware of. --- # Some Resources - Geocomputation in R [LNM19] - Book on manipulating, visualizing, and (some) modeling for geographic data - Focused on `R` pacakge `sf` (what I mainly used in this presentation) - https://geocompr.robinlovelace.net/ - Hands on Programming with R [Gro14] - Book for those new to R programming. - https://rstudio-education.github.io/hopr/ - Spatial datasets for practice - I obtained some of the data in this presentation from: http://tagis.dep.wv.gov/home/Downloads - Mostly shapefiles, can be read by `sf` - See more maps here: - https://allthatsinteresting.com/ancient-world-maps --- # References Grolemund, G. (2014). _Hands-On Programming with R_. ISBN: 978-1449359010. URL: [https://rstudio-education.github.io/hopr/](https://rstudio-education.github.io/hopr/). Lovelace, R, J. Nowosad, and J. Muenchow (2019). _Geocomputation with R_. ISBN: 978-1138304512. URL: [https://geocompr.robinlovelace.net/](https://geocompr.robinlovelace.net/). Wikipedia contributors (2019). _Data analysis - Wikipedia, The Free Encyclopedia_. [Online; accessed 8-July-2019].