Dispersal Based Weights Matrix


When performing tests for spatial autocorrelation such as Moran’s I, we must create a weights matrix. This defines which geographic subdivisions, such as provinces or census tracts, are neighbors of each other. There are three common methods used to define neighbors: contiguity based, distance based, and K-nearest-neighbor. While these methods work well in most situations, they may not always be the best way to define neighbors, particularly when a known spatial process is important to the topic of study. Here we demonstrate a method to define neighbors for GIS based studies where a dispersal process is at work.
One application of this method in ecology is the study of flying insects. In this particular example we define neighboring census tracts for Miami-Dade County based on the dispersal distance of the Zika-transmitting periurban mosquito Aedes albopictus. In this hypothetical study mosquito traps are placed across the city and collections are aggregated by week and census tract. Shapefiles are made available through the US Census Bureau's Geography Division.
Our method is based on the idea that mosquitoes move over time, so the abundance of Aedes albopictus is likely to be similar in two tracts where mosquito exchange occurs. These tracts should be considered neighbors because abundance in one can spill over into the other, making them spatially dependent.
Based on a study conducted by Marini et al., we estimate that 119 meters is the average distance traveled by Aedes albopictus in a week. For each tract i in our study we’ll determine whether mosquito exchange is possible with tract j; if so, we’ll consider them neighbors. We’ll also add weights based on the degree to which exchange is possible between i and j, relative to exchange between i and all of its neighbors.
The GIS workflow to accomplish this is quite simple:
 For each tract generate a buffer of 119 meters.
 Note which other tracts intersect this buffer and consider them neighbors.
 For each neighboring tract, divide the area that intersects the buffer by the total area of the buffer. This is our weight.
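The three steps above can be sketched in plain Python. This is a minimal illustration, not the GIS code used in this example. It rests on several assumptions: tracts are simplified to axis-aligned rectangles in projected meters, the buffer is a square-cornered expansion of the rectangle rather than a true rounded buffer, the buffer area used as the denominator includes the tract itself, and the names `Rect` and `dispersal_weights` are hypothetical.

```python
from dataclasses import dataclass

DISPERSAL_M = 119.0  # weekly dispersal distance used in the text


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for a tract polygon (axis-aligned, in meters)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def expand(self, d):
        # Assumption: square-cornered buffer instead of a rounded GIS buffer.
        return Rect(self.xmin - d, self.ymin - d, self.xmax + d, self.ymax + d)

    def overlap_area(self, other):
        w = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        h = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return w * h if w > 0 and h > 0 else 0.0


def dispersal_weights(tracts, distance=DISPERSAL_M):
    """Return {i: {j: weight}} where j overlaps the buffer drawn around i."""
    weights = {}
    for i, tract in tracts.items():
        buf = tract.expand(distance)          # step 1: buffer the tract
        buf_area = buf.area()
        row = {}
        for j, other in tracts.items():
            if j == i:
                continue
            shared = buf.overlap_area(other)  # step 2: does j reach the buffer?
            if shared > 0:
                row[j] = shared / buf_area    # step 3: area share is the weight
        weights[i] = row
    return weights


# Made-up tracts for illustration only.
tracts = {
    "A": Rect(0, 0, 1000, 1000),
    "B": Rect(1000, 0, 2000, 1000),     # shares an edge with A
    "C": Rect(1050, 1000, 2050, 2000),  # does not touch A, but is within 119 m
    "D": Rect(3000, 0, 4000, 1000),     # too far from every other tract
}
w = dispersal_weights(tracts)
print(sorted(w["A"]))  # ['B', 'C']
print(w["D"])          # {}
```

Tract C never touches tract A, so a contiguity rule would not link them, yet it falls inside A's buffer and receives a small weight. Tract D ends up with no neighbors at all.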
To start we load in our GIS packages and the census tract shapefile. After converting the coordinate system we subset the shapefile to include just 22 tracts for this example to cut down on run time.
You can see neighboring tracts intersecting the red buffer. The greater the intersecting area, the closer neighbors we consider them.
In this step we create buffers around each tract then take note of which tracts touch or fall within the buffer.
This is a visualization of our weights matrix so far. Each tract is a node and neighborship is represented by lines.
The second step is to add weights based on the relative intersecting area.
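One possible reading of "relative intersecting area" is that each tract's weights are scaled so they sum to one across its neighbors (row standardization). The sketch below shows that reading only. The text does not confirm which scaling was used, the function name `row_standardize` is hypothetical, and the areas are made up.

```python
def row_standardize(raw):
    """Scale each row of {i: {j: value}} so that its values sum to 1.

    Rows with no neighbors are left empty rather than divided by zero.
    """
    out = {}
    for i, row in raw.items():
        total = sum(row.values())
        out[i] = {j: v / total for j, v in row.items()} if total > 0 else {}
    return out


# Made-up intersecting areas (square meters) for illustration only.
raw_areas = {
    "A": {"B": 300.0, "C": 100.0},
    "B": {"A": 250.0},
    "D": {},  # a tract whose buffer reaches no other tract
}
std = row_standardize(raw_areas)
print(std["A"])  # {'B': 0.75, 'C': 0.25}
```

Under this scaling, a tract with a single neighbor gives that neighbor a weight of 1 regardless of how much area is shared, which is one reason the choice of denominator matters.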
We've completed the construction of our weights matrix which we name "aweights". Next we'll assign hypothetical count data to our tracts to see how measures of global and local clustering differ between our custom dispersal based weights matrix and a naive contiguity based weights matrix.
We've intentionally made the count data a little clustered yet still a bit ambiguous. This is where a Moran's I statistic might come in handy!
Now we'll run a global Moran's I statistic on our counts. In the first test we'll use the dispersal based weights matrix. For the second test we'll use a simple contiguity based weights matrix.
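For readers who want to see what the statistic computes, here is a minimal pure-Python sketch of global Moran's I with a one-sided permutation p-value. It is not the code behind the results reported here. The function names, the toy chain of four tracts, and the counts are all hypothetical, and the weights are stored as a nested dictionary `{i: {j: w_ij}}`.

```python
import random


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {k: v - mean for k, v in values.items()}
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j]
                for i, row in weights.items()
                for j, w in row.items())
    denom = sum(d * d for d in z.values())
    return (n / s0) * (cross / denom)


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Share of random relabelings with I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    keys = list(values)
    pool = [values[k] for k in keys]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(pool)  # reassign counts to tracts at random
        if morans_i(dict(zip(keys, pool)), weights) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)


# Toy example: four tracts in a row, each linked to its adjacent tracts.
chain = {
    "t1": {"t2": 1.0},
    "t2": {"t1": 1.0, "t3": 1.0},
    "t3": {"t2": 1.0, "t4": 1.0},
    "t4": {"t3": 1.0},
}
counts = {"t1": 1, "t2": 2, "t3": 3, "t4": 4}

i_obs = morans_i(counts, chain)
p_val = permutation_p_value(counts, chain)
print(round(i_obs, 3))  # 0.333
```

Swapping `chain` for a different weights dictionary while keeping `counts` fixed is exactly the comparison made in this section: the data stay the same and only the definition of neighbor changes.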
When the dispersal based weights matrix is used to define neighbors, we get a p-value of about 0.02: if the counts were arranged randomly across tracts, we would expect to see clustering at least this strong only about 2% of the time. If we set our threshold for statistical significance at 95% confidence, we can say that the degree of clustering in our study area is significantly different from random. Simply put: we've detected clustering.
A simple contiguity based weight matrix is used in our second test. Using a 95% confidence threshold we can't say that there is significant spatial clustering in our count data.
These differing results illustrate the importance of how we define neighbors in tests that require it. Using different definitions can result in very different findings. We should try our best to base these definitions on empirical information about the process being studied.
Next we perform a local Moran's I to locate clusters, once again comparing the dispersal based weights matrix to the contiguity based weights matrix.
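As a rough illustration of what the local statistic measures, the sketch below computes local Moran's I for each tract and labels its quadrant (for example "high-high" for a high count surrounded by high counts). It is a simplified sketch, not the code used for the results that follow. The names and toy data are hypothetical, the weights are assumed to be row-standardized, and the significance test (usually a conditional permutation) is left out.

```python
def quadrant(z_i, lag_i):
    """Label a tract by its own deviation and its neighbors' weighted deviation."""
    if z_i > 0 and lag_i > 0:
        return "high-high"  # candidate hotspot
    if z_i < 0 and lag_i < 0:
        return "low-low"    # candidate coldspot
    if z_i > 0 and lag_i < 0:
        return "high-low"
    if z_i < 0 and lag_i > 0:
        return "low-high"
    return "undefined"      # tract or its neighbors sit exactly at the mean


def local_morans_i(values, weights):
    """Return {i: (I_i, quadrant)} with I_i = (z_i / m2) * sum_j w_ij z_j."""
    n = len(values)
    mean = sum(values.values()) / n
    z = {k: v - mean for k, v in values.items()}
    m2 = sum(d * d for d in z.values()) / n  # variance of the counts
    results = {}
    for i, row in weights.items():
        lag = sum(w * z[j] for j, w in row.items())
        results[i] = ((z[i] / m2) * lag, quadrant(z[i], lag))
    return results


# Toy example: four tracts in a row with row-standardized weights.
chain_rs = {
    "t1": {"t2": 1.0},
    "t2": {"t1": 0.5, "t3": 0.5},
    "t3": {"t2": 0.5, "t4": 0.5},
    "t4": {"t3": 1.0},
}
counts = {"t1": 1, "t2": 2, "t3": 3, "t4": 4}

local = local_morans_i(counts, chain_rs)
for tract, (stat, label) in local.items():
    print(tract, round(stat, 2), label)
```

Because each tract's value depends on its own row of weights, changing how neighbors are defined can move a tract in or out of the "high-high" group even when the counts are unchanged.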
We find just one tract to be a significant hotspot for counts of Aedes albopictus. The counts in this tract and the counts in its neighboring tracts are high.
Let's perform the test again using the contiguity based weights matrix.
Interestingly, two more tracts are significant hotspots. This differs from the results of our previous test. Considering that the weights matrix used to calculate this local Moran's I statistic has no empirical backing, these results should be treated cautiously. In this weighting scheme tracts are considered neighbors simply if they border each other. No consideration is given to the degree to which mosquitoes in one tract are able to cross over into a neighboring tract.
We've shown here how strongly results can depend on our definition of neighbor, and demonstrated one way to define neighbors in studies where a dispersal process affects the variable of interest.