Yvan Richard's page
DISPERSAL CHOICE ANALYSIS
  Main goals
  • Identify the landscape features that facilitate or impede individual movements, if any, based on potentially any dispersal data.

  • Quantify the maximum distance of inhospitable habitat an organism can cross.

  Introduction
Although dispersal is recognised as a fundamental process driving the distribution, viability and evolution of species, it is still poorly understood and over-simplified in most studies. Dispersal has often been assumed to be independent of the landscape (Fig. 1, path 1), yet landscape features can facilitate or impede individual movements. A common problem in dispersal studies is that records of dispersal paths, obtained for example from radio-tracking, typically consist of a series of locations that are discrete in time and space. We therefore generally do not know the path an individual followed between them. Straight lines have traditionally been used to represent such paths, but least-cost path modelling now allows a better representation of individual movements: the path inferred between two observed locations can be calculated so as to maximise the crossing of facilitating features and minimise the crossing of barriers or inhospitable habitats (Fig. 1, path 2). See how cost functions work on the online ArcGIS page. However:
  • How do we know whether individuals use corridors for their dispersal or not? Or more generally, do all substrates provide the same connectivity and if not, how can we identify the features that may facilitate or impede individual movements?

Dispersal figure

Figure 1. Whereas dispersal has often been modelled as a straight movement between patches regardless of the features of the landscape (path 1), using cost distance modelling to represent the preference of individuals to move through specific substrates and avoid others (path 2) may be more realistic in many species and landscapes.

Cost distance modelling relies on assigning a resistance value (also called cost or friction value) to each cell of the map, denoting how difficult it is for the organism to cross that cell or how reluctant it is to do so. Unfortunately, resistance values are often guessed from expert judgment or proxy data. Moreover, their assignment generally assumes that individual movements are decided on a cell-by-cell basis, so decisions made at a larger scale, such as whether to cross a gap, are ignored.

Ignoring the spatial distribution of these resistance values implies that a path crossing several small stretches of inhospitable habitat (Fig. 2, path 1) can have the same cost as a path crossing fewer but larger stretches (Fig. 2, path 2). However, it is possible that an organism might be willing to use the first path but not the second one. This therefore brings another question:

  • If individuals' movements are impeded by some inhospitable habitats, what maximum distance of these habitats are individuals willing to cross?

Gap crossing figure

Figure 2. Cost distance modelling that ignores the gap crossing ability of species often treats paths 1 and 2 as equally probable, which is often unrealistic.
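The point made by Figure 2 can be checked with a few lines of Python. This is only an illustrative sketch: the two sequences of resistance values are made up (1 for woody vegetation, 10 for pasture) and the cell size is assumed to be 1.

```python
# Resistance values met along two hypothetical paths
# (1 = woody vegetation, 10 = pasture; values are illustrative only).
# Path 1 crosses three short stretches of pasture, path 2 a single long one.
path_1 = [1, 10, 10, 1, 10, 10, 1, 10, 10, 1]
path_2 = [1, 1, 10, 10, 10, 10, 10, 10, 1, 1]


def path_cost(resistances, cell_size=1.0):
    """Accumulated cost of a path: resistance times distance, summed over cells."""
    return sum(r * cell_size for r in resistances)


cost_1 = path_cost(path_1)
cost_2 = path_cost(path_2)
print(cost_1, cost_2)  # both paths accumulate the same cost
```

Because the accumulated cost only depends on how many cells of each type are crossed, it cannot distinguish the two paths, even though the second one requires crossing a gap three times as wide.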

I propose here an approach I developed to answer these questions.


   Methods

To illustrate the approach, let's consider an organism living in mature forests. We want to test the hypothesis that this species is reluctant to cross pastures and prefers using woody vegetation to disperse. After fitting some individuals with radio-transmitters, we recorded their location every day during their dispersal (daily dispersal steps). Our dataset at this point only consists of a table containing the radio-tagged individuals, the date and time of each resighting and the corresponding locations where the individuals were found.

If we want to test whether pasture has a low degree of connectivity compared to woody vegetation, we can assign a high resistance value, e.g. 10, to each map cell representing pasture and a value of 1 to each cell representing woody vegetation, meaning that an individual is 10 times more likely to cross a cell of woody vegetation than a cell of pasture. Most GIS software can then calculate the cost distance between two consecutive recorded locations, which indicates the likelihood of the movement between the two points, assuming that our assignment of resistance values is right. This assignment of values to each pixel of the map represents our hypothesis about landscape connectivity.
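To make the notion of cost distance concrete, here is a minimal sketch in Python of what a GIS computes, using Dijkstra's algorithm on a tiny made-up landscape. It is not the ArcGIS implementation: moves are restricted to the four neighbouring cells, the cell size is assumed to be 1, and the cost of a move is taken as the mean resistance of the two cells involved (a common convention, assumed here).

```python
import heapq


def cost_distance(resistance, start, goal):
    """Least accumulated cost between two cells of a resistance grid (Dijkstra).

    Assumptions of this sketch: 4-neighbour moves, cell size of 1, and a move
    costs the mean resistance of the cell left and the cell entered.
    """
    n_rows, n_cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale queue entry, a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols:
                step = (resistance[r][c] + resistance[nr][nc]) / 2.0
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return float("inf")


W, P = 1, 10  # woody vegetation and pasture, as in the example above
landscape = [
    [W, P, P, P, W],
    [W, P, P, P, W],
    [W, W, W, W, W],
]
uniform = [[1] * 5 for _ in range(3)]  # landscape-independent dispersal

detour_cost = cost_distance(landscape, (0, 0), (0, 4))
straight_cost = cost_distance(uniform, (0, 0), (0, 4))
print(detour_cost, straight_cost)
```

In this toy landscape the cheapest route between the two forest cells follows the woody corridor along the bottom row rather than the shorter route across pasture, so the cost distance is larger than under uniform resistance.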

Our connectivity hypothesis can be considered satisfactory only if we can demonstrate that individuals seek to minimise the cost distance of the path they choose, i.e. that they prefer moving through low resistance substrates and avoid high resistance ones. For this purpose, we can match each chosen dispersal step (each pair of consecutive locations) to a sample of non-chosen available alternatives of similar length, and calculate the cost distance for these alternative paths as well (Fig. 3).

The least-cost paths (and the cost distances, which can be calculated in most GIS software packages) typically look like this:

Observed and alternative paths figure 

Figure 3. This figure shows a dispersal step (in yellow, starting point in the centre) matched to 10 random alternatives (in red) of similar length, chosen among a wider set of random locations situated in woody vegetation (in black). The least-cost paths are calculated assuming that movements occur most likely in native forest (resistance value of 1), less in pine forest (value of 2), even less in shrubland (3), and impeded by pasture (10).
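The matching of a chosen step to alternatives of similar length can be sketched as follows. This is a hypothetical illustration, not the rule used by the toolbox: the function name, the 20% length tolerance and the randomly generated candidate points are all assumptions.

```python
import math
import random


def matched_alternatives(start, chosen, candidates, n=10, tolerance=0.2, seed=0):
    """Draw up to n candidate points whose straight-line distance from the
    start lies within +/- tolerance (a proportion) of the observed step length.
    The tolerance rule is an assumption made for this sketch."""
    step_length = math.dist(start, chosen)
    lower = step_length * (1 - tolerance)
    upper = step_length * (1 + tolerance)
    eligible = [p for p in candidates
                if p != chosen and lower <= math.dist(start, p) <= upper]
    rng = random.Random(seed)
    return rng.sample(eligible, min(n, len(eligible)))


# Hypothetical pool of random locations (in practice, points in woody vegetation)
pool_rng = random.Random(42)
candidates = [(pool_rng.uniform(-1000, 1000), pool_rng.uniform(-1000, 1000))
              for _ in range(500)]

start, chosen = (0.0, 0.0), (300.0, 400.0)  # observed step of length 500
alternatives = matched_alternatives(start, chosen, candidates, n=10)
print(len(alternatives))
```

Each returned point, together with the shared starting point, defines one non-chosen alternative for which a cost distance is then calculated.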

 

Conditional logit models can then be used to test whether the chosen steps have a lower cost distance on average than their matched alternatives. Such a model can be fitted using a Cox proportional hazards model with Breslow ties, as their likelihood functions are similar (Chen & Kuo 2001). The data only need to be prepared to be equivalent to survival data, where the individual "dies" at time 1 for the chosen alternative, or "survives" and becomes censored at time 2 for the non-chosen alternatives (Kuhfeld 2001). The theory behind the conditional logit model in an ecological context has been described in Fortin et al. (2005) and for the closely related multinomial logit regression model in Cooper & Millspaugh (1999).


Here is an example of how the model can be fitted using the great free statistical software R, using the coxph function from the survival library:
R Code
library(foreign)   # for function read.dbf
library(survival)  # for function coxph

# Import the output table from the GIS
thedata <- read.dbf("C:\\Discrete choice model\\OutputTable.dbf")

# Data preparation: the derived variables are stored in the data frame itself
# (rather than created after attach()), so that coxph finds them in 'data'
thedata$t <- 2 - thedata$CHOICE   # CHOICE=1 for chosen alternative, 0 otherwise
thedata$RelCost <- thedata$PATHC_CS1 / thedata$PATHL_CS1   # Standardisation of cost distances by distance

# Fit Cox proportional hazards model
# StartPtID is the index of the starting point of each dispersal step

choice.mod <- coxph(Surv(t, CHOICE) ~ RelCost + strata(StartPtID),
     data=thedata, method="breslow")

# Output results
summary(choice.mod)
 
And for the old-fashioned people, here is an example of how to fit the same model using Proc PHREG in SAS®:

SAS Code
/* Import data from the DBF output table */
Proc import out=WORK.DispChoice
  datafile= "C:\Discrete choice model\OutputTable.dbf" /* Output table from the GIS */
  DBMS=DBF replace;
  getdeleted=No;
Run;

/* Data preparation for Proc PHREG*/
Data DispChoice;
  set DispChoice;
  t=2-CHOICE;  /* CHOICE=1 for chosen alternative, 0 otherwise */
  RelCost = PATHC_CS1/PATHL_CS1;  /* Standardisation of cost distances by distance */
  output;
run;

/* Model */
Proc PHREG data=DispChoice;
  strata StartPtID; /* Index of starting location of each dispersal step */
  model t*CHOICE(0)= RelCost /ties=breslow;
run;
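The survival-style coding used in both listings (t = 2 - CHOICE, with CHOICE as the event indicator) can be checked by hand for a single dispersal step. The sketch below, in Python, computes the Breslow partial likelihood of one stratum and compares it with the conditional logit probability of the chosen alternative. The relative costs and the value of β are made up for illustration.

```python
import math


def breslow_partial_likelihood(times, events, x, beta):
    """Cox partial likelihood (Breslow ties) of one stratum, single covariate."""
    likelihood = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t_event in event_times:
        # Risk set: every row still under observation at this event time
        risk_sum = sum(math.exp(beta * xi)
                       for ti, xi in zip(times, x) if ti >= t_event)
        for ti, ei, xi in zip(times, events, x):
            if ti == t_event and ei == 1:
                likelihood *= math.exp(beta * xi) / risk_sum
    return likelihood


# One dispersal step: the first alternative was chosen (hypothetical values)
choice = [1, 0, 0, 0]
rel_cost = [1.3, 4.2, 2.5, 6.0]
beta = -0.8

times = [2 - c for c in choice]  # chosen "dies" at time 1, others censored at 2
cox = breslow_partial_likelihood(times, choice, rel_cost, beta)

# Conditional logit probability of the chosen alternative
logit = math.exp(beta * rel_cost[0]) / sum(math.exp(beta * c) for c in rel_cost)
print(cox, logit)
```

With this coding the only event occurs at time 1, when all alternatives of the step are still in the risk set, which is why the two quantities coincide.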


Note that this model has only two hierarchical levels: the alternative points (chosen and non-chosen) are nested within dispersal steps (defined by StartPtID in this case), so alternatives are only compared with others sharing the same starting point. It is therefore assumed that actual dispersal steps are independent within and between individuals. A close look at the data (plots etc.) should be sufficient to detect whether this assumption is violated.

From the fitted model, we get the value and significance of the coefficient β of the utility function U of path i defined as:

   U_i = β × RelativeCost_i     (equation 1)

A negative value of β that is statistically significant indicates that individuals prefer to go to locations that are well connected, i.e. for which paths go through corridors and avoid barriers most often.

The probability p_i that an individual chooses path i among j possibilities is:

  p_i = exp(U_i) / ∑_{k=1..j} exp(U_k)     (equation 2)
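As a worked illustration of equations 1 and 2, the following Python sketch turns the relative costs of three alternative paths into choice probabilities. The costs and the value of β are hypothetical.

```python
import math


def choice_probabilities(relative_costs, beta):
    """Equation 1 (U_i = beta * RelativeCost_i) followed by equation 2."""
    utilities = [beta * c for c in relative_costs]
    u_max = max(utilities)  # subtracting the maximum avoids overflow
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


relative_costs = [1.2, 3.5, 6.0]  # hypothetical standardised cost distances
probs = choice_probabilities(relative_costs, beta=-0.5)
print(probs)
```

With a negative β the cheapest path receives the highest probability, and with β = 0 all alternatives become equally likely, which corresponds to movements that are independent of the cost map.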

If β is significantly negative, individuals' movements follow the map of resistance values we created, which is already a satisfying result. We can now go further and try different values of resistance, e.g. increase the resistance of pasture to movements and assign different values to different woody vegetation types. If the dataset is kept the same among models, i.e. the random alternatives matched to the chosen dispersal steps are identical across models, then we can use the Akaike Information Criterion (AIC) to compare the models and find the one that fits the data best.
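The comparison of cost sets by AIC can be sketched as follows. In practice the fitting is done by coxph or Proc PHREG; this Python sketch instead maximises the conditional logit log-likelihood directly with a simple Newton-Raphson loop, and uses AIC = 2k - 2 log L with k = 1. Both datasets are made up: under cost set A the chosen path (listed first) is usually the cheapest, whereas cost set B was constructed so that chosen paths are no cheaper than their alternatives.

```python
import math


def log_likelihood(beta, strata):
    """Conditional logit log-likelihood; the chosen path is first in each stratum."""
    total = 0.0
    for costs in strata:
        utilities = [beta * c for c in costs]
        u_max = max(utilities)
        log_denom = u_max + math.log(sum(math.exp(u - u_max) for u in utilities))
        total += utilities[0] - log_denom
    return total


def fit_beta(strata, n_iter=50, tol=1e-10):
    """Maximum likelihood estimate of beta (Newton-Raphson with step halving)."""
    beta = 0.0
    for _ in range(n_iter):
        gradient, hessian = 0.0, 0.0
        for costs in strata:
            utilities = [beta * c for c in costs]
            u_max = max(utilities)
            weights = [math.exp(u - u_max) for u in utilities]
            total = sum(weights)
            mean = sum(w * c for w, c in zip(weights, costs)) / total
            mean_sq = sum(w * c * c for w, c in zip(weights, costs)) / total
            gradient += costs[0] - mean          # chosen minus expected cost
            hessian -= mean_sq - mean ** 2       # minus the variance of cost
        if abs(gradient) < tol or hessian == 0.0:
            break
        step = -gradient / hessian
        while log_likelihood(beta + step, strata) < log_likelihood(beta, strata):
            step /= 2.0  # back off if the full step overshoots
        beta += step
    return beta


def aic(strata, beta, n_params=1):
    return 2 * n_params - 2 * log_likelihood(beta, strata)


# Relative costs of the same six steps under two hypothetical cost sets
cost_set_a = [[1.0, 2.0, 3.0, 4.0], [1.5, 1.2, 3.5, 2.8], [2.0, 3.0, 2.5, 4.5],
              [1.2, 2.2, 1.8, 3.0], [3.0, 2.0, 4.0, 3.5], [1.1, 1.9, 2.7, 3.3]]
cost_set_b = [[1.5, 1.0, 3.0, 2.5], [3.0, 2.0, 3.0, 2.0], [2.0, 1.0, 3.0, 2.0],
              [1.0, 1.5, 2.0, 1.5], [2.5, 1.5, 2.5, 1.5], [2.0, 2.5, 1.5, 2.0]]

beta_a, beta_b = fit_beta(cost_set_a), fit_beta(cost_set_b)
aic_a, aic_b = aic(cost_set_a, beta_a), aic(cost_set_b, beta_b)
print(beta_a, aic_a)
print(beta_b, aic_b)
```

The cost set with the lower AIC is the better supported hypothesis. The comparison is only valid because both cost sets describe the same chosen steps and the same matched alternatives.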

Likewise, we can incorporate some behavioural traits in the model such as the gap crossing ability of species (see above). This can be achieved by assigning a resistance value to each cell of pasture that changes as a function of its distance to the nearest woody vegetation. A linear or Gompertz function can be used, and several models with various function parameters can be tested as above (Fig. 4).


Plot of various functions to represent gap crossing ability

Figure 4. To test for the gap crossing limitation of species, the resistance value of each cell of pasture can change as a function of its distance to the nearest woody vegetation. Several functions can be used, tested, and their comparison based on models' AIC would indicate which one fits the data the best.
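The two families of functions mentioned above can be written down as follows in Python. The parametrisation and the default parameter values (r_min, r_max, d_max, b, c) are assumptions made for this sketch, not the ones used to draw Figure 4.

```python
import math


def linear_resistance(distance, r_min=1.0, r_max=10.0, d_max=200.0):
    """Resistance rising linearly from r_min at the vegetation edge to r_max
    at d_max (same unit as distance), and constant beyond d_max."""
    fraction = min(max(distance / d_max, 0.0), 1.0)
    return r_min + (r_max - r_min) * fraction


def gompertz_resistance(distance, r_min=1.0, r_max=10.0, b=5.0, c=0.03):
    """Sigmoid (Gompertz) rise: resistance stays low close to woody
    vegetation, then increases steeply towards r_max."""
    return r_min + (r_max - r_min) * math.exp(-b * math.exp(-c * distance))


# Resistance of pasture cells at increasing distance from woody vegetation
for d in (0, 50, 100, 200, 400):
    print(d, linear_resistance(d), round(gompertz_resistance(d), 2))
```

Each combination of function and parameters yields a different cost raster, and therefore a different model to be compared by AIC.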


The best model indicates which hypothesis about landscape connectivity fits the data best. If it indicates a strong gap crossing limitation, we can then estimate the largest gap crossed by the species in two steps. First, the least-cost paths between consecutive dispersal points are calculated from the cost map of the best model, to infer the movement paths. Second, these paths are split according to the vegetation cover layer in the GIS, and the longest stretch crossing inhospitable habitat is extracted.
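The extraction of the largest gap from an inferred path can be sketched as follows in Python, assuming the path has already been converted to the sequence of cover types of the cells it crosses. The cover labels, the path and the 25 m cell size are hypothetical.

```python
from itertools import groupby


def largest_gap(cover_along_path, cell_size, gap_cover="pasture"):
    """Longest uninterrupted stretch of gap_cover along a path, in map units."""
    longest = 0
    for cover, run in groupby(cover_along_path):
        if cover == gap_cover:
            longest = max(longest, sum(1 for _ in run))
    return longest * cell_size


# Cover type of each cell crossed by one inferred least-cost path
path_cover = ["forest", "forest", "pasture", "pasture", "shrub",
              "pasture", "pasture", "pasture", "forest"]
gap = largest_gap(path_cover, cell_size=25.0)
print(gap)  # 75.0: the longest stretch is three consecutive pasture cells
```

Taking the maximum of this value over all inferred paths gives the largest gap crossed in the dataset.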



  Benefits of the method
  • Objective. Whereas the assignment of resistance values used to be quite subjective, comparing alternative hypotheses based on AIC makes it possible to test whether a factor truly influences species movements.

  • Can be potentially applied to any available dispersal data, without having to rely on expensive research design.

  • Quantitative. The approach makes it possible to quantify the resistance values/functions associated with landscape elements.

  • Applicable to most landscapes and species to test most hypotheses about dispersal.

  • Flexible. Most factors promoting or limiting species movements can be represented in a cost map. Avoidance of roads, buildings or human habitation, for example, can be modelled by assigning to each map pixel a cost that decreases with distance from these features, which is easily calculated in the GIS, and gap crossing ability can be modelled as described above. Furthermore, several factors can be combined in the calculation of the cost map; the resistance value of each pixel would simply be a weighted sum of the different factors.

  • Generalizable. Because some behavioural traits are directly incorporated in the hypotheses, the functions used to represent these traits, once fitted and validated, can likely be extrapolated to other landscapes.

  • Extendable. One can also include in the models some characteristics of the dispersing individuals such as sex, size, age or sibling status to analyse or control for variations in dispersal behaviour among individuals. Such models would then be a mixture of conditional and multinomial logit models, but they can be fitted the same way as previously described.

  • The results are easy to present to the public, stakeholders, land and wildlife managers. Cost distances and dispersal frequencies can be represented on maps such as the following ones:


3D illustration of cost distance
Figure 5. Cost distance from a given patch (in centre). The height and colour represent the cost distance from the source. Note how the patch on the right-hand side is unreachable to dispersers because of its isolation.


3D illustration of cost distance modelling
Figure 6. Movement potential from the same given patch (top plateau in red). The height and colour represent the probability of movement from the source, obtained by multiplying the cost distance from the source (Fig. 5) and the probability density function of cost distances achieved by dispersers between their natal territory and their settlement.
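The weighted sum of factors mentioned under "Flexible" above can be sketched in Python as follows. The two layers, their values and the weights are hypothetical; in practice the same operation would be carried out on rasters in the GIS.

```python
def combine_cost_layers(layers, weights):
    """Cell-by-cell weighted sum of several resistance layers of equal size."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    return [[sum(w * layer[r][c] for w, layer in zip(weights, layers))
             for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 2 x 2 layers
vegetation = [[1, 10],       # 1 = woody vegetation, 10 = pasture
              [1, 10]]
road_avoidance = [[4, 2],    # cost decreasing with distance from a road
                  [1, 1]]

cost_map = combine_cost_layers([vegetation, road_avoidance], weights=[1.0, 0.5])
print(cost_map)
```

The weights themselves can be treated as part of the hypothesis: different weightings give different cost maps, which can be compared by AIC like any other cost set.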

  "OK, but how can I do all this?"

Assuming that you already have a shapefile containing some dispersal points, the main task is to calculate the cost distances of the chosen and random locations for each dispersal step for the conditional logit model. Fortunately, I wrote a toolbox in Python for ArcGIS that automates most of the operations. You can download it for free at the end of this page. The toolbox contains two scripts:

  • Dispersal choice analysis - for ArcGIS 92.py
    From a shapefile containing consecutive dispersal locations, this script selects random locations to be matched to the chosen destination for each dispersal step and calculates the cost distance of these points for one or several cost rasters. The output is a table where each row represents a dispersal step, chosen or not, with the associated cost distance (one column for each cost raster). This table can be used directly in R or SAS for fitting conditional logit models using the code provided above.

  • Dispersal paths and gaps - for ArcGIS 92.py
    From a shapefile containing consecutive dispersal locations, this script calculates the least-cost path between them, given one or several cost rasters, and extracts the maximum distance crossed over a specified substrate (for gap crossing ability - see above).

What do I need to run the scripts?

  • ArcGIS 9.0-9.2 with an ArcInfo license and spatial analyst. This is the expensive part of the process... I hope one day to be able to adapt the scripts to GRASS.

  • A shapefile with dispersal points, ordered by individual and by date/time.

  • A shapefile with relevant random points that will be used as the alternative points. You can create such a shapefile using the free extension Hawth's Analysis Tools for ArcGIS.

  • One or more cost rasters calculated in ArcGIS to represent the various hypotheses about landscape connectivity.

  • A lot of patience... Everything in ArcGIS 9.x takes time, but more particularly the calculation of cost distances and least-cost paths. For instance, in the case of 100 recorded dispersal steps, each matched to 10 alternatives, and under 10 different cost sets, the script needs to calculate 10 000 least-cost paths, while a single calculation may take several minutes depending on the length of the step, the map resolution and the computer power. However, the script offers the possibility for the user to be notified by e-mail upon completion of the script (whether successfully or not), and, in case of successful completion, the output table can also be sent.

What these scripts do NOT do:

  • Fit the actual conditional logit models. This can easily be done in statistical software such as R (free) or SAS (commercial), using the code provided above.

For more details, see the help file provided with the script.



  Download the script for ArcGIS

The toolbox is available for ArcGIS versions 9.0 to 9.2:

- Download the toolbox for ArcGIS 9.0 and 9.1 (available soon) -

- Download the toolbox for ArcGIS 9.2 -

LICENSE & COPYRIGHT

This software is copyrighted and is the intellectual property of the author. You (users) are granted license to use, install and freely distribute this software without limit. You are expressly forbidden to sell this product, or in any way attempt to make a profit by distributing it (this includes distributing it on a website that sells advertising space).
Although you may distribute this software, I ask that you refer other interested users directly to this web site to ensure they have acquired the latest version.
Implicit in the use of this product is the understanding that:
- No technical support is offered for this product.
- The product is provided AS-IS, without warranty of any kind.
- YOU are responsible for ensuring that the output of this tool is accurate, relevant, consistent, and otherwise error-free.
- The author assumes no responsibility for any suffering you may experience as a result of the use (or misuse) of this software.
- The author does not warrant that this software is bug free.


   References
CHEN Z. & KUO L. (2001). A note on the estimation of the multinomial logit model with random effects. The American Statistician, 55, 89-95.

COOPER A.B. & MILLSPAUGH J.J. (1999). The application of discrete choice models to wildlife resource selection studies. Ecology, 80, 566-575.

FORTIN D., BEYER H.L., BOYCE M.S., SMITH D.W., DUCHESNE T. & MAO J.S. (2005). Wolves influence elk movements: behavior shapes a trophic cascade in Yellowstone National Park. Ecology, 86, 1320-1330.

KUHFELD W.F. (2001). Multinomial logit, discrete choice modeling: an introduction to designing choice experiments, and collecting, processing and analyzing choice data with the SAS system. SAS Technical Report TS-621, SAS Institute, Cary, North Carolina, USA.
 
   Last updated: 16 July 2009     © Massey University 2008