/* Import a dataset from an Excel file.
   Note: for PROC IMPORT to work correctly, columns of continuous numeric
   data must not have blanks in the first 8 rows; otherwise, PROC IMPORT
   will treat those columns as character data. One workaround is to add
   8 dummy rows containing numbers at the top of the Excel file, and then
   delete those rows after import into SAS. */
PROC IMPORT OUT=WORK.parasites
    DATAFILE="C:\Users\Jason\Desktop\data.xls" /* modify this line to point to the correct file path */
    DBMS=Excel REPLACE;
    SHEET="data";
RUN; quit;

/* Note: There are many other ways to get your data into SAS. You can use
   the menu-driven Import Wizard, under File, Import Data. Or, for simple
   datasets, you can use the INFILE statement within a SAS data step. */

/* The dataset we just imported from Excel is a modified version of the
   dataset used for an analysis published by Hoeksema & Forde (2008). The
   goal of the analysis was to test for local adaptation of parasites to
   hosts. Thus, the response variable of interest is the "effect size" of
   local adaptation, which is calculated by comparing average parasite
   infectivity in local versus non-local host-parasite pairings, i.e.
   SYMINF vs. ALO_INF in the dataset. The calculation of that response
   variable ("lnRRinf", the log of the ratio SYMINF/ALO_INF) is performed
   in the SAS data step below. */
data parasites_2;
    set parasites;
    lnRRinf = log(syminf/alo_inf);
    /* this next line adds a column of consecutive integers, one for each row of data */
    studyid = _n_;
run;

/* For analysis of general linear models, with a continuous response
   variable and one or more continuous and/or categorical predictor
   variables, SAS has a number of different procedures (PROCs) that can be
   used. PROC GLM, which uses least-squares estimation of model parameters,
   can be used for many analyses.
   PROC MIXED, which uses likelihood-based estimation of parameters
   (restricted maximum likelihood, REML, by default), is more flexible and
   should be used for analyzing more complicated experimental designs,
   including repeated-measures and mixed models (with both random and
   fixed effects). */

/* Here is the PROC MIXED code for the equivalent of a simple two-sample
   t-test, asking whether local adaptation (lnRRinf) differs between animal
   and plant hosts ('hosttype'). The PROC GLM code would be identical,
   substituting "GLM" for "MIXED". */
proc mixed data=parasites_2;
    class hosttype;
    model lnRRinf = hosttype;
    lsmeans hosttype;
run;

/* Here is the PROC MIXED code for a one-way ANOVA, asking whether local
   adaptation (lnRRinf) depends on the similarity in gene flow rates
   between hosts and parasites (a categorical variable called GENFLWSIM,
   with three levels). As above, the PROC GLM code would be identical. */
proc mixed data=parasites_2;
    class GENFLWSIM;
    model lnRRinf = GENFLWSIM;
    lsmeans GENFLWSIM / adjust=tukey; /* 'adjust=tukey' requests Tukey HSD post-hoc pairwise comparisons between the means for all levels of GENFLWSIM */
run;

/* Here is the PROC MIXED code for a simple linear regression, asking
   whether local adaptation (lnRRinf) is linearly predicted by the growth
   rate of the host species (host_growth_rate, a continuous variable).
   Again, the PROC GLM code would be identical for these simple analyses. */
proc mixed data=parasites_2;
    /* no CLASS statement is needed here: the only predictor is continuous */
    model lnRRinf = host_growth_rate / solution; /* 'solution' asks SAS for the slope and intercept estimates */
run;

/* Here is the PROC MIXED code for conducting mixed-model ANOVA, with two
   fixed effects (hosttype and host_growth_rate) and one random effect
   (PAPER nested within hosttype). */
proc mixed data=parasites_2;
    class hosttype PAPER;
    model lnRRinf = hosttype host_growth_rate / ddfm=satterth outpred=resids;
    random PAPER(hosttype);
    lsmeans hosttype;
run;

/* Note: because PAPER(hosttype) appears in the random statement, the
   F-test for the fixed effect of hosttype (from the model statement) will
   be constructed with the appropriate denominator, which is
   PAPER(hosttype). This happens automatically within PROC MIXED, but not
   within PROC GLM. */

/* Note: ddfm= allows the user to specify different methods for estimating
   the denominator degrees of freedom; 'satterth' requests the
   Satterthwaite approximation. */

/* The addition of 'outpred=resids' in the model statement in our last
   analysis generated a dataset called 'resids' that contains the
   residuals from the model, and a few other useful bits of data. Here is
   a bit of code for examining a histogram of the residuals. */
PROC GCHART data=resids;
    VBAR resid;
RUN; quit;

/* Here is the code for graphing local adaptation (lnRRinf) versus
   host_growth_rate, for each host type separately, using PROC GPLOT.
   The data must first be sorted by hosttype so that the BY statement
   works. */
proc sort data=parasites_2;
    by hosttype;
run;

proc gplot data=parasites_2;
    by hosttype;
    plot lnRRinf*host_growth_rate;
run; quit;
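
/* Optional follow-up, offered as a sketch rather than as part of the
   original analysis: two further checks of the residuals from the mixed
   model above.
   Assumptions: the 'resids' dataset created by 'outpred=resids' still
   exists in WORK, and it contains the variables Resid and Pred (the
   residuals and predicted values that PROC MIXED writes to an OUTPRED=
   dataset). */

/* Histogram with a fitted normal curve, plus a normal Q-Q plot of the residuals */
proc univariate data=resids;
    var resid;
    histogram resid / normal;
    qqplot resid / normal(mu=est sigma=est);
run;

/* Residuals versus predicted values, to look for non-constant variance */
proc gplot data=resids;
    plot resid*pred;
run; quit;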