SAS to R examples

This section contains R code that may be useful when converting SAS code to R.

On the way is R code that reproduces examples published by the great SAS Institute itself, either online or in its help manuals. The intention is to make the output from the R code as close as possible to the SAS output.

If there are any other examples that you would like to see on these pages, please get in touch.

You are welcome to use this code only on the understanding that you do so at your own risk. Redgull Data Ltd. takes no responsibility for any code on this website used by anyone else.

Some quick wins

Work in progress...

A simple SAS join

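SAS code:

data dsnout;
  merge dsnin1 dsnin2;
  by varn;
run;

R code (a MERGE with no IN= conditions keeps every value of varn found in either dataset, so the R equivalent is a full join, not an inner join):

library(dplyr)

dsnout <- dsnin1 %>%
  full_join(dsnin2, by = "varn") %>%
  arrange(varn)  # SAS returns the merged data in BY order

If both datasets contain another variable with the same name, SAS keeps a single column (the value from dsnin2 overwrites the one from dsnin1 where both have a record), whereas full_join() keeps both columns, with .x and .y suffixes.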
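To see the difference, here is a minimal sketch with made-up toy data (the columns x and y and all values are hypothetical). varn 2 is in both datasets, varn 1 only in dsnin1 and varn 3 only in dsnin2. A SAS MERGE keeps all three, which full_join() reproduces, while inner_join() keeps only varn 2.

```r
library(dplyr)

# Hypothetical toy data: varn 2 is in both, 1 only in dsnin1, 3 only in dsnin2
dsnin1 <- data.frame(varn = c(1, 2), x = c("a", "b"), stringsAsFactors = FALSE)
dsnin2 <- data.frame(varn = c(2, 3), y = c("B", "C"), stringsAsFactors = FALSE)

# Like MERGE ... BY varn: keep every varn from either dataset, in BY order
merged <- dsnin1 %>%
  full_join(dsnin2, by = "varn") %>%
  arrange(varn)

# An inner join keeps only the varn values present in both datasets
matched_only <- inner_join(dsnin1, dsnin2, by = "varn")

print(merged)        # 3 rows; y is NA for varn 1, x is NA for varn 3
print(matched_only)  # 1 row (varn 2)
```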

Does your dataset contain possibly unwanted duplicates?

SAS code (dsnin is sorted by varn):

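data dsnout;
  set dsnin;
  by varn;
  if not (first.varn and last.varn);
run;

R code:

library(dplyr)

# first.varn and last.varn are both true only for a varn with a single record,
# so this keeps every record whose varn occurs more than once
dsnout <- dsnin %>%
  group_by(varn) %>%
  filter(n() > 1) %>%
  ungroup()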
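If you would rather not depend on dplyr, base R's duplicated() can produce the same subset: a value is a duplicate if it is seen again when scanning either forwards or backwards. This sketch uses hypothetical toy data and checks that both approaches keep the two varn = 1 records and drop the single varn = 2 record.

```r
library(dplyr)

# Hypothetical toy data: varn 1 appears twice, varn 2 once
dsnin <- data.frame(varn = c(1, 1, 2), val = c("p", "q", "r"),
                    stringsAsFactors = FALSE)

# dplyr: keep rows whose varn group has more than one record
dups_dplyr <- dsnin %>%
  group_by(varn) %>%
  filter(n() > 1) %>%
  ungroup()

# Base R: flag a value if it is repeated later OR earlier in the column
is_dup <- duplicated(dsnin$varn) | duplicated(dsnin$varn, fromLast = TRUE)
dups_base <- dsnin[is_dup, ]
```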

A simple merge with selections

Two datasets, ordered by varn, one record per varn.

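SAS code:

data dsnout;
  merge dsnin1(keep=varn con_:) dsnin2(keep=varn reg_:);
  by varn;
run;

R code:

library(dplyr)

# keep varn plus the con_ variables from dsnin1 and the reg_ variables from dsnin2,
# then merge by varn (a full join, because MERGE keeps unmatched records)
dsnout <- dsnin1 %>%
  select(varn, starts_with("con_")) %>%
  full_join(dsnin2 %>% select(varn, starts_with("reg_")), by = "varn") %>%
  arrange(varn)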
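The SAS name-prefix list con_: maps onto dplyr's starts_with() selection helper. A small sketch with hypothetical column names (con_a, reg_a, reg_b, con_b and other are made up) shows which columns survive. One detail worth knowing: starts_with() ignores case by default (ignore.case = TRUE), which happens to suit SAS, where variable names are not case-sensitive.

```r
library(dplyr)

# Hypothetical toy data; only varn plus the prefixed columns should be kept
dsnin1 <- data.frame(varn = c(1, 2), con_a = c(10, 20), reg_a = c(1, 2),
                     other = c("x", "y"), stringsAsFactors = FALSE)
dsnin2 <- data.frame(varn = c(2, 3), reg_b = c(5, 6), con_b = c(7, 8))

# varn and con_ columns from dsnin1, varn and reg_ columns from dsnin2, merged by varn
dsnout <- dsnin1 %>%
  select(varn, starts_with("con_")) %>%
  full_join(dsnin2 %>% select(varn, starts_with("reg_")), by = "varn") %>%
  arrange(varn)

print(names(dsnout))  # "varn" "con_a" "reg_b"
```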

Observations that are not in both datasets

Two datasets, ordered by varn, one record per varn.

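SAS code:

data dsnout;
  merge dsnin1(in=a) dsnin2(in=b);
  by varn;
  if not(a and b);
  if a and not b then varx='New';
  if b and not a then vary='Old';
run;

R code (R has no IN= flags, so the code creates its own flags, a and b, before joining; this assumes neither dataset already has columns with those names):

library(dplyr)

dsnout <- dsnin1 %>%
  mutate(a = TRUE) %>%
  full_join(dsnin2 %>% mutate(b = TRUE), by = "varn") %>%
  mutate(a = !is.na(a), b = !is.na(b)) %>%  # unmatched rows have NA flags; make them FALSE
  filter(!(a & b)) %>%
  mutate(varx = ifelse(a & !b, 'New', NA),
         vary = ifelse(b & !a, 'Old', NA)) %>%
  select(-a, -b) %>%
  arrange(varn)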
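Another way to get the same records, which some may find easier to read, is anti_join(), which returns the rows of one data frame that have no match in another. This is a sketch with hypothetical toy data; it is an alternative to the IN= flag approach rather than a literal translation of it.

```r
library(dplyr)

# Hypothetical toy data: varn 1 only in dsnin1, 2 in both, 3 only in dsnin2
dsnin1 <- data.frame(varn = c(1, 2), x = c("a", "b"), stringsAsFactors = FALSE)
dsnin2 <- data.frame(varn = c(2, 3), y = c("B", "C"), stringsAsFactors = FALSE)

# a and not b: records of dsnin1 with no matching varn in dsnin2
only_in_1 <- anti_join(dsnin1, dsnin2, by = "varn") %>% mutate(varx = "New")

# b and not a: records of dsnin2 with no matching varn in dsnin1
only_in_2 <- anti_join(dsnin2, dsnin1, by = "varn") %>% mutate(vary = "Old")

# Stack them (columns missing from one side become NA) and restore BY order
dsnout <- bind_rows(only_in_1, only_in_2) %>%
  arrange(varn)
```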

The SAS select statement

data heart;
  set sashelp.heart;
  select (Smoking_Status);
    when ('Non-smoker') Smoking_Cat=1;
    when ('Light (1-5)') Smoking_Cat=2;
    when ('Moderate (6-15)') Smoking_Cat=3;
    when ('Heavy (16-25)') Smoking_Cat=4;
    when ('Very Heavy (> 25)') Smoking_Cat=5;
    otherwise Smoking_Cat=.;
  end;
run;

R code:

library(dplyr)

# heart_data is assumed to be an R copy of sashelp.heart,
# e.g. exported from SAS or read with haven::read_sas()
heart <- heart_data %>%
  mutate(Smoking_Cat = case_when(
    Smoking_Status == 'Non-smoker' ~ 1,
    Smoking_Status == 'Light (1-5)' ~ 2,
    Smoking_Status == 'Moderate (6-15)' ~ 3,
    Smoking_Status == 'Heavy (16-25)' ~ 4,
    Smoking_Status == 'Very Heavy (> 25)' ~ 5,
    TRUE ~ NA_real_
  ))

Note that any other value of Smoking_Status, including a missing one, gives NA, just as the OTHERWISE clause sets Smoking_Cat to missing in SAS.
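When the categories are simply numbered 1, 2, 3, ... in a fixed order, as they are here, a lookup with base R's match() is a compact alternative to case_when(): it returns each value's position in the lookup vector, or NA when there is no match. A sketch with a few hypothetical rows standing in for sashelp.heart:

```r
library(dplyr)

# Hypothetical rows standing in for sashelp.heart
heart_data <- data.frame(
  Smoking_Status = c("Non-smoker", "Heavy (16-25)", NA, "Unknown"),
  stringsAsFactors = FALSE
)

# The position of each label in this vector is its category code (1 to 5)
smoking_levels <- c("Non-smoker", "Light (1-5)", "Moderate (6-15)",
                    "Heavy (16-25)", "Very Heavy (> 25)")

# match() gives NA for missing or unlisted values, like OTHERWISE Smoking_Cat=.
heart <- heart_data %>%
  mutate(Smoking_Cat = as.numeric(match(Smoking_Status, smoking_levels)))

print(heart$Smoking_Cat)  # 1 4 NA NA
```

This relies on the codes being consecutive positions; for arbitrary codes, case_when() is the clearer choice.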

Concatenating data

data dsnout;
  set dsnin1 dsnin2 dsnin3;
run;

or

proc append base=dsnin1 data=dsnin2; * PROC APPEND, of course, requires matching variable types;
run;

R code:

library(dplyr)

dsnout <- bind_rows(dsnin1, dsnin2, dsnin3)

or

dsnin1 <- bind_rows(dsnin1, dsnin2)
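Like a SAS SET statement, bind_rows() matches columns by name and fills a column that is missing from one input with NA. Its .id argument adds a column recording which input each row came from, which is handy for checking the result. A sketch with hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: dsnin2 has column b instead of a
dsnin1 <- data.frame(varn = 1, a = "x", stringsAsFactors = FALSE)
dsnin2 <- data.frame(varn = 2, b = "y", stringsAsFactors = FALSE)
dsnin3 <- data.frame(varn = 3, a = "z", stringsAsFactors = FALSE)

# Columns are matched by name; a is NA for the dsnin2 row, b for the others
dsnout <- bind_rows(dsnin1, dsnin2, dsnin3)

# Naming the inputs and setting .id records the source of each row
tagged <- bind_rows(list(d1 = dsnin1, d2 = dsnin2, d3 = dsnin3), .id = "source")
```

bind_rows() stops with an error if the same column is numeric in one input and character in another, so, as with PROC APPEND, the types need to agree.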

noduplicates or nodupkey

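(Note that I've used SELECT DISTINCT because NODUPLICATES doesn't always work: PROC SORT only compares each record with the one before it, so duplicates that don't end up next to each other after sorting can slip through.)

proc sql noprint;
  create table dsnout as
    select distinct *
    from dsnin(keep=ONSConstID ConstituencyName RegNationName);
quit;

proc sort data=dsnin(keep=ONSConstID ConstituencyName RegNationName) out=dsnout nodupkey;
  by ONSConstID;
run;

R code:

library(dplyr)

# select distinct *: drop records that are identical on all kept variables
dsnout <- dsnin %>%
  select(ONSConstID, ConstituencyName, RegNationName) %>%
  distinct()

# nodupkey by ONSConstID: keep the first record for each ONSConstID, sorted by it
dsnout <- dsnin %>%
  select(ONSConstID, ConstituencyName, RegNationName) %>%
  distinct(ONSConstID, .keep_all = TRUE) %>%
  arrange(ONSConstID)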
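The two operations differ in what counts as a duplicate: distinct() with no arguments compares whole rows, while distinct(ONSConstID, .keep_all = TRUE) keeps only the first record for each ONSConstID, which is what NODUPKEY does. A sketch with hypothetical IDs and names:

```r
library(dplyr)

# Hypothetical data: E1 appears twice with different names, E2 twice identically
dsnin <- data.frame(
  ONSConstID       = c("E2", "E1", "E2", "E1"),
  ConstituencyName = c("Bravo", "Alpha", "Bravo", "Alpha North"),
  RegNationName    = c("North", "North", "North", "North"),
  stringsAsFactors = FALSE
)

# Whole-row duplicates removed: the two E1 rows differ, so 3 rows remain
whole_rows <- dsnin %>% distinct()

# One record per ONSConstID (the first one met), sorted by the key: 2 rows
per_key <- dsnin %>%
  distinct(ONSConstID, .keep_all = TRUE) %>%
  arrange(ONSConstID)
```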

Summarise several variables (R code snippet)

library(dplyr)

dsnout <- dsnin %>%
  group_by(region) %>%
  summarise(
    electorate = sum(electorate, na.rm = TRUE),
    valid = sum(valid, na.rm = TRUE),
    invalid = sum(invalid, na.rm = TRUE)
  )
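With many variables, dplyr's across() (dplyr 1.0.0 or later) applies the same summary to a list of columns, much like the VAR statement of PROC MEANS or PROC SUMMARY. A sketch with hypothetical data. One difference to watch: sum(x, na.rm = TRUE) returns 0 when every value in a group is missing, whereas SAS reports a missing sum.

```r
library(dplyr)

# Hypothetical data: region S has no non-missing electorate values
dsnin <- data.frame(
  region     = c("N", "N", "S"),
  electorate = c(100, 200, NA),
  valid      = c(60, 150, 40),
  invalid    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Apply the same sum to each listed column, per region
dsnout <- dsnin %>%
  group_by(region) %>%
  summarise(across(c(electorate, valid, invalid), ~ sum(.x, na.rm = TRUE)))

print(dsnout)  # N: 300, 210, 3; S: 0, 40, 3
```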

Logistic Regression

Work in progress...

The SAS example of logistic regression can be found here:

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect053.htm

It takes a dataset, Neuralgia, and fits a logistic regression model that predicts the probability of pain.
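While this is in progress, here is a rough sketch of the R side using glm() with family = binomial. It is a starting point under assumptions, not a reproduction of the SAS output: the column names (Treatment, Sex, Age, Duration, and Pain with values "No"/"Yes") and the model terms (Treatment, Sex, their interaction, Age and Duration) are my guess at the example's MODEL statement and should be checked against the linked page. Two defaults differ: PROC LOGISTIC by default models the probability of the first ordered response level (Pain='No'), while glm() models the second factor level ("Yes"); and SAS CLASS variables use effect coding by default, R factors treatment coding. So the coefficients will not line up one-for-one, although the fitted probabilities should agree once one is taken as the complement of the other.

```r
# Sketch only: column names and model terms are assumptions,
# check them against the MODEL statement in the SAS example
fit_pain_model <- function(neuralgia) {
  # Order the response so that glm() models P(Pain = "Yes")
  neuralgia$Pain <- factor(neuralgia$Pain, levels = c("No", "Yes"))
  neuralgia$Treatment <- factor(neuralgia$Treatment)
  neuralgia$Sex <- factor(neuralgia$Sex)
  # Treatment * Sex expands to Treatment + Sex + Treatment:Sex
  glm(Pain ~ Treatment * Sex + Age + Duration,
      family = binomial, data = neuralgia)
}
```

With the data in a data frame called neuralgia (a hypothetical name), fit <- fit_pain_model(neuralgia) fits the model; summary(fit) then gives the estimates and predict(fit, type = "response") the predicted probabilities of pain.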

The first SAS Institute example