offset, I

These two functions are most often used directly within a model formula. Wrap a term in offset when its coefficient should be fixed at 1 rather than estimated. Wrap an expression (e.g. x1+x2) in I to have it evaluated arithmetically and treated as a single variable in the formula, so that it receives only one coefficient estimate.
offset(object)
  • object – A variable in an equation.
I(x)
  • x – An object, often an expression of other objects.
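Outside of a formula, both functions do very little, which helps show that they act mainly as markers for the modeling functions. The short sketch below uses a made-up vector x (not part of the example that follows) to check that offset returns its argument unchanged and that I only tags its argument with the class "AsIs".

```r
# Hypothetical input vector, used only for illustration.
x <- c(1.5, 2, 4)

# offset() returns its argument unchanged; its meaning comes from
# how model-fitting functions interpret it inside a formula.
identical(offset(x), x)

# I() leaves the values alone and adds the class "AsIs".
inherits(I(x), "AsIs")
as.numeric(I(x))
```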

Example. Below, a Poisson generalized linear model (GLM) is fit, where the variable Y is the number of events in each group. We assume
$latex Y_i \sim Poisson(\mu_i)$
N gives the size of each group, and x1 and x2 are predictors. We compare the following two models for $latex \mu_i$, each with an intercept $latex \beta_0$ and with $latex \log(N_i)$ as an offset:
$latex \log(\mu_i) = \log(N_i) + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$
$latex \log(\mu_i) = \log(N_i) + \beta_0 + \beta (x_{1i} + x_{2i})$
> Y  <- c(15,  7, 36,  4,
+         16, 12, 41, 15)
> N  <- c(4949, 3534, 12210, 344,
+         6178, 4883, 11256, 7125)
> x1 <- c(-0.1, 0, 0.2, 0,
+          1, 1.1, 1.1, 1)
> x2 <- c(2.2, 1.5, 4.5, 7.2,
+         4.5, 3.2, 9.1, 5.2)
> 
> glm(Y ~ offset(log(N)) + (x1 + x2), family=poisson)

Call:  glm(formula = Y ~ offset(log(N)) + (x1 + x2), family = poisson)

Coefficients:
(Intercept)           x1           x2  
     -6.172       -0.380        0.109  

Degrees of Freedom: 7 Total (i.e. Null);  5 Residual
Null Deviance:	    10.56 
Residual Deviance: 4.559 	AIC: 46.69 
> 
> glm(Y ~ offset(log(N)) + I(x1+x2), family=poisson)

Call:  glm(formula = Y ~ offset(log(N)) + I(x1 + x2), family = poisson)

Coefficients:
(Intercept)   I(x1 + x2)  
   -6.12652      0.04746  

Degrees of Freedom: 7 Total (i.e. Null);  6 Residual
Null Deviance:	    10.56 
Residual Deviance: 8.001 	AIC: 48.13 
In both models, the offset term receives no coefficient estimate because its coefficient is fixed at 1. In the first glm call, the variables x1 and x2 are treated separately despite the parentheses, and each gets its own coefficient estimate. In the second call to glm, I(x1+x2) is treated as a single variable and gets only one coefficient.
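To see the fixed coefficient more directly, the sketch below re-enters the example data and compares three fits. The object names (fit_formula, fit_argument, fit_free) are illustrative only. It relies on the offset argument of glm, which is an alternative way to supply the same offset, and contrasts both with a model where log(N) is an ordinary predictor.

```r
# Data from the example above.
Y  <- c(15, 7, 36, 4, 16, 12, 41, 15)
N  <- c(4949, 3534, 12210, 344, 6178, 4883, 11256, 7125)
x1 <- c(-0.1, 0, 0.2, 0, 1, 1.1, 1.1, 1)
x2 <- c(2.2, 1.5, 4.5, 7.2, 4.5, 3.2, 9.1, 5.2)

# Offset given inside the formula, as in the example.
fit_formula <- glm(Y ~ offset(log(N)) + x1 + x2, family = poisson)

# Same offset supplied through glm's offset argument.
fit_argument <- glm(Y ~ x1 + x2, offset = log(N), family = poisson)

# log(N) as an ordinary predictor: its coefficient is now estimated.
fit_free <- glm(Y ~ log(N) + x1 + x2, family = poisson)

names(coef(fit_formula))   # no entry for the offset term
names(coef(fit_free))      # includes "log(N)"
all.equal(coef(fit_formula), coef(fit_argument))
```

The first two fits should agree, since they describe the same model in two ways. Only the third has a coefficient for log(N).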
Tip 1. The types of expressions that can be included within I are not limited to linear combinations. Inside I, operators such as + and ^ keep their usual arithmetic meaning, so even complex expressions of variables may be used, though this is less common.
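A squared term is a common case of Tip 1. The sketch below uses model.matrix to list the columns that a formula produces, without fitting anything. It assumes the x1 values from the example. Without I, the ^ operator is read as formula syntax (crossing x1 with itself), which adds no new column.

```r
# x1 from the example above.
x1 <- c(-0.1, 0, 0.2, 0, 1, 1.1, 1.1, 1)

# With I(), x1^2 is computed arithmetically and gets its own column.
cols_with_I <- colnames(model.matrix(~ x1 + I(x1^2)))
cols_with_I

# Without I(), x1^2 collapses to x1, so no squared column appears.
cols_without_I <- colnames(model.matrix(~ x1 + x1^2))
cols_without_I
```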
Tip 2. An alternative approach to using I is to create an entirely new variable. This is helpful in cases when the combination of the variables represents an important new variable. In such cases, it is advisable to create a new variable with a thoughtful name.
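As a rough illustration of Tip 2, the sketch below stores the sum in a new variable and checks that the fit matches the I(x1+x2) version. The name x_total is a placeholder chosen here. In practice the name should describe what the combined quantity means.

```r
# Data from the example above.
Y  <- c(15, 7, 36, 4, 16, 12, 41, 15)
N  <- c(4949, 3534, 12210, 344, 6178, 4883, 11256, 7125)
x1 <- c(-0.1, 0, 0.2, 0, 1, 1.1, 1.1, 1)
x2 <- c(2.2, 1.5, 4.5, 7.2, 4.5, 3.2, 9.1, 5.2)

# Hypothetical name for the combined predictor.
x_total <- x1 + x2

fit_I     <- glm(Y ~ offset(log(N)) + I(x1 + x2), family = poisson)
fit_named <- glm(Y ~ offset(log(N)) + x_total, family = poisson)

# Only the coefficient label differs between the two fits.
names(coef(fit_I))
names(coef(fit_named))
all.equal(unname(coef(fit_I)), unname(coef(fit_named)))
```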
