strsplit

strsplit(x, split, fixed=FALSE)
Split a character string or vector of character strings using a regular expression or a literal (fixed) string. The strsplit function returns a list in which each item holds the pieces of the corresponding element of x. In the simplest case, x is a single character string, and strsplit returns a one-item list.
  • x – A character string or vector of character strings to split.
  • split – The character string (or regular expression) at which to split x. If split is an empty string (""), then x is split between every character.
  • fixed – Whether the split argument should be treated as fixed (i.e. literally). The default is FALSE, which means that split is treated as a regular expression.
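
The arguments above can be sketched in a few lines. This is only an illustration: the input strings and the variable names (parts, regex_parts, escaped_parts) are made up for the example. It shows that a vector input gives one list item per element, and that a period only splits at literal periods when it is either treated as fixed or escaped in the regular expression.

```r
# A vector input yields one list item per element of x.
x <- c("a.b.c", "d.e")                      # hypothetical input
parts <- strsplit(x, ".", fixed = TRUE)     # "." is taken literally
length(parts)                               # one item per element of x
parts[[1]]                                  # pieces of the first element
lengths(parts)                              # number of pieces per element

# With fixed = FALSE (the default), "." matches any character,
# so every character of the string is consumed as a separator.
regex_parts <- strsplit("a.b.c", ".")

# Escaping the period in the regular expression also splits at
# literal periods, without setting fixed = TRUE.
escaped_parts <- strsplit("a.b.c", "\\.")
```

Use double square brackets (parts[[1]]) to pull the character vector out of a list item; single brackets would return a one-item list instead.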

Example. Several starter examples are shown below (note that a period is a stand-in for "any character" in regular expressions), followed by a couple of more practical scenarios: dates are split into year, month, and day, and names in the form Last, First are split at the comma.
> x <- "Split the words in a sentence."
> strsplit(x, " ")
[[1]]
[1] "Split"     "the"       "words"     "in"       
[5] "a"         "sentence."

> 
> x <- "Split at every character."
> strsplit(x, "")
[[1]]
 [1] "S" "p" "l" "i" "t" " " "a" "t" " " "e" "v" "e" "r" "y"
[15] " " "c" "h" "a" "r" "a" "c" "t" "e" "r" "."

> 
> x <- " Split at each space with a preceding character."
> strsplit(x, ". ")
[[1]]
[1] " Spli"      "a"          "eac"        "spac"      
[5] "wit"        ""           "precedin"   "character."

> 
> x <- "Do you wish you were Mr. Jones?"
> strsplit(x, ". ")
[[1]]
[1] "D"      "yo"     "wis"    "yo"     "wer"    "Mr"    
[7] "Jones?"

> strsplit(x, ". ", fixed=TRUE)
[[1]]
[1] "Do you wish you were Mr" "Jones?"                 

> 
> #=====> Splitting Dates <=====#
> dates <- c("1999-05-23", "2001-12-30", "2004-12-17")
> temp  <- strsplit(dates, "-")
> temp
[[1]]
[1] "1999" "05"   "23"  

[[2]]
[1] "2001" "12"   "30"  

[[3]]
[1] "2004" "12"   "17"  

> matrix(unlist(temp), ncol=3, byrow=TRUE)
     [,1]   [,2] [,3]
[1,] "1999" "05" "23"
[2,] "2001" "12" "30"
[3,] "2004" "12" "17"
> 
> #=====> Cofounders of Google and Twitter <=====#
> Names <- c("Brin, Sergey", "Page, Larry",
+            "Dorsey, Jack", "Glass, Noah",
+            "Williams, Evan", "Stone, Biz")
> Cofounded <- rep(c("Google", "Twitter"), c(2,4))
> temp <- strsplit(Names, ", ")
> temp
[[1]]
[1] "Brin"   "Sergey"

[[2]]
[1] "Page"  "Larry"

[[3]]
[1] "Dorsey" "Jack"  

[[4]]
[1] "Glass" "Noah" 

[[5]]
[1] "Williams" "Evan"    

[[6]]
[1] "Stone" "Biz"  

> mat  <- matrix(unlist(temp), ncol=2, byrow=TRUE)
> df   <- as.data.frame(mat)
> df   <- cbind(df, Cofounded)
> colnames(df) <- c("Last", "First", "Cofounded")
> df
      Last  First Cofounded
1     Brin Sergey    Google
2     Page  Larry    Google
3   Dorsey   Jack   Twitter
4    Glass   Noah   Twitter
5 Williams   Evan   Twitter
6    Stone    Biz   Twitter
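
The list returned by strsplit can also be reshaped without unlist. The following is a minimal sketch that reuses the dates from the example above and assumes every element splits into the same number of pieces; the names years, num, and the column labels are chosen here for illustration only.

```r
dates <- c("1999-05-23", "2001-12-30", "2004-12-17")
temp  <- strsplit(dates, "-")

# Pull the first piece (the year) out of every list item.
years <- sapply(temp, `[`, 1)

# Stack the pieces row-wise; this assumes each element of temp
# has the same length (three pieces per date here).
mat <- do.call(rbind, temp)

# Convert the character pieces to integers, keeping the matrix shape.
num <- matrix(as.integer(mat), nrow = nrow(mat))
colnames(num) <- c("year", "month", "day")
```

If the elements split into different numbers of pieces, do.call(rbind, ...) recycles the shorter ones with a warning, so it is worth checking lengths(temp) first.
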
Tip. A helpful regular expressions guide may be found here. If you have an alternative recommendation, please send an email or post a link below, especially for ground-up tutorials.
