sub, gsub

sub(pattern, replacement, x)
gsub(pattern, replacement, x)
Replace the first occurrence of a pattern with sub or replace all occurrences with gsub.
  • pattern – A pattern to search for, which is assumed to be a regular expression. Use an additional argument fixed=TRUE to look for a pattern without using regular expressions.
  • replacement – A character string to replace the occurrence (or occurrences for gsub) of pattern.
  • x – A character vector to search for pattern. Each element will be searched separately.
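As a minimal sketch of the difference between the two functions, the example below runs sub and gsub on a made-up vector (the names x, first_only, and all_hits are illustrative and not from the original post). It also shows how fixed=TRUE changes the meaning of a pattern such as ".".

```r
# Illustrative input; each element is searched separately.
x <- c("a-b-c", "no dashes", "1-2")

first_only <- sub("-", "+", x)    # "a+b-c" "no dashes" "1+2"
all_hits   <- gsub("-", "+", x)   # "a+b+c" "no dashes" "1+2"

# As a regular expression, "." matches any single character.
regex_dot   <- gsub(".", "!", "a.b.c")                 # "!!!!!"
# With fixed = TRUE, "." is matched literally.
literal_dot <- gsub(".", "!", "a.b.c", fixed = TRUE)   # "a!b!c"
```

Elements that do not contain the pattern are returned unchanged.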


getURL, getURLContent (RCurl package)

getURL(url), getURLContent(url)
The getURL and getURLContent functions from the RCurl package are used to retrieve the source of a webpage, which is especially useful for retrieving pages for data processing (e.g. scraping). The getURLContent function is a little more robust (it checks the response's content type and can return binary as well as text content), but the getURL function is usually sufficient.
  • url – A character string of a URL.
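A minimal sketch of a typical call is shown below. It assumes the RCurl package is installed and that the machine has network access; the wrapper name fetch_page and the example URL are hypothetical and used only for illustration.

```r
# Hypothetical helper: fetch the source of a page as one character string.
fetch_page <- function(url) {
  # Fail with a clear message if RCurl is not installed (assumption: it may not be).
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("The RCurl package is required: install.packages('RCurl')")
  }
  RCurl::getURL(url)
}

# Usage (requires network access, so it is left commented out):
# page <- fetch_page("http://www.example.com/")
# nchar(page)   # number of characters in the retrieved source
```

The returned string can then be passed to functions such as gsub, regexpr, or strsplit for processing.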


regexpr, gregexpr

The regexpr function identifies where a pattern first occurs within each element of a character vector, where each element is searched separately. The gregexpr function does the same thing, except that it finds every occurrence in each element, so its returned object is a list (one item per element) rather than a vector. The functions return the starting position and length of each match, which is sufficient to extract the matched text. Where the pattern is not found, they return -1.
regexpr(pattern, text, ignore.case=FALSE)
  • pattern – A regular expression pattern.
  • text – The character vector to be searched, where each element is searched separately.
  • ignore.case – Whether to ignore case in the search.
gregexpr(pattern, text, ignore.case=FALSE)
  • pattern – A regular expression pattern.
  • text – The character vector to be searched, where each element is searched separately.
  • ignore.case – Whether to ignore case in the search.
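The sketch below illustrates the two return shapes on a made-up vector (the variable names are illustrative). It also uses regmatches, a base R function not mentioned above, as one possible way to pull out the matched text from the returned positions and lengths.

```r
text <- c("banana", "cherry", "kiwi")

# regexpr: position of the first match in each element, or -1 if none.
first <- regexpr("an", text)
first_pos <- as.integer(first)            # 2 -1 -1
first_len <- attr(first, "match.length")  # 2 -1 -1

# gregexpr: a list with one item per element, holding all match positions.
all_matches <- gregexpr("an", text)
banana_pos  <- as.integer(all_matches[[1]])   # 2 4

# regmatches extracts the matched text from either result.
first_text <- regmatches(text, first)         # "an" (unmatched elements are dropped)
all_text   <- regmatches(text, all_matches)   # list: c("an", "an"), character(0), character(0)
```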


substr, substring

Retrieve a substring of a character string via the substr and substring functions. When used on the left-hand side of an assignment, these functions can also overwrite part of a character string.
substr(x, start, stop)
  • x – A character string.
  • start – If the characters of x were numbered, then the number of the first character to be returned (or overwritten).
  • stop – The number of the last character to be returned (or overwritten).
substring(x, first, last=1000000)
  • x – A character string.
  • first – If the characters of x were numbered, then the number of the first character to be returned (or overwritten).
  • last – The number of the last character to be returned (or overwritten), which defaults to 1 million.
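A short sketch of both uses, extraction and overwriting, on a made-up string (the variable names are illustrative):

```r
s <- "abcdefgh"

middle <- substr(s, 3, 5)    # "cde"
# substring's default for last means "to the end of the string" in practice.
tail_part <- substring(s, 6) # "fgh"

# substring is convenient with vectors of start and end positions.
windows <- substring(s, 1:3, 3:5)   # "abc" "bcd" "cde"

# Overwriting: assign into the selected range; the string keeps its length.
substr(s, 1, 3) <- "XYZ"     # s is now "XYZdefgh"
```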


strsplit

strsplit(x, split, fixed=FALSE)
Split a character string or vector of character strings using a regular expression or a literal (fixed) string. The strsplit function outputs a list, where each list item corresponds to an element of x that has been split. In the simplest case, x is a single character string, and strsplit outputs a one-item list.
  • x – A character string or vector of character strings to split.
  • split – The character string to split x. If the split is an empty string (""), then x is split between every character.
  • fixed – Whether the split argument should be treated as fixed (i.e. literally). By default, the setting is FALSE, which means that split is treated as a regular expression.
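The following sketch, using made-up inputs and illustrative variable names, shows the list output and the effect of the split and fixed arguments:

```r
# One list item per element of x.
parts <- strsplit(c("a,b", "c,d,e"), ",")
# parts[[1]] is "a" "b"; parts[[2]] is "c" "d" "e"

# An empty split string separates every character.
chars <- strsplit("abc", "")[[1]]                    # "a" "b" "c"

# split is a regular expression by default.
by_digits <- strsplit("a1b22c", "[0-9]+")[[1]]       # "a" "b" "c"

# fixed = TRUE treats "." as a literal period rather than "any character".
by_dot <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]  # "a" "b" "c"
```

Indexing with [[1]] pulls the character vector out of a one-item list.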


grep

grep(pattern, x)
Search for a particular pattern in each element of a vector x. I remember the ordering of the arguments by remembering that the arguments follow the order of "needle in a haystack", where pattern is the needle and x is the haystack.
  • pattern – A regular expression pattern, though a simple character string is probably sufficient for many people’s needs.
  • x – A character vector.
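A brief sketch of the "needle in a haystack" ordering on a made-up vector. By default grep returns the indices of the matching elements. The value=TRUE argument and the related grepl function are not covered above but are part of base R, and are included here only for comparison.

```r
haystack <- c("apple", "banana", "grape", "kiwi")

idx     <- grep("ap", haystack)                 # 1 3
matched <- grep("ap", haystack, value = TRUE)   # "apple" "grape"
flags   <- grepl("ap", haystack)                # TRUE FALSE TRUE FALSE

# The pattern is a regular expression, so anchors work too.
starts_with_a <- grep("^a", haystack)           # 1
```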
