as.Date, strptime

Convert character strings to dates using as.Date, or to dates with times using strptime. as.Date has helpful defaults for common date layouts.
as.Date(x, format="%Y-%m-%d")
  • x – A character vector of dates.
  • format – The format of the dates, using percent symbols followed by characters to specify what type of date information is found where. By default, the function accepts dates of the form "2012-02-26" or "2012/02/26", but this can be adjusted for other layouts.
strptime(x, format)
  • x – A character vector of dates, possibly also including times.
  • format – The format of the dates, using percent symbols followed by characters to specify what type of date and time information is found where. See ?strptime for the complete list of these options and their meanings.
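As a minimal sketch of the two functions described above, the following parses the same date in the default layout and in a custom one, then parses a date with a time. The example strings and the tz = "UTC" setting are assumptions chosen for illustration.

```r
# Default layout: no format argument is needed for "2012-02-26".
d1 <- as.Date("2012-02-26")

# Custom layout: month/day/four-digit year, described with % codes.
d2 <- as.Date("02/26/2012", format = "%m/%d/%Y")

# strptime also reads times; the time zone is fixed here so the
# result does not depend on the local machine (an assumption).
t1 <- strptime("2012-02-26 14:30:00",
               format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

print(d1 == d2)              # TRUE
print(format(t1, "%H:%M"))   # "14:30"
```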


cat

cat(..., file="", sep=" ", append=FALSE)
Print output to the screen or to a file. Use cat to print information to an end-user from a function. cat is also useful for writing information that is being processed or generated, one or more lines at a time, to a file.
  • ... – The information to be printed to the screen or saved to a file.
  • file – An optional argument that specifies a file to be created, overwritten, or appended.
  • sep – Specifies what separates the objects that are to be printed.
  • append – If a file is specified, then indicate whether to append to the content in the existing file (the default is not to append, which means to overwrite the existing content).
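A short sketch of these arguments in use: printing to the screen with a custom separator, then writing to a file and appending a second line. The file is a temporary one created with tempfile(), used here only so the example cleans up after itself.

```r
# Print to the screen, joining the pieces with "-" instead of a space.
cat("a", "b", "c\n", sep = "-")

# Write to a file (a temporary file, chosen for illustration).
out_file <- tempfile(fileext = ".txt")
cat("first line\n", file = out_file)                      # creates or overwrites
cat("second", "line\n", file = out_file, append = TRUE)   # adds to the end

# Read the file back to confirm both lines were written.
written <- readLines(out_file)
print(written)
```

Without append = TRUE, the second call would have replaced the first line rather than added to it.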


getURL, getURLContent (RCurl package)

getURL(url), getURLContent(url)
The getURL and getURLContent functions from the RCurl package are used to retrieve the source of a webpage, which is especially useful for retrieving pages for data processing (i.e. scraping). The getURLContent function is a little more robust (for example, it can also return binary content), but the getURL function is usually sufficient.
  • url – A character string of a URL.
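One way this might look in practice is sketched below. The wrapper fetch_page is a hypothetical helper, not part of RCurl, and the URL in the commented-out call is only a placeholder; running it requires the RCurl package and a network connection.

```r
# Hypothetical helper: returns the page source as a character string.
fetch_page <- function(url) {
  # Fail with a clear message if RCurl is not installed.
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("The RCurl package is required for this example.")
  }
  RCurl::getURL(url)
}

# Example call (needs network access, so it is left commented out):
# html <- fetch_page("https://www.example.com/")
# substr(html, 1, 100)
```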


regexpr, gregexpr

The regexpr function is used to identify where a pattern is within a character vector, where each element is searched separately. It reports only the first match in each element. The gregexpr function reports every match in each element, so its returned object is a list rather than a vector. Both functions return the starting position and length of each match, which is sufficient to extract the matched text. If the pattern is not found in an element, they return -1 for that element.
regexpr(pattern, text, ignore.case=FALSE)
  • pattern – A regular expression pattern.
  • text – The character vector to be searched, where each element is searched separately.
  • ignore.case – Whether to ignore case in the search.
gregexpr(pattern, text, ignore.case=FALSE)
  • pattern – A regular expression pattern.
  • text – The character vector to be searched, where each element is searched separately.
  • ignore.case – Whether to ignore case in the search.
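The difference between the two functions can be seen in a small sketch. The input vector is made up for illustration, and regmatches is a base R helper (not covered above) that extracts the matched text from the returned positions.

```r
x <- c("banana", "cherry", "kiwi")

# regexpr: position of the first match in each element, -1 if none.
first <- regexpr("an", x)
print(as.integer(first))              # 2 -1 -1
print(attr(first, "match.length"))    # 2 -1 -1

# gregexpr: a list with all match positions for each element.
all_matches <- gregexpr("an", x)
print(as.integer(all_matches[[1]]))   # 2 4

# regmatches uses the positions to pull out the matched text.
print(regmatches(x, first))           # "an"
print(regmatches(x, all_matches)[[1]])  # "an" "an"
```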


substr, substring

Retrieve a substring of a character string via the substr and substring functions. These functions can also be used on the left side of an assignment to overwrite part of a character string.
substr(x, start, stop)
  • x – A character string.
  • start – If the characters of x were numbered, then the number of the first character to be returned (or overwritten).
  • stop – The number of the last character to be returned (or overwritten).
substring(x, first, last=1000000)
  • x – A character string.
  • first – If the characters of x were numbered, then the number of the first character to be returned (or overwritten).
  • last – The number of the last character to be returned (or overwritten), which is defaulted to 1 million.
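A brief sketch of both uses, retrieval and overwriting, with an example string chosen for illustration:

```r
s <- "statistics"

# Retrieve characters 1 through 4.
head_part <- substr(s, 1, 4)
print(head_part)          # "stat"

# substring only needs a starting position; last defaults to a large number.
tail_part <- substring(s, 5)
print(tail_part)          # "istics"

# Overwrite characters 1 through 4 in place.
substr(s, 1, 4) <- "STAT"
print(s)                  # "STATistics"
```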


strsplit

strsplit(x, split, fixed=FALSE)
Split a character string or vector of character strings using a regular expression or a literal (fixed) string. The strsplit function outputs a list, where each list item corresponds to an element of x that has been split. In the simplest case, x is a single character string, and strsplit outputs a one-item list.
  • x – A character string or vector of character strings to split.
  • split – The character string (or regular expression) at which to split x. If split is an empty string (""), then x is split between every character.
  • fixed – Whether the split argument should be treated as fixed (i.e. literally). By default, the setting is FALSE, which means that split is treated as a regular expression.
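The following sketch shows why the fixed argument matters: a period is a special character in a regular expression, so splitting on it literally needs fixed = TRUE. The example strings are made up for illustration.

```r
# Simplest case: one string in, a one-item list out.
parts <- strsplit("a,b,c", ",")
print(parts[[1]])     # "a" "b" "c"

# Split on a literal period rather than the regular expression ".".
dotted <- strsplit("1.2.3", ".", fixed = TRUE)[[1]]
print(dotted)         # "1" "2" "3"

# An empty split string separates every character.
chars <- strsplit("abc", "")[[1]]
print(chars)          # "a" "b" "c"

# A vector input gives one list item per element.
several <- strsplit(c("a b", "c d e"), " ")
print(length(several))   # 2
```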


grep

grep(pattern, x)
Search for a particular pattern in each element of a vector x. I remember the ordering of the arguments by thinking of "needle in a haystack", where pattern is the needle and x is the haystack.
  • pattern – A regular expression pattern, though a simple character string is probably sufficient for many people’s needs.
  • x – A character vector.
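A small sketch of a search with a plain character string as the pattern. The value argument and the related grepl function are part of base R but are not covered above; they are included only to show the other common ways of reading the result.

```r
x <- c("apple", "banana", "grape", "cherry")

# Needle first, haystack second: indices of the matching elements.
idx <- grep("ap", x)
print(idx)        # 1 3

# value = TRUE returns the matching elements themselves.
hits <- grep("ap", x, value = TRUE)
print(hits)       # "apple" "grape"

# grepl returns TRUE or FALSE for every element.
flags <- grepl("ap", x)
print(flags)      # TRUE FALSE TRUE FALSE
```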


read.delim, write.table

The read.delim function is typically used to read in delimited text files, where data is organized in a data matrix with rows representing cases and columns representing variables. We can also write a matrix or data frame to a text file using the write.table function. Be sure to review the arguments of write.table carefully since the default settings clutter the text file (often unnecessarily).
read.delim(file, header=TRUE, sep="\t")
  • file – A file location.
  • header – Whether the first line describes the column names.
  • sep – The table delimiter, often a tab (\t) or comma.
write.table(x, file="", …, quote=TRUE, sep=" ", row.names=TRUE)
  • x – A matrix or data frame to write to a file.
  • file – A file location.
  • quote – Whether characters or factors should have quotation marks written to the file.
  • sep – The table delimiter, often a tab (\t) or comma.
  • row.names – Whether the row names of the matrix or data frame should be written as the first column in the file.
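As a sketch of a round trip, the following writes a small data frame to a tab-delimited file with the clutter-prone defaults turned off, then reads it back. The data frame and the temporary file path are assumptions made up for illustration.

```r
# A small example data frame.
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# Write a clean tab-delimited file: no quotation marks, no row names.
path <- tempfile(fileext = ".txt")
write.table(df, file = path, quote = FALSE, sep = "\t", row.names = FALSE)

# The file now holds a header line followed by one line per row.
print(readLines(path))

# Read it back in; the first line supplies the column names.
back <- read.delim(path, header = TRUE, sep = "\t")
print(back)
```

With the defaults (quote=TRUE, row.names=TRUE), the same file would contain quotation marks around the names and an extra leading column of row numbers.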
