Automating R scripts on Linux with cron
`cron` is a task scheduler that comes baked into Linux. The heart of `cron` is the crontab file that you can add tasks to.
To edit the crontab file, type:
crontab -e
On many systems this opens the vi editor (the editor used depends on your `EDITOR` or `VISUAL` environment variable). To save and exit vi, press Esc, type `:wq`, then press Enter. Intuitive, right? I know.
Comments in the crontab file start with `#`, and tasks take the form:
# Check out this cool task below!
MIN HOUR DOM MON DOW CMD
Allowed values for each field are listed in this table, which I copied from GeeksforGeeks:
Field | Description | Allowed Value |
---|---|---|
MIN | Minute field | 0 to 59 |
HOUR | Hour field | 0 to 23 |
DOM | Day of Month | 1 to 31 |
MON | Month field | 1 to 12 |
DOW | Day of Week | 0 to 6 (0 = Sunday) |
CMD | Command | Any command to be executed. |
You can use a `*` in any of the date-time fields to indicate all values. Therefore, `* * * * * CMD` executes `CMD` every minute of every hour of every day of the month of every month and so on. By contrast, `1 * * * * CMD` runs only once an hour, at minute 1.
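To make the five schedule fields concrete, here is a minimal shell sketch. The `cron_entry` helper is hypothetical (it is not part of cron); it only checks that a schedule has exactly five fields and prints the line you would paste into the crontab.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab entry from a five-field schedule
# (MIN HOUR DOM MON DOW) and a command. It does not touch the crontab.
cron_entry() {
  schedule=$1
  cmd=$2
  set -f              # keep * literal while splitting the schedule
  set -- $schedule    # split the schedule into its fields
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected 5 schedule fields, got $#" >&2
    return 1
  fi
  printf '%s %s\n' "$schedule" "$cmd"
}

cron_entry '* * * * *' 'CMD'   # every minute
cron_entry '0 * * * *' 'CMD'   # at the top of every hour
cron_entry '0 9 * * 1' 'CMD'   # at 09:00 every Monday
```

Each call prints one line in the `MIN HOUR DOM MON DOW CMD` form described above.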
But how do we use this to automate R scripts?
First, the `CMD` is `Rscript` (note the lowercase "s"; the name is case-sensitive on Linux). Next, we pass `Rscript` the .R script we want to run (see the docs).
Let's pretend we have a script (`my_script.R`) that we want to run once per minute. This script generates 100 random samples from a normal distribution with `mean=0` and `sd=1` and writes them to a csv called `my_file.csv`:
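library(readr)
# draw 100 samples from a standard normal distribution (mean = 0, sd = 1)
d <- rnorm(100)
# write the samples to a csv in the current working directory
write_csv(data.frame(num = d), "my_file.csv")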
Now we locate `Rscript`. In your favorite R development environment, run `R.home()`.
On my Mac it's:
> R.home()
[1] "/Library/Frameworks/R.framework/Resources"
Whereas on the EC2 I'm running on AWS it's:
> R.home()
[1] "/usr/lib/R"
`Rscript` lives in the `bin` subdirectory of this directory (`R.home("bin")` returns that path directly). You can navigate there to verify, or believe me.
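If you prefer the terminal, a quick check along these lines may also work. It is only a sketch, and it assumes `Rscript` is already on your `PATH`, which may not be the case on every machine.

```shell
#!/bin/sh
# Look up Rscript on the PATH; the result is empty if it is not found.
rscript_path=$(command -v Rscript || true)

if [ -n "$rscript_path" ]; then
  echo "Rscript found at: $rscript_path"
else
  echo "Rscript is not on the PATH; check the bin directory under R.home()" >&2
fi
```

The path it prints (if any) is the one to use as the command in the crontab entry.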
Putting it all together
Let's create a crontab entry that runs `my_script.R` once every minute. We use `Rscript` to run `my_script.R`. We add the following lines to the crontab file we opened with `crontab -e`:
# once every minute, run `my_script.R`
* * * * * Rscript "my_script.R"
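As a rough sketch of what a filled-in entry could look like, the snippet below assembles the line with absolute paths for both `Rscript` and the script (the reason is covered in the note that follows). The `Rscript` path is an assumption based on the `/usr/lib/R` home directory shown earlier; substitute whatever your own machine reports.

```shell
#!/bin/sh
# Assumed absolute paths; replace them with the ones from your machine.
RSCRIPT=/usr/lib/R/bin/Rscript
SCRIPT=/home/richpauloo/my_script.R

# "* * * * *" means every minute.
entry="* * * * * $RSCRIPT $SCRIPT"
echo "$entry"

# One common idiom for installing the entry without opening an editor is
# to append it to the current crontab. It is left commented out here so
# that running this sketch changes nothing:
# (crontab -l 2>/dev/null; echo "$entry") | crontab -
```

Running it prints the entry so you can inspect it before pasting it into `crontab -e`.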
Note that in the crontab entry, the line starting with `#` is just a comment, whereas the line below it is the command. Moreover, for the entry to actually run, you need to:
- specify the full path of `Rscript`
- specify the full path of `my_script.R`
`~/my_script` doesn't work, whereas `/home/richpauloo/my_script.R` does.
Here are some resources I found helpful in writing this short summary: