R important things

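R has no block comment syntax, but there are two workarounds:

  • In RStudio: select the lines and press Ctrl+Shift+C to comment or uncomment them
  • Wrap the code in an if (FALSE) { ... } block (FALSE must be upper case; the code inside is still parsed, so it has to be valid R)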
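
A minimal sketch of the if (FALSE) workaround. The variable name and the error text are made up for illustration:

```r
# Code wrapped in if (FALSE) { ... } is parsed but never evaluated,
# so it acts like a block comment for syntactically valid R code.
executed <- FALSE

if (FALSE) {
  # Nothing in this block runs.
  executed <- TRUE
  stop("this error is never raised")
}

print(executed)
```

The flag keeps its initial value, which shows that the block was skipped.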


Python to R

df.tail() -> tail(df)

df.dropna() -> na.omit(df) / df[complete.cases(df), ]

df.describe() -> summary(df), or describe(df) after library(psych)

df.shape -> dim(df)
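
The R side of this mapping can be tried on a small data frame. This is only a sketch: the data frame df and its values are invented, and only base R functions are used (describe() from psych is left out so that nothing has to be installed).

```r
# Small example data frame with one missing value (made-up data).
df <- data.frame(
  id    = 1:8,
  score = c(2.5, 3.1, NA, 4.8, 5.0, 1.2, 3.3, 4.1)
)

last_rows <- tail(df)   # last 6 rows by default (pandas df.tail() shows 5)
shape     <- dim(df)    # number of rows and columns, like df.shape
print(summary(df))      # per-column summary, like df.describe()

clean1 <- na.omit(df)               # drop rows that contain NA, like df.dropna()
clean2 <- df[complete.cases(df), ]  # same rows, selected with a logical index
```

Both ways of dropping missing values keep the same rows; na.omit() additionally records which rows were removed in an attribute.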


Cassandra, Python, R, Node.js

It is quite nice that Cassandra has drivers for Python, R and Node.js.

If you want to build a website on top of Cassandra that also involves heavy scientific programming, I would suggest one of the following combinations:

  1. Python Flask or Django + Cassandra
  2. Python (ZeroMQ, zerorpc) + R (OpenCPU) + Node.js + Cassandra

The first solution is the easiest, but the second is more elegant, since Python, Cassandra and Node.js can run on separate servers. That way the advantages of Node.js are kept. Of course, you can also use the child_process module of Node.js to call Python and R.


Python vs. R

It is very difficult to say which one I like more.

Jupyter vs. R Markdown

  • R Markdown: quick to edit and gives beautiful reports, but compiling is slow
  • Jupyter: a console in the browser, flexible, but it is easy to end up in chaos
  • Both can integrate different languages


  • Python > R


  • R has more users outside the computer science community

Time Series:

  • R has very good tutorials and packages for time series.

Image processing:

  • Since Python is faster than R, Python is more suitable for image processing


  • R: dplyr
  • Python: pandas


Master both languages and you will find many friends. 🙂


Install Jupyter Kernel

1. How to install the IPython 3 kernel for Jupyter?
+ install Anaconda 3
+ copy the “Jupyter Notebook” shortcut to the desktop
+ right click -> “Properties” -> change “Start in” to the folder you want to start in

2. How to install IRkernel for Jupyter?
+ open the Anaconda 3 prompt
+ conda install -c r r-essentials


hide messages in RMD

When you write an Rmd file, the messages in the rendered output can be very distracting. There are two ways to suppress them:

1. Global method
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

2. Local method
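
The local method is not spelled out here. Presumably it means setting the same options in the header of a single chunk instead of globally; the sketch below assumes that reading. The chunk body is only an illustration.

```r
# In the .Rmd file the options go into the header of that one chunk,
# written in the same form as the setup header of the global method:
#
#   {r, warning=FALSE, message=FALSE}
#
# The chunk body is ordinary R code. With message=FALSE the line printed
# by message() is left out of the rendered report; the code itself still runs.
message("loading data ...")
result <- sqrt(16)
```

Other chunks in the document are not affected, so this is useful when only one noisy chunk (for example a library() call) needs to be silenced.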


change R version


default R path: $ which R


it can be

Force RStudio to use a specific version:

export RSTUDIO_WHICH_R=/usr/local/bin/R
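
To check which installation a session actually picked up, R can report this itself. A small sketch using base R only; the printed values depend on the machine:

```r
# Version string of the running R session.
r_version <- R.version.string
print(r_version)

# Directory that contains the R executable used by this session.
r_bin <- R.home("bin")
print(r_bin)

# RSTUDIO_WHICH_R as seen by this session ("" if it is not set).
print(Sys.getenv("RSTUDIO_WHICH_R"))
```

Running this in the RStudio console before and after setting the variable is a simple way to confirm that the switch worked.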


Highlights of the new RStudio releases

The new release of RStudio brings many improvements. Reading through them, I was very excited about the new features. Here are my highlights.

  • Data can be filtered, searched, and sorted
  • Execute R code from the Source Viewer using Ctrl+Enter
  • Keyboard shortcut quick reference (Windows/Linux: Alt+Shift+K)
  • Alt+Enter to run code while retaining the cursor position
  • Ctrl+Shift+E to select within matching parens/braces
  • Ctrl+Shift+M for magrittr pipe operator (%>%)
  • Ctrl+Alt+Shift+E to expand selection to matching paren/brace
  • now creates a new desktop graphics device if the RStudio device is already active
  • Default to current working directory for new project from existing directory

Configure Git for RStudio

RStudio has an article explaining how to configure Git for RStudio, but it is not very detailed. I ran into many problems during the configuration and wasted a lot of time on it, so I decided to write down all the steps.

For an HTTPS connection:

  1. Install Git
  2. RStudio -> Tools -> Global Options -> Git/SVN -> set the Git executable: C:/Program Files/Git/mingw64/bin/git.exe
  3. RStudio -> Tools -> Project Options -> Git/SVN -> choose Git
  4. Restart RStudio
  5. In this project, open the shell (RStudio -> Tools -> Shell) and configure the address of your remote repository:
    git remote add origin https://{username}:{password}{username}/project.git
    or https://{username}{username}/project.git
    git push -u origin master
  6. Note: if you have already set the address, you can change it with:
    git remote set-url origin https://{username}:{password}{username}/project.git

For an SSH connection:

  • Follow steps 1 to 4 of the HTTPS setup.
  • Open the shell: git remote add origin{username}/project.git
  • Generate an SSH key: Tools -> Global Options -> Git/SVN -> SSH RSA Key -> Generate RSA Key
  • View public key -> copy it
  • Open your repository host (GitHub etc.) -> add the public key to your account
  • If PuTTY is installed, you will run into a big problem: RStudio cannot find ssh.exe, so you have to specify the ssh path.
    Set the environment variable: Control Panel -> System and Security -> System -> Advanced system settings -> Advanced -> Environment Variables -> “GIT_SSH”: C:\Program Files\Git\usr\bin\ssh.exe



trunc() and floor(), round() and signif() in R

floor takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.

trunc takes a single numeric argument x and returns a numeric vector containing the integers formed by truncating the values in x toward 0.

round rounds the values in its first argument to the specified number of decimal places (default 0).

signif rounds the values in its first argument to the specified number of significant digits.



> x <- c(-5.2, -3.8, 5.2, 3.8)

> floor(x)
-6 -4 5 3

> trunc(x)
-5 -3 5 3

> x <- 3.1415
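
The calls to round and signif are not shown for this value. As a short sketch of how the two differ, the same numbers can be used as follows; the digits arguments are chosen here only for illustration:

```r
x <- c(-5.2, -3.8, 5.2, 3.8)
to_nearest <- round(x)          # -5 -4  5  4  (nearest integer, 0 decimal places)

x <- 3.1415
two_places <- round(x, digits = 2)    # 3.14  (two decimal places)
two_digits <- signif(x, digits = 2)   # 3.1   (two significant digits)

print(to_nearest)
print(two_places)
print(two_digits)
```

round counts digits after the decimal point, while signif counts digits from the first non-zero digit, so the two results differ for the same digits value.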









Plot Histogram in R

Method 1: hist(vector, breaks)

Take care with the parameter “breaks”.

According to the help:

breaks: a single number giving the number of cells for the histogram.

Important: the number is a suggestion only! The breakpoints will be set to pretty values.

For example:

> data("women")  # load dataset "women"

> hist(women$weight, breaks = 7)
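
To see what R actually did with the suggestion, the histogram object can be inspected without drawing it. A minimal sketch; the variable names are arbitrary:

```r
# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(women$weight, breaks = 7, plot = FALSE)

print(h$breaks)              # break points R chose ("pretty" values)
n_bins <- length(h$counts)   # resulting number of bins, not necessarily 7
print(n_bins)
```

The number of bins is always one less than the number of break points, whatever value was suggested.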








If you want exactly 7 bins, you have to specify the break points yourself. Note that 7 bins need 8 break points:

> hist(women$weight, breaks = seq(min(women$weight), max(women$weight), length.out = 8))









This gives exactly 7 bins.
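
The bin count can be verified in code. The sketch below relies on the fact that n bins need n + 1 break points; equal_breaks is a made-up helper name, not part of R:

```r
# Hypothetical helper: n_bins equal-width bins need n_bins + 1 break points.
equal_breaks <- function(v, n_bins) {
  seq(min(v), max(v), length.out = n_bins + 1)
}

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(women$weight, breaks = equal_breaks(women$weight, 7), plot = FALSE)

print(length(h$breaks))   # 8 break points
print(length(h$counts))   # 7 bins
```

Building the break points from the desired bin count avoids the off-by-one mistake of passing the bin count directly as the sequence length.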