May
24

R important things

R has no block comment, but you can still do the following things:

  • In RStudio: Ctrl+Shift+C
  • if (FALSE) { ... } block (the code inside is never run, but it must still parse)
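
A minimal sketch of the second trick, with made-up variable names. Note that the R constant is spelled FALSE (upper case); a lower-case false would be looked up as an ordinary variable and fail.

```r
x <- 1

if (FALSE) {
  # Nothing in this block is evaluated, so it behaves like a block comment.
  x <- x + 100
  stop("never reached")
}

x  # still 1
```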

 

python to R

df.tail() -> tail(df)

df.dropna() -> na.omit(df) / df[complete.cases(df), ]

df.describe() -> summary(df), or describe(df) from library(psych)

df.shape -> dim(df)
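
The R side of these mappings can be tried on a small data frame. This is only a sketch; the data frame and its column names are invented for illustration.

```r
# Small illustrative data frame with one missing value (made up for this example)
df <- data.frame(a = c(1, 2, NA, 4), b = c("w", "x", "y", "z"))

tail(df, 2)                        # last rows, like df.tail(2)
clean <- na.omit(df)               # drop rows containing NA, like df.dropna()
same  <- df[complete.cases(df), ]  # equivalent row filter
summary(df)                        # per-column summary, like df.describe()
dim(df)                            # c(rows, cols), like df.shape
```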

Nov
25

Cassandra, Python, R, Node.js

It is quite nice that Cassandra has drivers for Python, R, and Node.js.

If you want to build a website on a Cassandra DB that also does heavy scientific computing, I would suggest one of the following combinations:

  1. Python Flask  or Django + Cassandra
  2. Python(ZeroMQ, zerorpc) + R(opencpu) + Nodejs + Cassandra
    https://ianhinsdale.com/post/communicating-between-nodejs-and-python/
    https://www.opencpu.org/

The first solution is the easiest, but the second is more elegant because Python, Cassandra, and Node.js can run on separate servers, so the advantages of Node.js are kept. Of course, you can also use the child_process module of Node.js to call Python and R.
http://www.sohamkamani.com/blog/2015/08/21/python-nodejs-comm/
https://github.com/extrabacon/python-shell

May
02

Python vs. R

It is very difficult to say which one I like more.

Jupyter vs. R Markdown

  • R Markdown: quick to edit, produces beautiful reports, but compiling is slow
  • Jupyter: a console in the browser, flexible, but notebooks can become chaotic
  • Both can integrate different languages

Speed:

  • Python > R

Community:

  • R has more users outside the computer science community

Time Series:

  • R has very good tutorials and packages for time series.

Image processing:

  • Since Python is faster than R, Python is more suitable for image processing

Dataframe:

  • R: dplyr
  • Python: pandas

Conclusion:

Master both languages and you will find many friends. 🙂

May
02

Install Jupyter Kernel

1. How to install the IPython 3 kernel for Jupyter?
+ install Anaconda 3
+ copy the “Jupyter Notebook” shortcut to the desktop
+ right click -> “Properties” -> change “Start in” to the folder you want to start in

2. How to install IRKernel for Jupyter?
+ open the Anaconda 3 prompt
+ conda install -c r r-essentials
+ https://www.continuum.io/blog/developer/jupyter-and-conda-r

Mar
01

hide messages in RMD

When you write an Rmd file, package messages in the output are very distracting. There are two ways to hide them:

1. Global method (chunk options set in the setup chunk)
{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

2. Local method
suppressMessages(library(dplyr))
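
A small sketch of what the local method does. The function noisy_load is a hypothetical stand-in for a chatty library() call; it is not part of any package.

```r
# Hypothetical helper that emits a message, standing in for a chatty library() call
noisy_load <- function() {
  message("Attaching package: example")
  invisible(42)
}

# The message is swallowed, the return value is kept
result <- suppressMessages(noisy_load())
result
```

suppressMessages() only hides messages. Warnings still appear unless warning = FALSE is set in the chunk options or the call is wrapped in suppressWarnings().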

Feb
16

change R version

The instructions come from:

https://support.rstudio.com/hc/en-us/articles/200486138-Using-Different-Versions-of-R

Find the default R location with: $ which R

/usr/bin/R

This default can be overridden. To force RStudio to use a specific version, set:

export RSTUDIO_WHICH_R=/usr/local/bin/R
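
To check from inside a session which R installation is actually running, something like the following should work. It is only a sketch; the output depends on the machine.

```r
# Report which R installation the current session is running
R.version.string               # version banner of the running R
R.home("bin")                  # directory holding the R executable in use
Sys.getenv("RSTUDIO_WHICH_R")  # empty string if the variable is not set
```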

Dec
04

Highlights of the new RStudio releases

The new release of RStudio brings many improvements. As I read through them, I was very excited about the new functions. Here are the highlights for me.

  • Data can be filtered, searched, and sorted
  • Execute R code from the Source Viewer using Ctrl+Enter
  • Keyboard shortcut quick reference (Windows/Linux: Alt+Shift+K)
  • Alt+Enter to run code while retaining cursor position
  • Ctrl+Shift+E to select within matching parens/braces
  • Ctrl+Shift+M for magrittr pipe operator (%>%)
  • Ctrl+Alt+Shift+E to expand selection to matching paren/brace
  • dev.new() now creates a new desktop graphics device if the RStudio device is already active
  • Default to current working directory for new project from existing directory
Oct
23

Configure Git for RStudio

There is an article from RStudio that explains how to configure Git for RStudio, but it is not very detailed. I ran into many problems and wasted a lot of time on the configuration alone, so I decided to write down all the steps.

For https connection:

  1. install git
  2. RStudio -> Tools -> Global options -> Git/SVN -> configure the git: C:/Program Files/Git/mingw64/bin/git.exe
  3. RStudio -> Tools -> Project options -> Git/SVN  -> Choose Git
  4. Restart RStudio
  5. In this project, open the shell (RStudio -> Tools -> Shell) and configure your remote repository address:
    git remote add origin https://{username}:{password}@github.com/{username}/project.git
    or https://{username}@github.com/{username}/project.git
    git push -u origin master
  6. Note: if you have already set the address, you can change it with:
    git remote set-url origin https://{username}:{password}@github.com/{username}/project.git

For ssh connection:

  • Do steps 1 to 4 as for the https connection.
  • Open the shell: git remote add origin git@github.com:{username}/project.git
  • Generate an SSH key: Tools -> Global options -> Git/SVN -> SSH RSA Key -> Generate RSA Key
  • View public key -> copy it
  • Open your account on GitHub (or another hosting service) -> add the public key to your account
  • If you have installed PuTTY, you will run into a big problem: RStudio cannot find ssh.exe, so you have to specify the ssh path.
    Set the environment variable: Control Panel -> System and Security -> System -> Advanced system settings -> Advanced -> Environment Variables -> “GIT_SSH”: C:\Program Files\Git\usr\bin\ssh.exe

There are several useful links for the configuration problems:

Jul
31

trunc() and floor(), round() and signif() in R

floor takes a single numeric argument x and returns a numeric vector containing the largest integers not greater than the corresponding elements of x.

trunc takes a single numeric argument x and returns a numeric vector containing the integers formed by truncating the values in x toward 0.

round rounds the values in its first argument to the specified number of decimal places (default 0).

signif rounds the values in its first argument to the specified number of significant digits.

Test:

--------------------------

> x <- c(-5.2, -3.8, 5.2, 3.8)
> floor(x)
[1] -6 -4  5  3
> trunc(x)
[1] -5 -3  5  3

--------------------------

> x <- 3.1415
> round(x)
[1] 3
> round(x, 3)
[1] 3.142
> signif(x, 3)
[1] 3.14
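
The difference between the four functions can also be checked programmatically. This is a small sketch; the value 123.456 is chosen here only for illustration.

```r
x <- c(-5.2, -3.8, 5.2, 3.8)

# trunc() drops the fractional part: floor() for positives, ceiling() for negatives
toward_zero <- ifelse(x < 0, ceiling(x), floor(x))
identical(trunc(x), toward_zero)

# round() counts decimal places, signif() counts significant digits
round(123.456, 1)   # 123.5
signif(123.456, 1)  # 100
```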

Jul
25

Plot Histogram in R

Plot Histogram in R:

Method 1: hist(vector, breaks)

Pay attention to the parameter “breaks”.

According to the help:

breaks: a single number giving the number of cells for the histogram.

Important: the number is a suggestion only! The breakpoints will be set to pretty values.

For example:

> data("women")  # load dataset "women"

> hist(women$weight, breaks = 7)

[Figure: histogram]

If you want exactly 7 bins, you have to specify the break positions yourself. Note that 7 bins need 8 break points:

> hist(women$weight, breaks = seq(min(women$weight), max(women$weight), length.out = 8))

[Figure: histogram_exact_bins]

This gives exactly 7 bins.
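
The bin count can be verified without drawing anything by calling hist() with plot = FALSE and inspecting the result. This is a sketch; the variable names are made up for illustration.

```r
data("women")

# breaks = 7 is only a hint; hist() picks "pretty" breakpoints itself
h_hint <- hist(women$weight, breaks = 7, plot = FALSE)
length(h_hint$counts)   # actual number of bins chosen by hist()

# n bins need n + 1 break points
n_bins <- 7
edges <- seq(min(women$weight), max(women$weight), length.out = n_bins + 1)
h_exact <- hist(women$weight, breaks = edges, plot = FALSE)
length(h_exact$counts)  # 7
```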