Spark MLlib

Introduction to Spark MLlib: Linear Regression and SVM

Linear Regression

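One advantage of Spark over other platforms is its built-in support for machine learning. To try that support out, we take one of the bundled example programs, compile it, and run it. This article uses Linear Regression as the example. The reference page is: https://spark.apache.org/docs/2.2.0/ml-classification-regression.html#linear-regression and the full reference source is at: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala Note that the steps below copy the RDD-based LinearRegressionWithSGDExample.scala that ships with the local Spark installation, not the ElasticNet example linked here.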

First, create the project directories and copy in one of the examples shipped with Spark:

~$ mkdir -p liRegression/project
~$ mkdir -p liRegression/src/main/scala
~$ cp /usr/lib/spark/examples/src/main/scala/org/apache/spark/examples/mllib/LinearRegressionWithSGDExample.scala ~/liRegression/src/main/scala/LinearRegression.scala

Because the program depends on MLlib, build.sbt needs a few changes: add the MLlib dependency and change the project name.

name := "liRegression"

version := "1.0"

scalaVersion := "2.11.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.2"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.2"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.2"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
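The same build definition can be written more compactly by keeping the Spark version in one place. The following is only a sketch: the name sparkVersion is a hypothetical helper introduced here, and the resolver line is left out on the assumption that the Spark artifacts resolve from Maven Central. Keep the resolver if your setup needs it.

```scala
name := "liRegression"

version := "1.0"

scalaVersion := "2.11.6"

// Hypothetical helper value so the three Spark modules stay on the same version.
val sparkVersion = "2.1.2"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-sql"   % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)
```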

Next, in LinearRegression.scala, change the line

 val data = sc.textFile("data/mllib/ridge-data/lpsa.data")

to:

 val data = sc.textFile("/usr/lib/spark/data/mllib/ridge-data/lpsa.data")
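For orientation, each line of lpsa.data holds a label, a comma, and then space-separated feature values. The Spark-free sketch below shows how such a line can be parsed and how a mean squared error is computed from (label, prediction) pairs. The names LpsaFormat, Record, parseLine, and meanSquaredError are hypothetical, and the sample line is made up rather than taken from the real file.

```scala
object LpsaFormat {
  // One parsed record: the label and its feature values.
  final case class Record(label: Double, features: Vector[Double])

  // Parse a line shaped like "label,f1 f2 ... fn" into a Record.
  def parseLine(line: String): Record = {
    val parts = line.split(',')
    val features = parts(1).trim.split(' ').map(_.toDouble).toVector
    Record(parts(0).toDouble, features)
  }

  // Mean squared error over (label, prediction) pairs.
  def meanSquaredError(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (label, prediction) => math.pow(label - prediction, 2) }.sum / pairs.size

  def main(args: Array[String]): Unit = {
    // Made-up line in the same shape as lpsa.data (not real data).
    println(parseLine("1.5,0.25 -0.5 2.0"))
    // Made-up (label, prediction) pairs.
    println(meanSquaredError(Seq((1.0, 0.0), (3.0, 1.0))))
  }
}
```

For the two made-up pairs, the squared errors are 1 and 4, so the mean squared error is 2.5.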

The master must also be set to local, as follows:

val conf = new SparkConf().setAppName("LinearRegressionWithSGDExample").setMaster("local")
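Put together, the edited file might look roughly like the sketch below. It is a condensed approximation of the RDD-based example, not a verbatim copy. The values of numIterations and stepSize are assumptions, and the copy on your machine may contain extra steps (such as saving and reloading the model) that are omitted here. It relies on the spark-core and spark-mllib dependencies declared in build.sbt.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

object LinearRegressionWithSGDExample {
  def main(args: Array[String]): Unit = {
    // Run on a local master so the program can start without a cluster.
    val conf = new SparkConf()
      .setAppName("LinearRegressionWithSGDExample")
      .setMaster("local")
    val sc = new SparkContext(conf)

    // Absolute path to the data set bundled with the Spark installation.
    val data = sc.textFile("/usr/lib/spark/data/mllib/ridge-data/lpsa.data")

    // Each line is "label,f1 f2 ... fn".
    val parsedData = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }.cache()

    // Assumed hyperparameters; adjust to match the copied file.
    val numIterations = 100
    val stepSize = 0.00000001
    val model = LinearRegressionWithSGD.train(parsedData, numIterations, stepSize)

    // Compare predictions with labels on the training data.
    val valuesAndPreds = parsedData.map { point =>
      (point.label, model.predict(point.features))
    }
    val mse = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
    println(s"training Mean Squared Error = $mse")

    sc.stop()
  }
}
```

LinearRegressionWithSGD is deprecated in Spark 2.x, so sbt may print a deprecation warning when compiling against 2.1.2. The class is no longer available in Spark 3.x.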

Once these changes are made, compiling should produce output similar to the following:

$ sbt compile
[info] Loading project definition from /home/ubuntu/liRegression/project
[info] Loading settings for project liregression from build.sbt ...
[info] Set current project to liRegression (in build file:/home/ubuntu/liRegression/)
[info] Executing in batch mode. For better performance use sbt's shell
[info] Compiling 1 Scala source to /home/ubuntu/liRegression/target/scala-2.11/classes ...
[info] Done compiling.
[success] Total time: 6 s, completed Oct 30, 2018 8:12:08 AM
