
Spark MLlib

Introduction to Spark MLlib: Linear Regression and SVM


Last updated 6 years ago


Linear Regression

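One advantage Spark has over other platforms is its built-in support for machine learning through MLlib. To make use of that support, we take one of the example programs shipped with Spark and compile it ourselves. This article uses Linear Regression as the example. The reference page and the complete reference program are the two links listed at the end of this page.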

First, create the project directories and copy in the sample program from Spark's examples directory:

~$ mkdir -p liRegression/project
~$ mkdir -p liRegression/src/main/scala
~$ cp /usr/lib/spark/examples/src/main/scala/org/apache/spark/examples/mllib/LinearRegressionWithSGDExample.scala ~/liRegression/src/main/scala/LinearRegression.scala

Because the project uses MLlib, build.sbt also needs a few changes: mainly adding the MLlib dependency and changing the project name.

name := "liRegression"

version := "1.0"

scalaVersion := "2.11.6"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.2"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.2"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.2"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
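The three Spark modules above must all use the same version. As a rough sketch of one way to keep them in sync, the dependency lines can be grouped under a single value. The name `sparkVersion` is introduced here for illustration only and is not part of the original build file; the rest of build.sbt stays unchanged.

```scala
// build.sbt fragment (alternative form of the three libraryDependencies lines above).
// `sparkVersion` is a hypothetical name chosen for this sketch.
val sparkVersion = "2.1.2"

// %% appends the Scala binary version (2.11 here) to each artifact name.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion,
  "org.apache.spark" %% "spark-sql"   % sparkVersion,
  "org.apache.spark" %% "spark-mllib" % sparkVersion
)
```

With this form, moving to another Spark release means editing one line instead of three.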

Next, in the copied LinearRegression.scala, take the line:

 val data = sc.textFile("data/mllib/ridge-data/lpsa.data")

and change it to:

 val data = sc.textFile("/usr/lib/spark/data/mllib/ridge-data/lpsa.data")

The master must also be set to local, as shown here:

val conf = new SparkConf().setAppName("LinearRegressionWithSGDExample").setMaster("local")
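To make it clearer what the program does with the data file once it runs, here is a minimal plain-Scala sketch of the per-record logic. It assumes each line of lpsa.data has the form `label,feature1 feature2 ...` and that the example reports a mean squared error over (label, prediction) pairs. The object and function names (`LpsaSketch`, `parseLine`, `meanSquaredError`) and the sample values are hypothetical. The sketch deliberately avoids Spark so it can be tried in any Scala REPL; in the real example the same steps would run inside RDD operations.

```scala
// Plain-Scala sketch; no Spark needed. All names here are hypothetical.
object LpsaSketch {

  // Parse one line of the assumed form "label,f1 f2 ... fn"
  // into a (label, features) pair.
  def parseLine(line: String): (Double, Array[Double]) = {
    val parts = line.split(',')
    val label = parts(0).toDouble
    val features = parts(1).trim.split(' ').map(_.toDouble)
    (label, features)
  }

  // Mean squared error over (label, prediction) pairs.
  def meanSquaredError(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (v, p) => math.pow(v - p, 2) }.sum / pairs.size

  def main(args: Array[String]): Unit = {
    // Made-up sample line, not taken from the real data file.
    val (label, features) = parseLine("-0.43,-1.64 -2.01 -1.86")
    println(s"label=$label, features=${features.mkString("[", ", ", "]")}")

    // Made-up (label, prediction) pairs.
    println(meanSquaredError(Seq((1.0, 0.5), (2.0, 2.5))))
  }
}
```

In the Spark program, the parsed pairs would typically be wrapped as labeled points before training, and the error would be computed on the model's predictions rather than on hand-written values.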

With these changes in place, running compile gives the following result:

$ sbt compile
[info] Loading project definition from /home/ubuntu/liRegression/project
[info] Loading settings for project liregression from build.sbt ...
[info] Set current project to liRegression (in build file:/home/ubuntu/liRegression/)
[info] Executing in batch mode. For better performance use sbt's shell
[info] Compiling 1 Scala source to /home/ubuntu/liRegression/target/scala-2.11/classes ...
[info] Done compiling.
[success] Total time: 6 s, completed Oct 30, 2018 8:12:08 AM

https://spark.apache.org/docs/2.2.0/ml-classification-regression.html#linear-regression
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/LinearRegressionWithElasticNetExample.scala