$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
Install Python
sudo apt-get install python2.7
Verify the installation:
$ python
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
Install Python 3
sudo apt-get install python3
Verify the installation:
ubuntu@testspark:~$ python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
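The interpreter banner already shows the version; for scripts that must refuse to run on the wrong interpreter, you can also check from code. A minimal sketch using only the standard library:

```python
import sys

# sys.version_info is a tuple-like object: (major, minor, micro, ...)
print("running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))

# Fail fast if a script needs Python 3:
if sys.version_info[0] < 3:
    raise RuntimeError("this script requires Python 3")
```

This matters on a box like the one above where both `python` (2.7) and `python3` (3.5) are installed side by side.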
Install Scala
sudo apt-get install scala
Verify the installation:
$ scala
Welcome to Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_191).
Type in expressions to have them evaluated.
Type :help for more information.
scala> println("hello world")
hello world
scala>
Start the pyspark interactive shell:
$ pyspark
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/10/29 14:32:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/29 14:33:04 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/
Using Python version 2.7.12 (default, Dec 4 2017 14:50:18)
SparkSession available as 'spark'.
>>>
As with Python, you can press CTRL+D or type exit() to leave the shell.
Run an example program (SparkPi)
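Before leaving the shell, you can run a quick smoke test. This session sketch assumes the `sc` SparkContext that the shell creates for you (the banner above only mentions `spark`, but `sc` is also predefined in the pyspark shell):

```
>>> sc.parallelize(range(100)).filter(lambda x: x % 2 == 0).count()
50
```

If this returns 50, the shell can schedule and run tasks correctly.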
run-example SparkPi 10
The output looks like this (unimportant log lines have been removed):
$ run-example SparkPi 10
...
18/10/29 14:35:46 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 10 output partitions
18/10/29 14:35:46 INFO DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
18/10/29 14:35:46 INFO DAGScheduler: Parents of final stage: List()
18/10/29 14:35:46 INFO DAGScheduler: Missing parents: List()
18/10/29 14:35:46 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
18/10/29 14:35:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1832.0 B, free 366.3 MB)
18/10/29 14:35:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1172.0 B, free 366.3 MB)
18/10/29 14:35:47 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.0.222:42676 (size: 1172.0 B, free: 366.3 MB)
18/10/29 14:35:47 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:996
18/10/29 14:35:47 INFO DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34)
18/10/29 14:35:47 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
18/10/29 14:35:47 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 6086 bytes)
18/10/29 14:35:47 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 6086 bytes)
18/10/29 14:35:47 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 6086 bytes)
18/10/29 14:35:47 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 6086 bytes)
18/10/29 14:35:47 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/10/29 14:35:47 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
18/10/29 14:35:47 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
18/10/29 14:35:47 INFO Executor: Fetching spark://172.16.0.222:46320/jars/spark-examples_2.11-2.1.0.jar with timestamp 1540823746156
18/10/29 14:35:47 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
18/10/29 14:35:47 INFO TransportClientFactory: Successfully created connection to /172.16.0.222:46320 after 23 ms (0 ms spent in bootstraps)
18/10/29 14:35:47 INFO Utils: Fetching spark://172.16.0.222:46320/jars/spark-examples_2.11-2.1.0.jar to /tmp/spark-4cf3c215-48c4-4fe4-8c00-fe1cad6637e2/userFiles-81e56a98-0010-4aa5-af77-aff04ebff077/fetchFileTemp8594285529631276460.tmp
18/10/29 14:35:47 INFO Executor: Adding file:/tmp/spark-4cf3c215-48c4-4fe4-8c00-fe1cad6637e2/userFiles-81e56a98-0010-4aa5-af77-aff04ebff077/spark-examples_2.11-2.1.0.jar to class loader
18/10/29 14:35:47 INFO Executor: Fetching spark://172.16.0.222:46320/jars/scopt_2.11-3.3.0.jar with timestamp 1540823746156
18/10/29 14:35:47 INFO Utils: Fetching spark://172.16.0.222:46320/jars/scopt_2.11-3.3.0.jar to /tmp/spark-4cf3c215-48c4-4fe4-8c00-fe1cad6637e2/userFiles-81e56a98-0010-4aa5-af77-aff04ebff077/fetchFileTemp7738169543322034741.tmp
18/10/29 14:35:47 INFO Executor: Adding file:/tmp/spark-4cf3c215-48c4-4fe4-8c00-fe1cad6637e2/userFiles-81e56a98-0010-4aa5-af77-aff04ebff077/scopt_2.11-3.3.0.jar to class loader
18/10/29 14:35:47 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1041 bytes result sent to driver
...
18/10/29 14:35:47 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/10/29 14:35:47 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.528 s
18/10/29 14:35:47 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.760375 s
Pi is roughly 3.1364911364911365
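SparkPi estimates Pi by the Monte Carlo method: throw random points at a square, count how many land inside the inscribed circle, and scale. The same idea in plain Python, without the distributed map/reduce, looks roughly like this (the function name and seed are illustrative, not part of Spark):

```python
import random

def estimate_pi(n, seed=42):
    """Monte Carlo Pi estimate: the fraction of random points in the
    unit square that fall inside the quarter circle approaches Pi/4."""
    rng = random.Random(seed)  # fixed seed for a repeatable run
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

print("Pi is roughly", estimate_pi(100000))
```

SparkPi does the counting in parallel: the `10` argument above is the number of partitions, and each task counts its own batch of points before a `reduce` sums them on the driver.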