Saturday, March 26, 2016

Getting Started with Apache Spark

Running the shell straight from a fresh source download fails, because nothing has been built yet:

Gregorys-iMac-2:spark-1.6.1 gMac$ ./bin/spark-shell
ls: /Users/gMac/dev/bin/spark-1.6.1/assembly/target/scala-2.10: No such file or directory
Failed to find Spark assembly in /Users/gMac/dev/bin/spark-1.6.1/assembly/target/scala-2.10.
You need to build Spark before running this program.
The fix is exactly what the message says: build Spark first. The Maven build below (targeting YARN and Hadoop 2.4, tests skipped) took a little over twenty minutes on this iMac:

Gregorys-iMac-2:spark-1.6.1 gMac$ build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
exec: curl --progress-bar -L http://downloads.typesafe.com/zinc/0.3.5.3/zinc-0.3.5.3.tgz
######################################################################## 100.0%
. . .
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 20:24 min
[INFO] Finished at: 2016-03-26T14:47:06-05:00
[INFO] Final Memory: 89M/1351M
[INFO] ------------------------------------------------------------------------
With the build done, the shell comes up:

Gregorys-iMac-2:spark-1.6.1 gMac$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
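
The shell creates both contexts for you, and the banner points out sc.setLogLevel for taming the log output. A couple of one-liners worth trying at this point (not part of the session above, just standard SparkContext calls):

sc.setLogLevel("WARN")                  // quieter output, as the startup banner suggests
sc.parallelize(1 to 100).reduce(_ + _)  // quick sanity check; should evaluate to 5050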

The classic first test: load the project's own README.md as an RDD of lines, count them, and look at the first one.

scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[3] at textFile at <console>:27

scala> textFile.count()
res1: Long = 95

scala> textFile.first()
res2: String = # Apache Spark

scala>
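
That is as far as this session went. The natural next steps from the quick start are filtering the RDD and a word count. The sketch below was not run above; it uses only standard RDD operations from the 1.6 API (filter, cache, flatMap, map, reduceByKey, take), and linesWithSpark / wordCounts are just illustrative names:

val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark.cache()    // keep the filtered lines in memory for repeated queries
linesWithSpark.count()    // number of README lines that mention Spark

val wordCounts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
wordCounts.take(10)       // peek at a few (word, count) pairs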