Apache Spark: java.lang.IllegalArgumentException: <hostname>

Problem: You try to run an Apache Spark query, but it fails with java.lang.IllegalArgumentException: <hostname>.

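Solution: Name resolution between the Spark machines is not working. In a Spark cluster, every machine must be able to resolve the name of every other machine to an IP address. In the stack trace below the exception is thrown from Cluster$Builder.addContactPoint, and its message is the host name itself, which points to a contact point name that the executor could not resolve. For a small cluster, the hosts file can be enough to fix this. Keep in mind that this is a distributed system, so the hosts file on every machine must contain the name and IP address of every machine.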
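A quick way to verify the fix is to check, on each machine, that every node name resolves. The following is a minimal sketch, not part of the original setup: the function names `unresolved_hosts` and `hosts_lines`, the node names spark-2 and spark-3, and the 10.0.0.x addresses are all hypothetical and only illustrate the idea. It uses Python's system resolver, which normally consults the hosts file as well.

```python
import socket
from typing import Callable, Dict, Iterable, List


def unresolved_hosts(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the names that the resolver cannot turn into an IP address."""
    missing = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            missing.append(name)
    return missing


def hosts_lines(addresses: Dict[str, str]) -> List[str]:
    """Format name -> IP pairs as hosts-file lines (IP first, then the name)."""
    return [f"{ip}\t{name}" for name, ip in addresses.items()]


# Usage (hypothetical names and addresses, replace with the real cluster):
#   unresolved_hosts(["spark-1", "spark-2", "spark-3"])
#   print("\n".join(hosts_lines({"spark-1": "10.0.0.1", "spark-2": "10.0.0.2"})))
```

Run the check on every node, not only on the one where the query was started, because the failing lookups happen on the executors.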

Full exception and stack trace:
Assume the current machine is named spark-1.

spark-sql> select * from testkeyspace.testtable;
WARN  2019-02-15 03:42:33,835 org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, 10.1.6.224): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
        at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
        ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
        at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
        at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
        at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
        ... 22 more
 
WARN  2019-02-15 03:42:38,287 org.apache.spark.scheduler.TaskSetManager: Lost task 43.0 in stage 2.0 (TID 50, 10.1.4.130): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
        at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
        ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
        at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
        at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
        at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
        ... 22 more
 
ERROR 2019-02-15 03:42:38,367 org.apache.spark.scheduler.TaskSetManager: Task 37 in stage 2.0 failed 4 times; aborting job
ERROR 2019-02-15 03:42:38,372 org.apache.spark.sql.hive.thriftserver.SparkSQLDriver: Failed in [select * from testkeyspace.testtable]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 2.0 failed 4 times, most recent failure: Lost task 37.3 in stage 2.0 (TID 81, 10.1.6.224): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
        at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
        ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
        at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
        at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
        at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
        ... 22 more
 
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1276) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1266) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library-2.10.5.jar:na]
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[scala-library-2.10.5.jar:na]
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1266) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at scala.Option.foreach(Option.scala:236) ~[scala-library-2.10.5.jar:na]
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1421) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 2.0 failed 4 times, most recent failure: Lost task 37.3 in stage 2.0 (TID 81, 10.1.6.224): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
        at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
        at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
        ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
        at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
        at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
        at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
        at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
        ... 22 more
 
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1276)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1266)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1266)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1421)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
 
Time taken: 11.338 seconds
spark-sql> WARN  2019-02-15 03:42:39,830 org.apache.spark.scheduler.TaskSetManager: Lost task 42.0 in stage 2.0 (TID 49, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,348 org.apache.spark.scheduler.TaskSetManager: Lost task 14.0 in stage 2.0 (TID 21, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,379 org.apache.spark.scheduler.TaskSetManager: Lost task 22.0 in stage 2.0 (TID 29, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,391 org.apache.spark.scheduler.TaskSetManager: Lost task 6.0 in stage 2.0 (TID 13, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,405 org.apache.spark.scheduler.TaskSetManager: Lost task 10.0 in stage 2.0 (TID 17, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,424 org.apache.spark.scheduler.TaskSetManager: Lost task 18.0 in stage 2.0 (TID 25, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,425 org.apache.spark.scheduler.TaskSetManager: Lost task 2.0 in stage 2.0 (TID 9, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,425 org.apache.spark.scheduler.TaskSetManager: Lost task 30.0 in stage 2.0 (TID 37, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,442 org.apache.spark.scheduler.TaskSetManager: Lost task 38.0 in stage 2.0 (TID 45, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,444 org.apache.spark.scheduler.TaskSetManager: Lost task 34.0 in stage 2.0 (TID 41, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:40,445 org.apache.spark.scheduler.TaskSetManager: Lost task 26.0 in stage 2.0 (TID 33, 10.1.7.102): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,051 org.apache.spark.scheduler.TaskSetManager: Lost task 19.0 in stage 2.0 (TID 26, 10.1.4.130): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,051 org.apache.spark.scheduler.TaskSetManager: Lost task 3.0 in stage 2.0 (TID 10, 10.1.4.130): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,052 org.apache.spark.scheduler.TaskSetManager: Lost task 39.0 in stage 2.0 (TID 46, 10.1.4.130): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,774 org.apache.spark.scheduler.TaskSetManager: Lost task 4.0 in stage 2.0 (TID 11, 10.1.6.132): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,775 org.apache.spark.scheduler.TaskSetManager: Lost task 20.0 in stage 2.0 (TID 27, 10.1.6.132): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,787 org.apache.spark.scheduler.TaskSetManager: Lost task 41.0 in stage 2.0 (TID 48, 10.1.6.224): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,787 org.apache.spark.scheduler.TaskSetManager: Lost task 33.0 in stage 2.0 (TID 40, 10.1.6.224): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,788 org.apache.spark.scheduler.TaskSetManager: Lost task 45.0 in stage 2.0 (TID 53, 10.1.6.224): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,788 org.apache.spark.scheduler.TaskSetManager: Lost task 21.0 in stage 2.0 (TID 28, 10.1.6.224): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,788 org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 8, 10.1.6.224): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:41,855 org.apache.spark.scheduler.TaskSetManager: Lost task 32.0 in stage 2.0 (TID 39, 10.1.6.132): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:42,073 org.apache.spark.scheduler.TaskSetManager: Lost task 24.0 in stage 2.0 (TID 31, 10.1.6.132): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:42,100 org.apache.spark.scheduler.TaskSetManager: Lost task 28.0 in stage 2.0 (TID 35, 10.1.6.132): TaskKilled (killed intentionally)
WARN  2019-02-15 03:42:42,103 org.apache.spark.scheduler.TaskSetManager: Lost task 12.0 in stage 2.0 (TID 19, 10.1.6.132): TaskKilled (killed intentionally)
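In a log this long, the useful part is the innermost "Caused by" line, whose message is the name that could not be resolved. As a small sketch (the helper name `unresolved_names` is hypothetical, and the pattern assumes the log format shown above), the affected names can be pulled out of a saved log like this:

```python
import re
from typing import Set

# Matches only the innermost cause, where IllegalArgumentException
# directly follows "Caused by: " and the message is the host name.
ROOT_CAUSE = re.compile(
    r"^Caused by: java\.lang\.IllegalArgumentException: (\S+)\s*$",
    re.MULTILINE,
)


def unresolved_names(log_text: str) -> Set[str]:
    """Collect the distinct host names reported by the root-cause lines."""
    return set(ROOT_CAUSE.findall(log_text))
```

Every name returned this way needs an entry that resolves on all machines of the cluster.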