Problem: You try to run an Apache Spark query, but it dies with java.lang.IllegalArgumentException: <hostname>.
Solution: Name resolution between the Spark machines is broken. With Spark, every machine must be able to resolve every other machine's name to its IP address. On a smaller cluster, the hosts file is a workable solution. Keep in mind that this is a distributed system: the name and IP address of every machine must be added to the hosts file of every machine.
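For illustration, on a four-node cluster the relevant /etc/hosts entries could look like the sketch below. The addresses are the worker IPs that appear in the log further down; their pairing with the hostnames spark-1 through spark-4 is hypothetical, so substitute your own machines' names and addresses. The same block has to be present on every node:

    # /etc/hosts - identical entries on every node of the cluster
    10.1.6.224   spark-1
    10.1.4.130   spark-2
    10.1.7.102   spark-3
    10.1.6.132   spark-4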
Full exception and stack trace:
Assume the current machine is named spark-1.
spark-sql> select * from testkeyspace.testtable;
WARN 2019-02-15 03:42:33,835 org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 4, 10.1.6.224): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
    ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
    at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
    at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
    at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
    ... 22 more
WARN 2019-02-15 03:42:38,287 org.apache.spark.scheduler.TaskSetManager: Lost task 43.0 in stage 2.0 (TID 50, 10.1.4.130): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
    ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
    at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
    at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
    at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
    ... 22 more
ERROR 2019-02-15 03:42:38,367 org.apache.spark.scheduler.TaskSetManager: Task 37 in stage 2.0 failed 4 times; aborting job
ERROR 2019-02-15 03:42:38,372 org.apache.spark.sql.hive.thriftserver.SparkSQLDriver: Failed in [select * from testkeyspace.testtable]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 2.0 failed 4 times, most recent failure: Lost task 37.3 in stage 2.0 (TID 81, 10.1.6.224): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
    ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
    at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
    at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
    at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
    ... 22 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1276) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1266) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library-2.10.5.jar:na]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1266) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at scala.Option.foreach(Option.scala:236) ~[scala-library-2.10.5.jar:na]
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1421) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ~[spark-core_2.10-1.4.2.5.jar:1.4.2.5]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 37 in stage 2.0 failed 4 times, most recent failure: Lost task 37.3 in stage 2.0 (TID 81, 10.1.6.224): java.io.IOException: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:212)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: spark-1
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:122)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.initialize(CqlHiveRecordReader.java:91)
    at org.apache.hadoop.hive.cassandra.cql3.input.HiveCqlInputFormat.getRecordReader(HiveCqlInputFormat.java:207)
    ... 20 more
Caused by: java.lang.IllegalArgumentException: spark-1
    at com.datastax.driver.core.Cluster$Builder.addContactPoint(Cluster.java:768)
    at com.datastax.driver.core.Cluster$Builder.addContactPoints(Cluster.java:790)
    at org.apache.cassandra.hadoop.cql3.CqlConfigHelper.getInputCluster(CqlConfigHelper.java:305)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.initialize(CqlRecordReader.java:118)
    ... 22 more
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1276)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1266)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1266)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1421)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Time taken: 11.338 seconds
spark-sql> WARN 2019-02-15 03:42:39,830 org.apache.spark.scheduler.TaskSetManager: Lost task 42.0 in stage 2.0 (TID 49, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,348 org.apache.spark.scheduler.TaskSetManager: Lost task 14.0 in stage 2.0 (TID 21, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,379 org.apache.spark.scheduler.TaskSetManager: Lost task 22.0 in stage 2.0 (TID 29, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,391 org.apache.spark.scheduler.TaskSetManager: Lost task 6.0 in stage 2.0 (TID 13, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,405 org.apache.spark.scheduler.TaskSetManager: Lost task 10.0 in stage 2.0 (TID 17, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,424 org.apache.spark.scheduler.TaskSetManager: Lost task 18.0 in stage 2.0 (TID 25, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,425 org.apache.spark.scheduler.TaskSetManager: Lost task 2.0 in stage 2.0 (TID 9, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,425 org.apache.spark.scheduler.TaskSetManager: Lost task 30.0 in stage 2.0 (TID 37, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,442 org.apache.spark.scheduler.TaskSetManager: Lost task 38.0 in stage 2.0 (TID 45, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,444 org.apache.spark.scheduler.TaskSetManager: Lost task 34.0 in stage 2.0 (TID 41, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:40,445 org.apache.spark.scheduler.TaskSetManager: Lost task 26.0 in stage 2.0 (TID 33, 10.1.7.102): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,051 org.apache.spark.scheduler.TaskSetManager: Lost task 19.0 in stage 2.0 (TID 26, 10.1.4.130): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,051 org.apache.spark.scheduler.TaskSetManager: Lost task 3.0 in stage 2.0 (TID 10, 10.1.4.130): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,052 org.apache.spark.scheduler.TaskSetManager: Lost task 39.0 in stage 2.0 (TID 46, 10.1.4.130): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,774 org.apache.spark.scheduler.TaskSetManager: Lost task 4.0 in stage 2.0 (TID 11, 10.1.6.132): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,775 org.apache.spark.scheduler.TaskSetManager: Lost task 20.0 in stage 2.0 (TID 27, 10.1.6.132): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,787 org.apache.spark.scheduler.TaskSetManager: Lost task 41.0 in stage 2.0 (TID 48, 10.1.6.224): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,787 org.apache.spark.scheduler.TaskSetManager: Lost task 33.0 in stage 2.0 (TID 40, 10.1.6.224): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,788 org.apache.spark.scheduler.TaskSetManager: Lost task 45.0 in stage 2.0 (TID 53, 10.1.6.224): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,788 org.apache.spark.scheduler.TaskSetManager: Lost task 21.0 in stage 2.0 (TID 28, 10.1.6.224): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,788 org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 8, 10.1.6.224): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:41,855 org.apache.spark.scheduler.TaskSetManager: Lost task 32.0 in stage 2.0 (TID 39, 10.1.6.132): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:42,073 org.apache.spark.scheduler.TaskSetManager: Lost task 24.0 in stage 2.0 (TID 31, 10.1.6.132): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:42,100 org.apache.spark.scheduler.TaskSetManager: Lost task 28.0 in stage 2.0 (TID 35, 10.1.6.132): TaskKilled (killed intentionally)
WARN 2019-02-15 03:42:42,103 org.apache.spark.scheduler.TaskSetManager: Lost task 12.0 in stage 2.0 (TID 19, 10.1.6.132): TaskKilled (killed intentionally)
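The root cause sits at the bottom of the trace: com.datastax.driver.core.Cluster$Builder.addContactPoint cannot resolve the name spark-1 and reports the unresolvable hostname as the IllegalArgumentException message. Before rerunning the query, it is worth confirming on every node that every hostname resolves. Below is a minimal, hypothetical standalone check (not part of Spark or the driver) that performs the same kind of InetAddress lookup the driver does when a contact point is added:

import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: compile and run it on every node of the cluster,
// passing the hostnames of all machines as arguments, e.g.:
//   java ResolveCheck spark-1 spark-2 spark-3 spark-4
// A failure here predicts the IllegalArgumentException seen in the Spark log.
public class ResolveCheck {
    public static void main(String[] args) {
        for (String host : args) {
            try {
                InetAddress addr = InetAddress.getByName(host);
                System.out.println(host + " -> " + addr.getHostAddress());
            } catch (UnknownHostException e) {
                System.out.println(host + " -> CANNOT RESOLVE (fix /etc/hosts or DNS)");
            }
        }
    }
}

Once every node prints an address for every hostname, the hosts files are consistent and the query should no longer die with this exception.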