A user reported that their Spark program, submitted to YARN, occasionally fails with java.lang.IllegalArgumentException: Illegal pattern component: XXX. The log shows the exception is thrown while constructing a FastDateFormat object, and only intermittently. Could it be data-dependent? The user says the very same data sometimes succeeds and sometimes fails. Does it always happen on particular cluster nodes? That isn't clear either, so the first step is to read the error log carefully.
View the Spark logs with vim, or fetch them with yarn logs -applicationId $appId.
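For example, to pull this job's aggregated logs (the application id is the one from the staging directory shown later in this post) and jump straight to the error:

yarn logs -applicationId application_1670726876109_157924 | grep -B 2 -A 5 'Illegal pattern component'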
23/02/08 10:15:06 ERROR Executor: Exception in task 5.3 in stage 0.0 (TID 4)
java.lang.IllegalArgumentException: Illegal pattern component: XXX
at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:384)
at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:83)
at org.apache.spark.sql.catalyst.json.JSONOptions.<init>(JSONOptions.scala:43)
at org.apache.spark.sql.Dataset$$anonfun$toJSON$1.apply(Dataset.scala:3146)
at org.apache.spark.sql.Dataset$$anonfun$toJSON$1.apply(Dataset.scala:3142)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$5.apply(objects.scala:188)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$5.apply(objects.scala:185)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
The stack trace points into Spark Catalyst: line 83 of org.apache.spark.sql.catalyst.json.JSONOptions, where a FastDateFormat instance is created. Looking at the JSONOptions source, the argument on the failing line does indeed contain the characters XXX. To check whether that is the problem, follow the FastDateFormat construction into commons-lang3, to the org.apache.commons.lang3.time.FastDatePrinter class.
// Spark source: org.apache.spark.sql.catalyst.json.JSONOptions
val timestampFormat: FastDateFormat =
  FastDateFormat.getInstance(
    parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), timeZone, Locale.US)
Locate the commons-lang3-3.8.1.jar that the Spark build depends on and search its FastDatePrinter class for the keyword Illegal pattern component. The keyword appears exactly once, but the line number doesn't match the stack trace, which means the FastDateFormat that Spark actually loaded is not from 3.8.1. So where did it come from?
protected List<Rule> parsePattern() {
    // ... omitted
            case 'M': // month in year (text and number)
                if (tokenLen >= 4) {
                    rule = new TextField(Calendar.MONTH, months);
                } else if (tokenLen == 3) {
                    rule = new TextField(Calendar.MONTH, shortMonths);
                } else if (tokenLen == 2) {
                    rule = TwoDigitMonthField.INSTANCE;
                } else {
                    rule = UnpaddedMonthField.INSTANCE;
                }
                break;
            case 'd': // day in month (number)
                rule = selectNumberRule(Calendar.DAY_OF_MONTH, tokenLen);
                break;
            // ... omitted
            default:
                throw new IllegalArgumentException("Illegal pattern component: " + token);
        }
        rules.add(rule);
    }
    return rules;
}
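To make the version dependency concrete, here is a minimal standalone sketch of my own (not from the user's job). With a recent commons-lang3 such as 3.8.1 on the classpath it prints a timestamp; an old copy of FastDateFormat, like the one that turns out to be involved here, rejects the pattern at construction time, because support for the ISO 8601 X/XX/XXX time zone components was only added to FastDateFormat in later 3.x releases.

import java.util.{Date, Locale, TimeZone}
import org.apache.commons.lang3.time.FastDateFormat

object XxxPatternRepro {
  def main(args: Array[String]): Unit = {
    // The same default pattern Spark's JSONOptions uses; an old commons-lang3
    // throws "Illegal pattern component: XXX" right here in getInstance
    val fmt = FastDateFormat.getInstance(
      "yyyy-MM-dd'T'HH:mm:ss.SSSXXX", TimeZone.getTimeZone("UTC"), Locale.US)
    println(fmt.format(new Date()))
  }
}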
First, check whether the user's code pulls in its own commons-lang3. Unpacking the fat jar shows org.apache.commons.lang3.time.FastDateFormat is indeed there, but it is also version 3.8.1, so that's not the cause. Next search the Spark installation: find . -name "commons-lang3*" turns up a single commons-lang3-3.8.1.jar under ${SPARK_HOME}/jars/. Searching the Hadoop directories also finds a commons-lang3 jar, but after downloading and decompiling it, the line numbers still don't match, so that isn't the one being loaded either. Strange that the source of the class can't be found. I considered attaching arthas to look, but there is no telling which node the executor will land on, and the task fails within moments anyway.
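Incidentally, a quick way to confirm which commons-lang3 version a fat jar bundles, assuming the shading kept the Maven metadata (app.jar below is a stand-in name for the user's jar):

unzip -p app.jar META-INF/maven/org.apache.commons/commons-lang3/pom.properties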
Next, try adding a JVM flag to print every class the Spark driver/executor loads. The driver log shows it loading the correct commons-lang3-3.8.1.jar, but the executor log has no matching entry; perhaps FastDateFormat hadn't been touched yet by the time the verbose output was captured? Here is how to pass the JVM flag via spark-submit:
spark-submit \
  --master yarn \
  --driver-memory 4G \
  --name 'AppName' \
  --conf 'spark.driver.extraJavaOptions=-verbose:class' \
  --conf 'spark.executor.extraJavaOptions=-verbose:class' \
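With -verbose:class, HotSpot writes one line per class to stdout as it is loaded, so the driver/executor stdout can be grepped for FastDateFormat. The driver entry I found looked roughly like this (illustrative):

[Loaded org.apache.commons.lang3.time.FastDateFormat from file:/.../jars/commons-lang3-3.8.1.jar]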
The user was pressing for an answer, so no more time for hunting: change the code to reveal where FastDateFormat comes from, and while at it, recreate the FastDateFormat object without the XXX component whenever instantiation fails, so the unsupported pattern no longer kills the job.
// Patched org.apache.spark.sql.catalyst.json.JSONOptions: catch the instantiation failure, log its message and class-loading source, then retry without XXX
val timestampFormat: FastDateFormat = try {
  FastDateFormat.getInstance(
    parameters.getOrElse("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss.SSSXXX"), timeZone, Locale.US)
} catch {
  case e: IllegalArgumentException =>
    logWarning("==============>>>" + e.getMessage)
    val clazz = FastDateFormat.getInstance().getClass
    val location = clazz.getResource('/' + clazz.getName.replace('.', '/') + ".class")
    logWarning("resource location: " + location.toString)
    FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSS", timeZone, Locale.US)
}
After recompiling and swapping the patched jar into the Spark distribution, the job ran to completion and the warning fired: FastDateFormat was being loaded from hive-exec-1.2.1.spark2.jar. Only then did I remember that I had searched only for jar file names matching commons-lang3*, never for the class inside other jars' contents, and hive-exec happens to be a fat jar.
23/02/08 17:12:39 WARN JSONOptions: ==============>>>Illegal pattern component: XXX
23/02/08 17:12:39 WARN JSONOptions: resource location: jar:file:/data/hadoop/yarn/local/usercache/…/__spark_libs__1238265929018908261.zip/hive-exec-1.2.1.spark2.jar!/org/apache/commons/lang3/time/FastDateFormat.class
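The lesson generalizes: search jar contents, not just jar file names. A sketch of scanning every jar under ${SPARK_HOME}/jars for the class, which would have surfaced hive-exec immediately:

for f in ${SPARK_HOME}/jars/*.jar; do
  if unzip -l "$f" 2>/dev/null | grep -q 'org/apache/commons/lang3/time/FastDateFormat.class'; then
    echo "$f"
  fi
done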
Both commons-lang3-3.8.1.jar and hive-exec-1.2.1.spark2.jar sit in ${SPARK_HOME}/jars/. Some searching suggested that in this situation the load order depends on the order of jars on the CLASSPATH and even on file creation time (I couldn't find an authoritative source). Along the way I tried the spark-submit setting --conf 'spark.executor.extraClassPath=commons-lang3-3.8.1.jar' to force that jar to load first, yet the job still failed. Later, while reading the logs, I happened to notice the CLASSPATH entry was wrong, as shown below: the bare jar name I passed was prepended verbatim, so it could never resolve to a real file. Evidently spark.executor.extraClassPath simply puts its value at the front of the CLASSPATH, which is how it achieves priority loading.
export SPARK_YARN_STAGING_DIR="hdfs://nnHA/user/p55_u34_tsp_caihong/.sparkStaging/application_1670726876109_157924"
export CLASSPATH="commons-lang3-3.8.1.jar:$PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:/etc/hadoop/conf:/usr/lib/hadoop/libs/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*:$PWD/__spark_conf__/__hadoop_conf__"
spark.executor.extraClassPath must be an absolute path. After rewriting the spark-submit command with the full path, the job ran again and the error was gone. One question remains: why did the job fail only intermittently? With both jars sitting in the same jars directory you would expect a fixed load order. If you know the answer, please leave a comment; I'd appreciate it. In any case, both fixes work: patching the source, or setting spark.executor.extraClassPath.
spark-submit \
  --master yarn \
  --driver-memory 4G \
  --name 'AppName' \
  --conf 'spark.driver.extraJavaOptions=-verbose:class' \
  --conf 'spark.executor.extraJavaOptions=-verbose:class' \
  --conf 'spark.executor.extraClassPath=${SPARK_HOME}/jars/commons-lang3-3.8.1.jar' \
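Since -verbose:class is still enabled above, one way to double-check the fix is to grep the new run's logs and confirm FastDateFormat now comes from commons-lang3-3.8.1.jar rather than hive-exec-1.2.1.spark2.jar (substitute the new application id):

yarn logs -applicationId <appId> | grep 'time.FastDateFormat from'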