1.为ä»ä¹Sparkåå±ä¸å¦Hadoop
为ä»ä¹Sparkåå±ä¸å¦Hadoop
jarvis 源码Sparkæ¯ä¸ä¸ªåºäºRAM计ç®çå¼æºç ComputerClusterè¿ç®ç³»ç»ï¼ç®çæ¯æ´å¿«éå°è¿è¡æ°æ®åæãSparkæ©æçæ ¸å¿é¨å代ç åªæ3ä¸è¡ãSparkæä¾äºä¸HadoopMap/Reduceç¸ä¼¼çåæ£å¼è¿ç®æ¡æ¶ï¼ä½åºäºRAMåä¼å设计ï¼å æ¤å¨äº¤æ¢å¼æ°æ®åæådataminingçWorkloadä¸è¡¨ç°ä¸éã
jarvis 源码è¿å ¥å¹´ä»¥åï¼Sparkå¼æºç çæç³»ç»å¤§å¹ å¢é¿ï¼å·²æ为大æ°æ®èç´ææ´»è·çå¼æºç 项ç®ä¹ä¸ãSparkä¹æ以æå¦æ¤å¤çå ³æ³¨ï¼åå 主è¦æ¯å 为Sparkå ·æçé«æ§è½ãé«çµæ´»æ§ãä¸Hadoopçæç³»ç»å®ç¾èåçä¸æ¹é¢çç¹ç¹ã
jarvis 源码é¦å ï¼Spark对åæ£çæ°æ®éè¿è¡æ½æ ·ï¼åæ°å°æåºRDD(ResilientDistributedDataset)çæ¦å¿µï¼ææçç»è®¡åæä»»å¡è¢«ç¿»è¯æ对RDDçåºæ¬æä½ç»æçæåæ ç¯å¾(DAG)ãRDDå¯ä»¥è¢«é©»çå¨RAMä¸ï¼å¾åçä»»å¡å¯ä»¥ç´æ¥è¯»åRAMä¸çæ°æ®;åæ¶åæDAGä¸ä»»å¡ä¹é´çä¾èµæ§å¯ä»¥æç¸é»çä»»å¡å并ï¼ä»èåå°äºå¤§éä¸åç¡®çç»æè¾åºï¼æ大åå°äºHarddiskI/Oï¼ä½¿å¤ææ°æ®åæä»»å¡æ´é«æãä»è¿ä¸ªæ¨ç®ï¼å¦æä»»å¡å¤å¤æï¼Sparkæ¯Map/Reduceå¿«ä¸å°ä¸¤åã
jarvis 源码å ¶æ¬¡ï¼Sparkæ¯ä¸ä¸ªçµæ´»çè¿ç®æ¡æ¶ï¼éååæ¹æ¬¡å¤çãå·¥ä½æµã交äºå¼åæãæµéå¤ççä¸åç±»åçåºç¨ï¼å æ¤Sparkä¹å¯ä»¥æ为ä¸ä¸ªç¨é广æ³çè¿ç®å¼æï¼å¹¶å¨æªæ¥å代Map/Reduceçå°ä½ã
jarvis 源码æåï¼Sparkå¯ä»¥ä¸Hadoopçæç³»ç»çå¾å¤ç»ä»¶äºç¸æä½ãSparkå¯ä»¥è¿è¡å¨æ°ä¸ä»£èµæºç®¡çæ¡æ¶YARNä¸ï¼å®è¿å¯ä»¥è¯»åå·²æ并åæ¾å¨Hadoopä¸çæ°æ®ï¼è¿æ¯ä¸ªé常大çä¼å¿ã
jarvis 源码è½ç¶Sparkå ·æ以ä¸ä¸å¤§ä¼ç¹ï¼ä½ä»ç®åSparkçåå±ååºç¨ç°ç¶æ¥çï¼Sparkæ¬èº«ä¹åå¨å¾å¤ç¼ºé·ï¼ä¸»è¦å æ¬ä»¥ä¸å 个æ¹é¢ï¼
jarvis 源码â稳å®æ§æ¹é¢ï¼ç±äºä»£ç è´¨éé®é¢ï¼Sparké¿æ¶é´è¿è¡ä¼ç»å¸¸åºéï¼å¨æ¶ææ¹é¢ï¼ç±äºå¤§éæ°æ®è¢«ç¼åå¨RAMä¸ï¼Javaåæ¶åå¾ç¼æ ¢çæ åµä¸¥éï¼å¯¼è´Sparkæ§è½ä¸ç¨³å®ï¼å¨å¤æåºæ¯ä¸SQLçæ§è½çè³ä¸å¦ç°æçMap/Reduceã
jarvis 源码âä¸è½å¤ç大æ°æ®ï¼åç¬æºå¨å¤çæ°æ®è¿å¤§ï¼æè ç±äºæ°æ®åºç°é®é¢å¯¼è´ä¸é´ç»æè¶ è¿RAMç大å°æ¶ï¼å¸¸å¸¸åºç°RAM空é´ä¸è¶³ææ æ³å¾åºç»æãç¶èï¼Map/Reduceè¿ç®æ¡æ¶å¯ä»¥å¤ç大æ°æ®ï¼å¨è¿æ¹é¢ï¼Sparkä¸å¦Map/Reduceè¿ç®æ¡æ¶ææã
jarvis 源码âä¸è½æ¯æå¤æçSQLç»è®¡;ç®åSparkæ¯æçSQLè¯æ³å®æ´ç¨åº¦è¿ä¸è½åºç¨å¨å¤ææ°æ®åæä¸ãå¨å¯ç®¡çæ§æ¹é¢ï¼SparkYARNçç»åä¸å®åï¼è¿å°±ä¸ºä½¿ç¨è¿ç¨ä¸åä¸é忧ï¼å®¹æåºç°åç§é¾é¢ã
jarvis 源码è½ç¶Sparkæ´»è·å¨ClouderaãMapRãHortonworksçä¼å¤ç¥å大æ°æ®å ¬å¸ï¼ä½æ¯å¦æSparkæ¬èº«ç缺é·å¾ä¸å°åæ¶å¤çï¼å°ä¼ä¸¥éå½±åSparkçæ®åååå±ã
jarvis 源码