From Wikipedia, the free encyclopedia
Apache Spark
Original author(s): Matei Zaharia
Developer(s): Apache Spark
Initial release: May 26, 2014
Stable release: 4.0.0 (Scala 2.13) / May 23, 2025
Repository: Spark Repository
Written in: Scala[1]
Operating system: Windows, macOS, Linux
Available in: Scala, Java, SQL, Python, R, C#, F#
Type: Data analytics, machine learning algorithms
License: Apache License 2.0
Website: spark.apache.org

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab starting in 2009, the Spark codebase was donated in 2013 to the Apache Software Foundation, which has maintained it since.

Overview

Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way.[2] The DataFrame API was released as an abstraction on top of the RDD, followed by the Dataset API. In Spark 1.x, the RDD was the primary application programming interface (API), but as of Spark 2.x use of the Dataset API is encouraged[3] even though the RDD API is not deprecated.[4][5] The RDD technology still underlies the Dataset API.[6][7]

Spark and its RDDs were developed in 2012 in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.[8]

Inside Apache Spark the workflow is managed as a directed acyclic graph (DAG). Nodes represent RDDs while edges represent the operations on the RDDs.

Spark facilitates the implementation of both iterative algorithms, which visit their data set multiple times in a loop, and interactive/exploratory data analysis, i.e., the repeated database-style querying of data. The latency of such applications may be reduced by several orders of magnitude compared to an Apache Hadoop MapReduce implementation.[2][9] Among the class of iterative algorithms are the training algorithms for machine learning systems, which formed the initial impetus for developing Apache Spark.[10]
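
The speedup for iterative workloads comes largely from keeping the working set in memory between passes. The following is a minimal Scala sketch of that pattern, not taken from the Spark documentation: a toy single-weight gradient-descent loop over a cached RDD, where the file path, data format, iteration count and learning rate are illustrative assumptions and an existing SparkContext named sc is assumed.

val points = sc.textFile("/path/to/points.txt")           // hypothetical file with lines of the form "x,y"
  .map { line => val Array(x, y) = line.split(","); (x.toDouble, y.toDouble) }
  .cache()                                                 // keep the parsed points in memory after the first pass

var w = 0.0                                                // single model weight
for (_ <- 1 to 10) {                                       // each iteration reuses the cached RDD instead of re-reading from disk
  val gradient = points.map { case (x, y) => x * (x * w - y) }.mean()
  w -= 0.1 * gradient
}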

Apache Spark requires a cluster manager and a distributed storage system. For cluster management, Spark supports standalone native Spark clusters, Hadoop YARN, Apache Mesos, or Kubernetes.[11] A standalone native Spark cluster can be launched manually or by the launch scripts provided by the install package. It is also possible to run the daemons on a single machine for testing. For distributed storage, Spark can interface with a wide variety of distributed systems, including Alluxio, Hadoop Distributed File System (HDFS),[12] MapR File System (MapR-FS),[13] Cassandra,[14] OpenStack Swift, Amazon S3, Kudu, and the Lustre file system,[15] or a custom solution can be implemented. Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file system can be used instead; in such a scenario, Spark is run on a single machine with one executor per CPU core.
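
As a sketch of how the deployment target is selected from application code (the application name and master URLs below are illustrative placeholders, not taken from this article):

import org.apache.spark.{SparkConf, SparkContext}

// "local[*]" runs Spark in the pseudo-distributed local mode described above,
// with one worker thread per CPU core and no separate cluster manager.
// A standalone, YARN, Mesos or Kubernetes cluster is chosen by changing the master URL,
// e.g. "spark://host:7077", "yarn", "mesos://host:5050" or "k8s://https://host:6443".
val conf = new SparkConf()
  .setAppName("deployment-example")
  .setMaster("local[*]")
val sc = new SparkContext(conf)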

Spark Core

Spark Core is the foundation of the overall project. It provides distributed task dispatching, scheduling, and basic I/O functionalities, exposed through an application programming interface (for Java, Python, Scala, .NET[16] and R) centered on the RDD abstraction (the Java API is available for other JVM languages, but is also usable for some other non-JVM languages that can connect to the JVM, such as Julia[17]). This interface mirrors a functional/higher-order model of programming: a "driver" program invokes parallel operations such as map, filter or reduce on an RDD by passing a function to Spark, which then schedules the function's execution in parallel on the cluster.[2] These operations, and additional ones such as joins, take RDDs as input and produce new RDDs. RDDs are immutable and their operations are lazy; fault-tolerance is achieved by keeping track of the "lineage" of each RDD (the sequence of operations that produced it) so that it can be reconstructed in the case of data loss. RDDs can contain any type of Python, .NET, Java, or Scala objects.

Besides the RDD-oriented functional style of programming, Spark provides two restricted forms of shared variables: broadcast variables reference read-only data that needs to be available on all nodes, while accumulators can be used to program reductions in an imperative style.[2]
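
A brief sketch of both kinds of shared variable, assuming an existing SparkContext named sc; the lookup table and input codes are made up for illustration.

val countryNames = sc.broadcast(Map("de" -> "Germany", "fr" -> "France"))   // read-only data shipped once to every node
val unknownCodes = sc.longAccumulator("unknown country codes")              // imperative-style counter that executors can only add to

val codes = sc.parallelize(Seq("de", "fr", "xx"))
val names = codes.map { code =>
  countryNames.value.getOrElse(code, { unknownCodes.add(1); "unknown" })
}
names.collect()                    // Array(Germany, France, unknown); running the action also updates the accumulator
println(unknownCodes.value)        // 1, read back on the driver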

A typical example of RDD-centric functional programming is the following Scala program that computes the frequencies of all words occurring in a set of text files and prints the most common ones. Each map, flatMap (a variant of map) and reduceByKey takes an anonymous function that performs a simple operation on a single data item (or a pair of items), and applies its argument to transform an RDD into a new RDD.

val conf = new SparkConf().setAppName("wiki_test") // create a spark config object
val sc = new SparkContext(conf) // Create a spark context
val data = sc.textFile("/path/to/somedir") // Read the text files in "somedir" into an RDD of lines.
val tokens = data.flatMap(_.split(" ")) // Split each file into a list of tokens (words).
val wordFreq = tokens.map((_, 1)).reduceByKey(_ + _) // Add a count of one to each token, then sum the counts per word type.
wordFreq.sortBy(s => -s._2).map(x => (x._2, x._1)).top(10) // Get the top 10 words. Swap word and count to sort by count.

Spark SQL

Spark SQL is a component on top of Spark Core that introduced a data abstraction called DataFrames,[a] which provides support for structured and semi-structured data. Spark SQL provides a domain-specific language (DSL) to manipulate DataFrames in Scala, Java, Python or .NET.[16] It also provides SQL language support, with command-line interfaces and an ODBC/JDBC server. Although DataFrames lack the compile-time type-checking afforded by RDDs, as of Spark 2.0 the strongly typed Dataset is fully supported by Spark SQL as well.

import org.apache.spark.sql.SparkSession

val url = "jdbc:mysql://yourIP:yourPort/test?user=yourUsername&password=yourPassword" // URL for your database server.
val spark = SparkSession.builder().getOrCreate() // Create a Spark session object

val df = spark
  .read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "people")
  .load()

df.printSchema() // Looks at the schema of this DataFrame.
val countsByAge = df.groupBy("age").count() // Counts people by age

Or alternatively via SQL:

df.createOrReplaceTempView("people")
val countsByAge = spark.sql("SELECT age, count(*) FROM people GROUP BY age")
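
The strongly typed Dataset API mentioned above can be used for the same kind of work. The following sketch assumes the spark session from the example above and a hypothetical JSON file whose records match the case class; it is illustrative rather than part of the original example.

case class Person(name: String, age: Long)         // compile-time schema for the Dataset

import spark.implicits._                            // brings the Encoder for Person into scope
val people = spark.read.json("/path/to/people.json").as[Person]
val adults = people.filter(p => p.age >= 18)        // typed lambda instead of untyped column expressions
adults.show()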

Spark Streaming

Spark Streaming uses Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD transformations on those mini-batches of data. This design enables the same set of application code written for batch analytics to be used in streaming analytics, thus facilitating easy implementation of lambda architecture.[19][20] However, this convenience comes with the penalty of latency equal to the mini-batch duration. Other streaming data engines that process event by event rather than in mini-batches include Storm and the streaming component of Flink.[21] Spark Streaming has built-in support to consume data from Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP/IP sockets.[22]
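
A minimal sketch of the mini-batch (DStream) API, counting words that arrive on a TCP socket in two-second batches; the host, port, batch interval and application name are illustrative.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("dstream-example").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(2))        // each mini-batch covers two seconds of data

val lines = ssc.socketTextStream("localhost", 9999)     // ingest from a TCP socket
val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.print()                                          // runs once per mini-batch

ssc.start()
ssc.awaitTermination()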

In Spark 2.x, a separate technology based on Datasets, called Structured Streaming, which has a higher-level interface, is also provided to support streaming.[23]
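
By contrast, a Structured Streaming version of the same word count is written against the Dataset/DataFrame API; this sketch again uses an illustrative socket source and console sink.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("structured-streaming-example").getOrCreate()
import spark.implicits._

val lines = spark.readStream.format("socket")
  .option("host", "localhost").option("port", 9999).load()
val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

val query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()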

Spark can be deployed in a traditional on-premises data center as well as in the cloud.[24]

MLlib machine learning library

Spark MLlib is a distributed machine-learning framework on top of Spark Core that, due in large part to the distributed memory-based Spark architecture, is as much as nine times as fast as the disk-based implementation used by Apache Mahout (according to benchmarks done by the MLlib developers against the alternating least squares (ALS) implementations, and before Mahout itself gained a Spark interface), and scales better than Vowpal Wabbit.[25] Many common machine learning and statistical algorithms have been implemented and are shipped with MLlib, which simplifies large-scale machine learning pipelines, including:[26]

summary statistics, correlations, stratified sampling, hypothesis testing, random data generation
classification and regression: support-vector machines, logistic regression, linear regression, naive Bayes classification, decision trees, random forests, gradient-boosted trees
collaborative filtering techniques including alternating least squares (ALS)
cluster analysis methods including k-means and latent Dirichlet allocation (LDA)
dimensionality reduction techniques such as singular value decomposition (SVD) and principal component analysis (PCA)
feature extraction and transformation functions
optimization algorithms such as stochastic gradient descent and limited-memory BFGS (L-BFGS)
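
A minimal sketch of training one of these algorithms with the DataFrame-based spark.ml API; the input path, data format and parameters are illustrative assumptions rather than part of the article.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("mllib-example").getOrCreate()

// LIBSVM is one of the formats MLlib ships a reader for; the path is hypothetical.
val training = spark.read.format("libsvm").load("/path/to/sample_libsvm_data.txt")

val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
val model = lr.fit(training)                              // training is distributed across the cluster
println(s"Coefficients: ${model.coefficients}")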

GraphX

GraphX is a distributed graph-processing framework on top of Apache Spark. Because it is based on RDDs, which are immutable, graphs are immutable and thus GraphX is unsuitable for graphs that need to be updated, let alone in a transactional manner like a graph database.[27] GraphX provides two separate APIs for implementation of massively parallel algorithms (such as PageRank): a Pregel abstraction, and a more general MapReduce-style API.[28] Unlike its predecessor Bagel, which was formally deprecated in Spark 1.6, GraphX has full support for property graphs (graphs where properties can be attached to edges and vertices).[29]
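
A small sketch of the property-graph API and the built-in PageRank implementation, assuming an existing SparkContext named sc; the vertices and edges are made up for illustration.

import org.apache.spark.graphx.{Edge, Graph}

// A tiny property graph: vertex attributes are user names, edge attributes are relationship labels.
val vertices = sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Carol")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows"), Edge(2L, 3L, "follows"), Edge(3L, 1L, "follows")))
val graph = Graph(vertices, edges)

val ranks = graph.pageRank(0.001).vertices                // iterate until ranks change by less than the tolerance
ranks.collect().foreach { case (id, rank) => println(s"$id: $rank") }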

Like Apache Spark, GraphX initially started as a research project at UC Berkeley's AMPLab and Databricks, and was later donated to the Apache Software Foundation and the Spark project.[30]

Language support

Apache Spark has built-in support for Scala, Java, SQL, R, and Python, with third-party support for the .NET CLR,[31] Julia,[32] and more.

History

Spark was initially started by Matei Zaharia at UC Berkeley's AMPLab in 2009, and open sourced in 2010 under a BSD license.[33]

In 2013, the project was donated to the Apache Software Foundation and switched its license to Apache 2.0. In February 2014, Spark became a Top-Level Apache Project.[34]

In November 2014, Spark founder M. Zaharia's company Databricks set a new world record in large-scale sorting using Spark.[35][33]

Spark had in excess of 1000 contributors in 2015,[36] making it one of the most active projects in the Apache Software Foundation[37] and one of the most active open source big data projects.

Version Original release date Latest version Release date
Unsupported: 0.5 2025-08-07 0.5.2 2025-08-07
Unsupported: 0.6 2025-08-07 0.6.2 2025-08-07
Unsupported: 0.7 2025-08-07 0.7.3 2025-08-07
Unsupported: 0.8 2025-08-07 0.8.1 2025-08-07
Unsupported: 0.9 2025-08-07 0.9.2 2025-08-07
Unsupported: 1.0 2025-08-07 1.0.2 2025-08-07
Unsupported: 1.1 2025-08-07 1.1.1 2025-08-07
Unsupported: 1.2 2025-08-07 1.2.2 2025-08-07
Unsupported: 1.3 2025-08-07 1.3.1 2025-08-07
Unsupported: 1.4 2025-08-07 1.4.1 2025-08-07
Unsupported: 1.5 2025-08-07 1.5.2 2025-08-07
Unsupported: 1.6 2025-08-07 1.6.3 2025-08-07
Unsupported: 2.0 2025-08-07 2.0.2 2025-08-07
Unsupported: 2.1 2025-08-07 2.1.3 2025-08-07
Unsupported: 2.2 2025-08-07 2.2.3 2025-08-07
Unsupported: 2.3 2025-08-07 2.3.4 2025-08-07
Unsupported: 2.4 LTS 2025-08-07 2.4.8 2025-08-07[38]
Unsupported: 3.0 2025-08-07 3.0.3 2025-08-07[39]
Unsupported: 3.1 2025-08-07 3.1.3 2025-08-07[40]
Unsupported: 3.2 2025-08-07 3.2.4 2025-08-07[41]
Supported: 3.3 2025-08-07 3.3.3 2025-08-07[42]
Supported: 3.4 2025-08-07 3.4.3 2025-08-07[43]
Latest version: 3.5 2025-08-07 3.5.2 2025-08-07[44]
Legend:
Unsupported
Supported
Latest version
Preview version

Scala version

Spark 3.5.2 is based on Scala 2.13 (and thus works with Scala 2.12 and 2.13 out-of-the-box), but it can also be made to work with Scala 3.[45]

Developers

Apache Spark is developed by a community. The project is managed by a group called the "Project Management Committee" (PMC).[46]

Maintenance releases and EOL

Feature release branches will, generally, be maintained with bug fix releases for a period of 18 months. For example, branch 2.3.x is no longer considered maintained as of September 2019, 18 months after the release of 2.3.0 in February 2018. No more 2.3.x releases should be expected after that point, even for bug fixes.

The last minor release within a major release will typically be maintained for longer as an “LTS” release. For example, 2.4.0 was released on November 2, 2018, and was maintained for 31 months until 2.4.8 was released in May 2021. 2.4.8 is the last release and no more 2.4.x releases should be expected even for bug fixes.[47]

See also

Notes

  1. ^ Called SchemaRDDs before Spark 1.3[18]

References

  1. ^ "Spark Release 2.0.0". MLlib in R: SparkR now offers MLlib APIs [..] Python: PySpark now offers many more MLlib algorithms"
  2. ^ a b c d Zaharia, Matei; Chowdhury, Mosharaf; Franklin, Michael J.; Shenker, Scott; Stoica, Ion. Spark: Cluster Computing with Working Sets (PDF). USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).
  3. ^ "Spark 2.2.0 Quick Start". apache.org. 2025-08-07. Retrieved 2025-08-07. we highly recommend you to switch to use Dataset, which has better performance than RDD
  4. ^ "Spark 2.2.0 deprecation list". apache.org. 2025-08-07. Retrieved 2025-08-07.
  5. ^ Damji, Jules (2025-08-07). "A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets: When to use them and why". databricks.com. Retrieved 2025-08-07.
  6. ^ Chambers, Bill (2025-08-07). "12". Spark: The Definitive Guide. O'Reilly Media. "virtually all Spark code you run, whether DataFrames or Datasets, compiles down to an RDD"
  7. ^ "What is Apache Spark? Spark Tutorial Guide for Beginner". janbasktraining.com. 2025-08-07. Retrieved 2025-08-07.
  8. ^ Zaharia, Matei; Chowdhury, Mosharaf; Das, Tathagata; Dave, Ankur; Ma, Justin; McCauley, Murphy; J., Michael; Shenker, Scott; Stoica, Ion (2010). Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (PDF). USENIX Symp. Networked Systems Design and Implementation.
  9. ^ Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker, Scott; Stoica, Ion (June 2013). Shark: SQL and Rich Analytics at Scale (PDF). SIGMOD 2013. arXiv:1211.6176. Bibcode:2012arXiv1211.6176X.
  10. ^ Harris, Derrick (28 June 2014). "4 reasons why Spark could jolt Hadoop into hyperdrive". Gigaom. Archived from the original on 24 October 2017. Retrieved 25 February 2016.
  11. ^ "Cluster Mode Overview - Spark 2.4.0 Documentation - Cluster Manager Types". apache.org. Apache Foundation. 2025-08-07. Retrieved 2025-08-07.
  12. ^ Figure showing Spark in relation to other open-source Software projects including Hadoop
  13. ^ MapR ecosystem support matrix
  14. ^ Doan, DuyHai (2025-08-07). "Re: cassandra + spark / pyspark". Cassandra User (Mailing list). Retrieved 2025-08-07.
  15. ^ Wang, Yandong; Goldstone, Robin; Yu, Weikuan; Wang, Teng (May 2014). "Characterization and Optimization of Memory-Resident MapReduce on HPC Systems". 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE. pp. 799–808. doi:10.1109/IPDPS.2014.87. ISBN 978-1-4799-3800-1. S2CID 11157612.
  16. ^ a b dotnet/spark, .NET Platform, 2025-08-07, retrieved 2025-08-07
  17. ^ "GitHub - DFDX/Spark.jl: Julia binding for Apache Spark". GitHub. 2025-08-07.
  18. ^ "Spark Release 1.3.0 | Apache Spark".
  19. ^ "Applying the Lambda Architecture with Spark, Kafka, and Cassandra | Pluralsight". www.pluralsight.com. Retrieved 2025-08-07.
  20. ^ Shapira, Gwen (29 August 2014). "Building Lambda Architecture with Spark Streaming". cloudera.com. Cloudera. Archived from the original on 14 June 2016. Retrieved 17 June 2016. re-use the same aggregates we wrote for our batch application on a real-time data stream
  21. ^ Chintapalli, Sanket; Dagit, Derek; Evans, Bobby; Farivar, Reza; Graves, Thomas; Holderbaugh, Mark; Liu, Zhuo; Nusbaum, Kyle; Patil, Kishorkumar; Peng, Boyang Jerry; Poulosky, Paul (May 2016). "Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming". 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). IEEE. pp. 1789–1792. doi:10.1109/IPDPSW.2016.138. ISBN 978-1-5090-3682-0. S2CID 2180634.
  22. ^ Kharbanda, Arush (17 March 2015). "Getting Data into Spark Streaming". sigmoid.com. Sigmoid (Sunnyvale, California IT product company). Archived from the original on 15 August 2016. Retrieved 7 July 2016.
  23. ^ Zaharia, Matei (2025-08-07). "Structured Streaming In Apache Spark: A new high-level API for streaming". databricks.com. Retrieved 2025-08-07.
  24. ^ "On-Premises vs. Cloud Data Warehouses: Pros and Cons". SearchDataManagement. Retrieved 2025-08-07.
  25. ^ Sparks, Evan; Talwalkar, Ameet (2025-08-07). "Spark Meetup: MLbase, Distributed Machine Learning with Spark". slideshare.net. Spark User Meetup, San Francisco, California. Retrieved 10 February 2014.
  26. ^ "MLlib | Apache Spark". spark.apache.org. Retrieved 2025-08-07.
  27. ^ Malak, Michael (14 June 2016). "Finding Graph Isomorphisms In GraphX And GraphFrames: Graph Processing vs. Graph Database". slideshare.net. sparksummit.org. Retrieved 11 July 2016.
  28. ^ Malak, Michael (1 July 2016). Spark GraphX in Action. Manning. p. 89. ISBN 9781617292521. Pregel and its little sibling aggregateMessages() are the cornerstones of graph processing in GraphX. ... algorithms that require more flexibility for the terminating condition have to be implemented using aggregateMessages()
  29. ^ Malak, Michael (14 June 2016). "Finding Graph Isomorphisms In GraphX And GraphFrames: Graph Processing vs. Graph Database". slideshare.net. sparksummit.org. Retrieved 11 July 2016.
  30. ^ Gonzalez, Joseph; Xin, Reynold; Dave, Ankur; Crankshaw, Daniel; Franklin, Michael; Stoica, Ion (Oct 2014). GraphX: Graph Processing in a Distributed Dataflow Framework (PDF). OSDI 2014.
  31. ^ ".NET for Apache Spark | Big data analytics". 15 October 2019.
  32. ^ "Spark.jl". GitHub. 14 October 2021.
  33. ^ a b Clark, Lindsay. "Apache Spark speeds up big data decision-making". ComputerWeekly.com. Retrieved 2025-08-07.
  34. ^ "The Apache Software Foundation Announces Apache&#8482 Spark&#8482 as a Top-Level Project". apache.org. Apache Software Foundation. 27 February 2014. Retrieved 4 March 2014.
  35. ^ Spark officially sets a new record in large-scale sorting
  36. ^ Open HUB Spark development activity
  37. ^ "The Apache Software Foundation Announces Apache&#8482 Spark&#8482 as a Top-Level Project". apache.org. Apache Software Foundation. 27 February 2014. Retrieved 4 March 2014.
  38. ^ "Spark 2.4.8 released". spark.apache.org. Archived from the original on 2025-08-07.
  39. ^ "Spark 3.0.3 released". spark.apache.org.
  40. ^ "Spark 3.1.3 released". spark.apache.org. Archived from the original on 2025-08-07.
  41. ^ "Spark 3.2.4 released". spark.apache.org.
  42. ^ "Spark 3.3.3 released". spark.apache.org.
  43. ^ "Spark 3.4.3 released". spark.apache.org.
  44. ^ "Spark 3.5.2 released". spark.apache.org.
  45. ^ "Using Scala 3 with Spark". 47 Degrees. Retrieved 29 July 2022.
  46. ^ "Apache Committee Information".
  47. ^ "Versioning policy". spark.apache.org.