What is the difference between a Spark broadcast variable and an accumulator?
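A Spark broadcast variable is a read-only variable that Spark caches once on each worker node in the cluster, instead of shipping a copy of it with every task. Broadcast variables let tasks share a large, read-only dataset, such as a lookup table, without resending it over the network for each task. Broadcasting a small table, for example, lets each task join against it locally instead of shuffling both datasets.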
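An accumulator, on the other hand, is a shared variable that tasks can only add to: from inside a task it is effectively write-only, because tasks cannot read its value. Accumulators are used for jobs such as counting the records processed or summing values across partitions. Unlike a broadcast variable, an accumulator is not a cached copy that tasks read; each task's updates are sent back to the driver and merged there, and only the driver program can read the accumulated value. Spark guarantees that each task's update is applied exactly once only for updates made inside actions; updates made inside transformations may be applied more than once if a task or stage is re-executed.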