What is the difference between cache() and persist() in Spark?
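cache() and persist() both keep an intermediate RDD around after it is first computed, so later actions can reuse it instead of recomputing it. The main difference is the storage level: for an RDD, cache() always uses the default level, MEMORY_ONLY (it is shorthand for calling persist() with no arguments), whereas persist() also accepts a StorageLevel, letting you choose memory, disk, or a combination of both. Note that for DataFrames and Datasets the default level used by cache() is MEMORY_AND_DISK rather than MEMORY_ONLY.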