A Medley of Spark Tips and Tricks
After spending countless hours working with Spark, I’ve compiled some tips and tricks that have helped me improve my productivity and performance. Today, I’ll share some of my favorite ones with you.

1. Measuring DataFrame Size in Memory

When working with Spark, knowing how much memory your DataFrame uses is crucial for optimization. This is especially important with compressed formats like Parquet, which can make memory usage hard to predict: a file that is small on disk might expand significantly when loaded into memory.