As big data becomes a common use case, we need distributed systems that can take advantage of parallel processing.
I will talk about Hadoop and MapReduce in general, and about how to run Hadoop MapReduce jobs on the Amazon EMR service. I will also share insights from managing hyper-scale production Hadoop clusters and tuning them for performance. Think 68,400 GB of RAM, 26,000 CPUs and 1,700,000 GB of disk 🙂
Session details:
1. Big Data and Use cases
2. Hadoop and MapReduce – Streaming and Custom JAR jobs
3. Local Hadoop setup for development
4. Launching a Hadoop cluster using EMR
5. Profiling and Performance tuning
Drop a comment here if you want me to discuss any other specific topics as part of this session.
This post was submitted by amnigos.