Coding Interview Club

Prepare for coding interviews in simple and structured steps

Hadoop Interview Questions On The Basics Of Hadoop And Hadoop Component Technologies

Though it says interview questions, this page lists questions that can also be used to test your understanding of the basics of Big Data and Hadoop, and of the component technologies that make up the Hadoop technology stack. It does not go deep into any single component. Seeing the bigger picture and knowing how the components fit together will help you choose the right component and use it in the right way.

The questions come without answers, but most have hints. Interviewers can use the hints to give additional input to the candidate, and candidates can use them to get a quick pointer, such as where a component fits in the overall picture.

  1. BigData and Hadoop Basics
    1. What is Big Data?
      1. Hint: Broad term for data sets so large or complex that traditional data processing applications are inadequate.
    2. What is Hadoop?
      1. Hint: HDFS + MapReduce + Libraries (HBase, Pig etc.).
    3. How is Google File System (GFS) related to Hadoop?
      1. Hint: HDFS, Hadoop's file system, was originally modeled on Google's whitepaper on GFS.
    4. Why Hadoop?
      1. Hint: Cheaper (commodity hardware), Faster (parallel processing).
    5. List a few use cases where Hadoop can be used.
      1. Hint: Risk modeling, recommendation engine, Ad targeting, search engine quality.
    6. What are the core Hadoop components?
      1. Hint: HDFS and MapReduce.
    7. What are the differences between Hadoop 1.x and Hadoop 2.x?
      1. Hint: YARN
    8. What are the differences between RDBMS and Hadoop way of treating the data?
      1. Hint: RDBMS = schema on write, Hadoop = schema on read.
    9. What are the disadvantages of using traditional relational databases for data analytics?
      1. Hint: Scalability, Speed etc.
    10. Many people compare RDBMS with Hadoop. Is Hadoop a database?
      1. Hint: No. Hadoop is a distributed file system plus a processing framework, not a database. It may be used along with a database (mostly a NoSQL database).
    11. Will RDBMS still be useful with the popularity of Hadoop?
      1. Hint: They solve a different problem.
    12. What are NoSQL Databases?
      1. Hint: Databases that model data by means other than the tabular relations used in relational databases; no fixed columns.
    13. List a few types of NoSQL databases with examples.
      1. Hint: key/value, columnstore, documentstore etc.
    14. What is a wide column store NoSQL Database?
      1. Hint: The set of columns can vary from row to row. E.g. HBase.
    15. What do you know about CAP theorem?
      1. Hint: Consistency, Availability, Partition tolerance.
      2. Ref: https://en.wikipedia.org/wiki/CAP_theorem
    16. How and where does Hadoop fit in the CAP theorem?
      1. Hint: Scalability (Partitioning), Flexibility (Availability).
    17. What kinds of data are good fit for Hadoop?
      1. Hint: Behavioral Data.
    18. What kinds of data are not a good fit for Hadoop?
      1. Hint: Transactional data.
    19. Mostly Hadoop is used along with NoSQL databases. Can Hadoop be used with RDBMS? Explain.
    20. What are Hadoop’s alternative products or solutions?
      1. Hint: Disco, Filemap, Zillabyte etc.
    21. What are the different distributions of Hadoop available?
      1. Hint: Open source (Apache Hadoop), commercial (Cloudera, Hortonworks, MapR), cloud (AWS with open source or commercial Hadoop, Windows Azure HDInsight).
    22. What are the different Hadoop solutions available from Cloudera?
      1. Hint: Cloudera Enterprise, Cloudera Live etc.
    23. What is Hue?
      1. Hint: Hadoop User Experience, an open-source web GUI for Hadoop originally developed by Cloudera and bundled with its distributions.
    24. What do you know about Hadoop solutions available from Hortonworks?
      1. Hint: Windows and Linux versions, VMs with installations.
    25. What do you know about Hadoop solutions available from MapR?
      1. Hint: NoSQL-DB file system, add-ons to Apache projects, sandboxes.
    26. What do you know about cloud initiatives based on Hadoop?
      1. Hint: Amazon Elastic MapReduce (EMR), Microsoft Azure HDInsight.
    27. What is meant by Hadoop incubator projects? Can you name one?
      1. Ref: http://incubator.apache.org/projects
    28. How do you compare Hadoop data processing with Grid Computing? 
  2. Hadoop Technology Stack
    1. What do you know about the below components (or libraries) and how are they related to Hadoop?
      1. HDFS
        1. Hint: Hadoop Distributed File System, part of Hadoop core.
      2. MapReduce
        1. Hint: Programming model for processing data in Hadoop, part of Hadoop core.
        2. Ref: https://en.wikipedia.org/wiki/MapReduce
      3. YARN
        1. Hint: Stands for Yet Another Resource Negotiator; also called MapReduce v2. Part of Hadoop core in Hadoop v2. A cluster resource management system that allows any distributed program (not just MapReduce) to run on data in a Hadoop cluster.
      4. HBase
        1. Hint: A key-value store, wide column store, NoSQL; uses HDFS for its underlying storage.
      5. Hive
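        1. Hint: Data warehouse layer on Hadoop; HiveQL (HQL) is its SQL-like query language for data stored in HDFS (it can also query HBase tables).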
      6. Pig
        1. Hint: Scripting language (Pig Latin) for data-flow programs.
      7. Mahout
        1. Hint: Machine learning, predictive analysis.
      8. Oozie
        1. Hint: Workflow, coordination of jobs.
      9. Zookeeper
        1. Hint: Coordination
      10. Sqoop
        1. Hint: Data Exchange (RDBMS)
      11. Flume
        1. Hint: Log Collector.
      12. Ambari
        1. Hint: Managing Hadoop Clusters
      13. Cassandra
      14. Drill
      15. Spark
      16. Shark
      17. HCatalog
      18. Lucene
      19. Hama
      20. Crunch
      21. Avro
      22. Thrift
      23. Chukwa
    2. What are the differences between MapReduce 1 and YARN?
    3. What GUI tools are available for managing Hadoop HDFS, MapReduce and/or YARN? Have you used any?
    4. Can you run Hadoop (MapReduce) on a regular file system without HDFS?
      1. Hint: Standalone (local) mode.
    5. Can you run Hadoop (MapReduce) on a cloud file system without HDFS?
      1. Hint: Amazon S3, Azure Blob storage.
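
To make the MapReduce hint above more concrete, here is a minimal sketch of the programming model as a word count in plain Python. It runs in memory on one machine and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the framework distributes the map and reduce calls across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(sample))
```

A useful follow-up for a candidate: which of the three phases can run in parallel without coordination (map, and reduce per key), and which one moves data between nodes (shuffle).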

 

References

  • Wikipedia pages for all products listed here (if available).
  • CBT Nuggets Apache Hadoop
  • Lynda.com Hadoop Fundamentals