Sunday, 20 October 2024

HDFS Block abstraction :

  • The HDFS block size is large, usually 128 MB (64 MB in older Hadoop versions), and unlike other filesystems, a file smaller than a block does not occupy a full block's worth of underlying storage.
  • The block size is kept large so that the time spent on disk seeks is small compared to the time spent transferring the data; the block size itself is configurable, as shown in the sketch after this list.
  • Why do we need block abstraction :
  • A file can be larger than any single disk, because its blocks can be spread across many disks.
  • Filesystem metadata does not need to be stored with each and every block.
  • It simplifies storage management - it is easy to work out how many blocks fit on each disk.
  • Fault tolerance and storage replication can be done on a per-block basis.
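
A minimal sketch of changing the block size in hdfs-site.xml, assuming Hadoop 2.x or later where the property is named dfs.blocksize (older releases used dfs.block.size); the value here of 256 MB is just an illustrative choice, not a recommendation:

    <!-- hdfs-site.xml : set the default HDFS block size to 256 MB -->
    <property>
      <name>dfs.blocksize</name>
      <value>268435456</value>
    </property>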

 

Data Replication :

  • Replication ensures the availability of the data.
  • Replication means making a copy of something; the number of copies kept of a particular block is called its Replication Factor.
  • As HDFS stores data in the form of blocks, Hadoop is also configured to keep copies of those file blocks.
  • By default, the Replication Factor in Hadoop is 3, and it can be configured.
  • We need this replication of file blocks because Hadoop runs on commodity hardware (inexpensive system hardware), which can fail at any time.
  • We are not using a supercomputer for our Hadoop setup.
  • That is why HDFS needs a feature that keeps copies of file blocks for backup purposes; this ability to survive failures is known as fault tolerance.
  • For large organizations, the data is far more valuable than the storage it consumes, so the extra storage used by replication is an acceptable cost.
  • You can configure the Replication Factor in your hdfs-site.xml file, as shown in the sketch below.
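
A minimal sketch of that setting, using the standard dfs.replication property (this is the cluster-wide default; individual files can override it):

    <!-- hdfs-site.xml : keep 3 copies of every block (the default) -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

The replication factor of an existing file can also be changed from the command line, for example hdfs dfs -setrep -w 2 /path/to/file, where -w waits until the new factor of 2 has been reached.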

 
