Distributed computing
I am a data engineer at Tesco and this blog is part of a mentoring process to track the progress of my career development journey.
What is a Distributed System?
A distributed system is a system for parallel data processing that distributes work across several computers. One computer, called the driver, distributes work to the other nodes (workers, or executors). When one of the nodes is not working, its work can be rerun on another node. The worker nodes are on the same level, so one can stand in for another.
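The driver and worker roles described above can be sketched in a few lines of Python. This is a minimal single-machine simulation, not how any particular framework does it: threads stand in for nodes, and the names make_worker, run_with_failover and driver are hypothetical helpers made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerDown(Exception):
    """Raised when a (simulated) worker node cannot run a task."""


def make_worker(name, healthy=True):
    """Build a simulated worker that sums a chunk of numbers."""
    def run(chunk):
        if not healthy:
            raise WorkerDown(name)
        return sum(chunk)
    return run


def run_with_failover(chunk, workers):
    """Try the chunk on each worker in turn until one succeeds."""
    for worker in workers:
        try:
            return worker(chunk)
        except WorkerDown:
            continue  # rerun the same work on the next node
    raise RuntimeError("no healthy worker left")


def driver(chunks, workers):
    """Driver: give each chunk a preferred worker, fail over to the others."""
    def submit(indexed_chunk):
        i, chunk = indexed_chunk
        k = i % len(workers)  # chunk i prefers worker i % n
        return run_with_failover(chunk, workers[k:] + workers[:k])

    # Chunks are processed in parallel; results come back in input order.
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        return list(pool.map(submit, enumerate(chunks)))


# Assumed setup for the example: worker w2 is down.
workers = [make_worker("w1"), make_worker("w2", healthy=False), make_worker("w3")]
chunks = [[1, 2], [3, 4], [5, 6]]
partial_sums = driver(chunks, workers)
total = sum(partial_sums)
```

The second chunk is first offered to the broken worker and then rerun on the next one, so the driver still gets all three partial sums.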
How is a Distributed System different from a Centralized System?
In a centralized system it is not possible to replace one node with another, because every part has a specific function.
Characteristics of the Distributed Systems:
Resource sharing: nodes run on a cluster of computers and share its resources
Parallel processing: data is processed in parallel and nodes do not wait for one another
CAP Theorem
The CAP theorem (Eric Brewer, 1999) covers three properties of a distributed data store: Consistency, Availability and Partition Tolerance. Only two of them can be guaranteed at once. ref_CAP1
The CAP theorem states that a distributed system can only provide two of the three characteristics of Consistency, Availability, and Partition Tolerance. ref_CAP2
Consistency means every node holds the same data. Availability means you will always get the data. Partition tolerance means the system keeps working even when the connection between nodes is lost or delayed.
It is similar to project management, where money, time and quality are traded off against each other and cannot all be optimized at once.
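To make the trade-off concrete, here is a toy sketch of a store with two replicas. It assumes a very simplified model: writes always arrive at node A and reads at node B. The class names and the "CP"/"AP" mode flag are invented for this illustration and do not come from any real database.

```python
class Replica:
    """One node holding a single value."""
    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Toy store with two replicas and a link that can be cut."""
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when the link between A and B is down

    def write(self, value):
        """Write through node A and replicate to node B when possible."""
        if self.partitioned:
            if self.mode == "CP":
                # Keep consistency: refuse the write, so the store is unavailable.
                raise RuntimeError("unavailable during partition")
            # Keep availability: accept the write, the replicas now differ.
            self.a.value = value
            return
        self.a.value = value
        self.b.value = value

    def read_from_b(self):
        return self.b.value


# CP store: the write is rejected during a partition, replicas stay equal.
cp = TwoNodeStore("CP")
cp.write("v1")
cp.partitioned = True
try:
    cp.write("v2")
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

# AP store: the write is accepted, but node B serves stale data.
ap = TwoNodeStore("AP")
ap.write("v1")
ap.partitioned = True
ap.write("v2")
ap_stale_read = ap.read_from_b()
```

When the link is cut, the store has to pick one: reject the write (consistent but not available) or accept it and let the replicas disagree (available but not consistent).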
Distributed Database - ACID vs. BASE databases
ACID databases are transactional databases with the properties: Atomicity, Consistency, Isolation and Durability. ref_ACID
ACID databases prioritize consistency over availability. In contrast, BASE databases prioritize availability over consistency. Instead of failing the transaction, a BASE database lets users access inconsistent data temporarily. Data consistency is achieved, but not immediately. ref_BASE
A transaction is the smallest unit of work performed by the database. It can be compared to a wire transfer: money leaves your account, and the transaction is finished only after it has been transferred to the other account.
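The "consistent, but not immediately" behaviour can be sketched with a primary copy and a replica that lags behind. This is only a toy model under the assumption that replication happens in a separate sync step; the class EventuallyConsistentStore and its methods are hypothetical names for this example.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy store: writes land on the primary and reach the replica later."""
    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # The write succeeds right away, without waiting for the replica.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_replica(self, key):
        # May return old or missing data until sync() has run.
        return self.replica.get(key)

    def sync(self):
        """Apply all pending writes to the replica, in order."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("stock", 5)
stale = store.read_replica("stock")  # replica has not caught up yet
store.sync()
fresh = store.read_replica("stock")  # replica now matches the primary
```

Between the write and the sync a reader of the replica sees outdated data, which is the temporary inconsistency a BASE database accepts.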
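The wire transfer example can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the accounts table, the names alice and bob, and the transfer helper are assumptions made for the example, and the CHECK constraint is just one simple way to make an overdraft fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative, so the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)
after_failed = balances(conn)
```

In the second call the credit to bob runs first, then the debit from alice violates the constraint, and the rollback removes the credit again. That is atomicity: the money never appears in one account without leaving the other.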