You'll eventually encounter this on larger distributed systems
xZWd_4ZDqbI — Published on YouTube channel Web Dev Cody on October 6, 2024, 1:00 PM
Summary
This summary is generated by AI and may contain inaccuracies.
- Speaker A introduces the term called eventual consistency to his audience and explains why it is important to learn it. Then he talks about the latency of a database and how to deal with it.
Video Description
Become a YT Member to get extra perks!
https://www.youtube.com/@webdevcody/join
My Products
Video Crafter: https://thevideocrafter.com/?utm=wdc
Scary Story Generator: https://scarystorygenerator.com/?utm=wdc
WDC StarterKit: https://wdcstarterkit.com
ProjectPlannerAI: https://projectplannerai.com
IconGeneratorAI: https://icongeneratorai.com
Useful Links
Discord: https://discord.gg/4kGbBaa
Newsletter: https://newsletter.webdevcody.com/
GitHub: https://github.com/webdevcody
Twitch: https://www.twitch.tv/webdevcody
Website: https://webdevcody.com
Twitter: https://twitter.com/webdevcody
Transcription
This video transcription is generated by AI and may contain inaccuracies.
So if you are in the industry long enough, you will eventually come across this term called eventual consistency. I want to walk you through what it is and why it's important to learn.
When you're building out a system, you'll probably have a database. For example, we could have our Postgres database over here. If you're doing something really simple, you might have it live only on the East Coast. But at some point you may have a need or a business requirement to have this data also live on the West Coast. Let's make a user over here called Bob. Say Bob lives in California: if he wants to fetch data, his request has to go across the entire United States to the East Coast and back. Granted, this is pretty fast. It doesn't take too long. But if you scale to even more places across the globe, you can't have people living in India or Africa or Europe fetching data from the East Coast, because it's just not going to be as fast as if the data lived in their own region from the get-go.
By the way, I asked ChatGPT what the average latency is, and it says 60 to 100 milliseconds from us-east-1 (Virginia) to US West (California). Also, if you're hosting a server, more than likely you're hosting it in us-east-1. That's where a majority of servers in the United States are hosted. So I guess the latency is not that bad. 100 milliseconds, who cares, right? Well, what's the latency from us-east-1 to Europe? It's saying 70 to 120 milliseconds. What about from us-east-1 to Asia? 140 to 170 milliseconds. And it really depends, right?
It looks like this one is 200 to 250 milliseconds. Now, depending on where your users are, how fast their Internet is, and what providers they have, this can range. So if you want to build the best user experience, you want Bob to be able to fetch the data as fast as possible.
Now, one thing worth pointing out is that you will have an API between your database and Bob. Bob's not accessing the data directly. He has to hit an API, and that adds a little more latency. So keeping a copy of the data on the West Coast can make Bob a little happier because his data comes over the wire faster, but it comes at a cost. Now you have an eventually consistent system, which is a lot harder to deal with.
Here's the issue. Let's say we had another user, Sally, and she wants to write some data. Let's put an API in front of her too. The data she's writing takes some time to get to the API, then it gets to the database, and then behind the scenes the database should replicate that data. How that works depends on what you're using. Something like AWS Aurora makes this super seamless, but there's still a delay. It takes time for your data to go from the primary all the way to your replica. If Bob tries to fetch that data while it is still moving over, he can get a stale copy: he gets version one of the data even though Sally just wrote version two.
Now, depending on the application you're building, this may not be that big of an issue. Maybe Bob and Sally never actually modify the same records.
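The stale read described above can be reproduced with a small in-memory simulation. This is only a sketch: the `Primary` and `Replica` classes, the `profile` key, and the 80 ms lag are assumptions made up for illustration (the lag is picked from the 60 to 100 millisecond range quoted earlier), and time is passed in as a plain number rather than measured.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A read-only copy that only sees writes once they are applied to it."""
    data: dict = field(default_factory=dict)

    def read(self, key):
        return self.data.get(key)


@dataclass
class Primary:
    """Accepts writes immediately and ships them to the replica later."""
    replica: Replica
    lag_ms: int  # assumed one-way replication delay
    data: dict = field(default_factory=dict)
    in_flight: list = field(default_factory=list)  # (arrival_ms, key, value)

    def write(self, now_ms, key, value):
        self.data[key] = value
        self.in_flight.append((now_ms + self.lag_ms, key, value))

    def replicate(self, now_ms):
        """Apply every queued write whose arrival time has passed."""
        arrived = [w for w in self.in_flight if w[0] <= now_ms]
        self.in_flight = [w for w in self.in_flight if w[0] > now_ms]
        for _, key, value in arrived:
            self.replica.data[key] = value


west = Replica(data={"profile": "v1"})
east = Primary(replica=west, lag_ms=80, data={"profile": "v1"})

east.write(now_ms=0, key="profile", value="v2")  # Sally writes v2 on the primary

east.replicate(now_ms=40)      # replication is still in flight
stale = west.read("profile")   # Bob reads too early and sees v1

east.replicate(now_ms=80)      # the write has now reached the replica
fresh = west.read("profile")   # Bob reads again and sees v2

print(stale, fresh)
```

Nothing is broken at any point in this run. The replica simply answers with what it has, which is why this kind of bug only shows up when a read happens to land inside the replication window.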
But if you have a system where multiple people are reading and writing the same collections, you can run into issues that are very, very hard to debug. In one out of, let's say, 100 or 1,000 attempts, you get this weird scenario where someone reads some data, gets back a stale copy, decides to do something with that stale data, and then writes it back to the database. That causes issues because he was working from an older copy of the data. Now, obviously there's a bunch of stuff like transactions and ACID compliance that you'll have to bring into your system to make sure that users modifying the same data don't cause race conditions.
But it is something to keep in mind, because I've seen this in a project I've worked on where we use Elasticsearch (OpenSearch is actually what we're using) to let users look up data. Once you write to the primary database, a system behind the scenes replicates some of that data to OpenSearch so that it's indexed and you can query it using an inverted index search. The issue is that after you write this data, it can take up to a second or two to show up in search results. That isn't usually a problem for users. What it is a problem for is all the integration tests we have written, where we write some data in the test and then assert that a query returns the right data, but the data is not there yet because it takes some time to go over.
So how do you address that in your integration tests and end-to-end tests? Typically you have to add some type of polling, where you try the assert statement a few times: you reload the page and assert the data is there, and if it isn't, you try maybe two more times, and eventually that data will show up.
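One way to sketch that polling in a test helper is shown below. The `eventually` function, the `FakeSearchIndex` class, and the timeout values are all hypothetical names and numbers invented for this example. They are not from the project described in the video, and the fake index only stands in for a search service that lags behind the primary database.

```python
import time


def eventually(check, timeout_s=5.0, interval_s=0.25):
    """Re-run `check` until it stops raising AssertionError or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return check()
        except AssertionError:
            if time.monotonic() >= deadline:
                raise  # still failing after the timeout: treat it as a real bug
            time.sleep(interval_s)


class FakeSearchIndex:
    """Hypothetical index where a document becomes visible only after N queries."""

    def __init__(self, visible_after_queries):
        self.visible_after_queries = visible_after_queries
        self.queries = 0

    def search(self, term):
        self.queries += 1
        if self.queries >= self.visible_after_queries:
            return [{"title": term}]
        return []


index = FakeSearchIndex(visible_after_queries=3)


def assert_document_is_searchable():
    results = index.search("eventual consistency")
    assert results, "document not indexed yet"
    return results


# The first two attempts fail, the third succeeds, and the helper returns.
found = eventually(assert_document_is_searchable, timeout_s=2.0, interval_s=0.01)
print(found[0]["title"], index.queries)
```

Bounding the retries with a timeout keeps the test honest: a slow index still passes, while data that never arrives fails with the original assertion message instead of hanging.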
And if it doesn't, then maybe there's a bug somewhere over here, right?
So I guess the point of this video is just keep that in mind if you're building out a system and you're using an eventually consistent database. I do believe DynamoDB is an eventually consistent database by default. They do have transactions built in that you can use, but you can write some data, then read that data right away, and what comes back may be stale.
I also think MongoDB is eventually consistent. Mongo uses a form of eventual consistency called eventually consistent with immediate consistency for most reads. This means that while MongoDB does not guarantee immediate consistency for all reads, it does guarantee that after a write operation, the data will eventually be consistent across all replicas. So there's no guarantee that if you do a read right after you do a write, you're going to get that data back as you expect it. This is very important, and you have to keep it in mind when you're using a database that uses sharding or read replicas and stuff like that.
Again, the point of this video is just to keep that in mind. Depending on the type of database you're using, and if you have this need for multi-region setups, the data you fetch back could actually be stale. There are mechanisms you can put in place to verify, before you do something, that the data you're about to operate on is the latest, greatest data, if that's something that's important to your application.
All right, I hope you guys enjoyed this video. Be sure to like, comment, subscribe, and, like always, have a good day. Happy coding.
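The video does not name a specific mechanism for verifying that data is current before acting on it. One common option is an optimistic version check, sketched below under that assumption. `VersionedStore`, `write_if_version`, `StaleWriteError`, and the `cart` record are hypothetical names for an in-memory stand-in, not the API of any real database.

```python
class StaleWriteError(Exception):
    """Raised when a write is based on a version that is no longer current."""


class VersionedStore:
    """In-memory stand-in for a primary that tracks a version per record."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        return self._rows.get(key, (0, None))

    def write_if_version(self, key, expected_version, value):
        """Write only if nobody else has written since `expected_version`."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleWriteError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
store.write_if_version("cart", 0, ["book"])          # record is now at v1

bob_version, bob_cart = store.read("cart")           # Bob reads v1
store.write_if_version("cart", 1, ["book", "pen"])   # Sally writes v2 meanwhile

try:
    store.write_if_version("cart", bob_version, bob_cart + ["mug"])
    outcome = "written"
except StaleWriteError:
    # Bob's copy was stale: re-read the latest version and retry on top of it.
    latest_version, latest_cart = store.read("cart")
    store.write_if_version("cart", latest_version, latest_cart + ["mug"])
    outcome = "retried"

print(outcome, store.read("cart"))
```

Without the version check, Bob's write would have silently dropped Sally's `pen`. With it, the stale write is rejected and the retry builds on the latest copy.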