Problems with Memcachedb – Web Development

Let’s take it from which of these
pieces don’t exist anymore. All right. After you left but before I started,
memcachedb hit a scaling wall and just would not go any further;
the writes were coming in too fast for it. Did you guys try adding more memcachedb boxes? I don’t know, actually. >>You went
there for that. >>Yeah. Memcachedb is not designed for the
heavy load we were throwing at it. It was basically the memcache code bolted
on top of Berkeley DB, which is kind of neat, but it wasn’t really designed to
work at our scale or for our use case, which is basically sending as
many queries at it as you can. So you guys got rid of this? >>Right, and replaced it with Cassandra.
It’s a distributed NoSQL database. The way it works is you have rows, and each row is sharded by its key to
somewhere in the ring of servers, so you get automatic sharding
across the entire ring. Each row has columns inside of it, and
that’s where the actual data is stored. This moved to Cassandra, and it was pretty similar. I’ll let you draw Cassandra there, and
I’ll explain a little bit more. Remember we talked about how, with databases, you can replicate them, where you send the same data to multiple machines, which helps
with load and durability (this kind of picture here), or you can shard them, where you send
one chunk of data to one machine and another chunk of data to another. In this case, we started using this for the precomputed stuff first? >>Yeah. You’ve got these listings: every subreddit has
a hot page, and that is precomputed, so you can store the hot pages for
some subreddits on this node, the hot pages for other subreddits
on that node, and so on all around the ring. So each node is not an exact copy of every other node. >>Right. >>Now, is there some overlap? Right. It’s configurable, but in our case
we’re using a replication factor of 3, which means that if a piece of data lives on
this node, it is also on this one and this one, and that happens all around the ring. And why do you do that? Simultaneously, a read can be serviced
from any one of the 3 if we allow it to be, which means that if one node is going
slow, we don’t go really slow. It also means that if any
one node just ceased to exist, we wouldn’t lose all the data for
that segment. Let’s say you have content on this guy, this guy,
and this guy, and you lose this node. Does the content get redistributed? >>No. The nodes are assigned a key space, and you have to move
tokens if you want to rebalance the ring. Okay. Is that something that you
as a developer have to do? Yeah, that’s an operational thing.
>>Or systems? >>Yeah. One of the things that happens as the team
grows is that people’s roles change. When you’re a small website, you are the designer, developer, and sysadmin all in one, and now you guys have a whole team of
operations guys. >>Two. Yeah. >>Okay. You have a team of two, and they’re very good, from what I’ve heard. They’re extraordinarily good. When I left, it was me and David,
who now works with me at Hipmunk, as basically the developers and the ops guys,
and we’re better developers than we were ops guys, so that explains some of this.
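The ring described in this conversation (rows sharded by key around a ring of servers, a replication factor of 3, reads served by any live replica, and no automatic redistribution when a node disappears) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Cassandra's or Reddit's code: the `TokenRing` class, its method names, the MD5 key hash, the hand-assigned tokens, and the `hot:programming` row key are all hypothetical. Replica placement simply takes the next nodes clockwise on the ring, which is roughly the idea behind Cassandra's `SimpleStrategy`; real Cassandra placement has more options.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns the key range that ends at its token."""

    def __init__(self, node_tokens, replication_factor=3):
        # node_tokens maps a node name to an integer token. Tokens are fixed
        # by whoever builds the ring; nothing in this class moves them.
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]
        self.rf = replication_factor

    @staticmethod
    def key_token(row_key):
        # Assumption: hash the row key with MD5 and read it as a 128-bit integer.
        return int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)

    def replicas(self, row_key):
        # The primary is the first node whose token is >= the key's token,
        # wrapping past the largest token back to the start of the ring.
        n = len(self.ring)
        start = bisect.bisect_left(self.tokens, self.key_token(row_key)) % n
        # The next nodes clockwise hold the remaining copies.
        return [self.ring[(start + i) % n][1] for i in range(min(self.rf, n))]

    def read_from(self, row_key, down=()):
        # Any live replica can answer, so one slow or dead node does not
        # make the row unavailable. Nothing is redistributed to other nodes.
        for node in self.replicas(row_key):
            if node not in down:
                return node
        raise RuntimeError(f"no live replica for {row_key!r}")


if __name__ == "__main__":
    # Six hypothetical nodes with evenly spaced tokens across the MD5 range.
    ring = TokenRing({f"node{i}": i * (2**128 // 6) for i in range(6)})
    owners = ring.replicas("hot:programming")  # hypothetical row key
    print("replicas:", owners)
    print("read served by:", ring.read_from("hot:programming", down={owners[0]}))
```

In this toy, a node listed in `down` keeps its key range: its rows are still readable from the other two replicas, but nothing copies them elsewhere. Changing which node owns which range only happens by editing the token map, loosely mirroring the manual token moves described above.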
