Utility Computing aka Cloud Computing: Re: Reliability in of the cloud

Log shipping like Oracle's Data Guard tends to work well in covering for log failures. I've also seen success with storage-level replication like EMC's SRDF.

It can be problematic, however, for catastrophic failure (e.g. losing the whole data center), as the latency of synchronous replication over a WAN can be prohibitive for performance beyond a few hundred miles or so (i.e. your DR site couldn't be across the continent). Of course, you could go asynchronous and tolerate some data loss in such a scenario -- it's just that many aren't willing to (in CAP conjecture terms, they emphasize "C"onsistency & "A"vailability but become unavailable during a network "P"artition). Or you could partition / shard the data set itself to contain the disruption, though that in itself has a whole pile of tradeoffs.
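
To put rough numbers on the distance limit, here is a minimal sketch. It assumes a signal speed in fiber of about 200,000 km/s (roughly two thirds of the speed of light) and that each synchronous commit waits for one round trip to the DR site. Real links add routing, switching and protocol overhead, so these figures are lower bounds only. The function name and the sample distances are illustrative, not taken from any product.

```python
# Lower-bound estimate of the latency added to each synchronous commit.
# Assumption: signal speed in fiber is ~200,000 km/s (about 2/3 of c).
FIBER_KM_PER_MS = 200.0  # 200,000 km/s is 200 km per millisecond
KM_PER_MILE = 1.609344


def sync_commit_penalty_ms(distance_miles: float, round_trips: int = 1) -> float:
    """Minimum extra milliseconds per commit from propagation delay alone."""
    distance_km = distance_miles * KM_PER_MILE
    one_way_ms = distance_km / FIBER_KM_PER_MS
    return 2 * one_way_ms * round_trips


if __name__ == "__main__":
    for miles in (10, 300, 2500):  # metro, regional, cross-continent
        penalty = sync_commit_penalty_ms(miles)
        # A session that commits serially cannot exceed 1000 / penalty commits per second.
        print(f"{miles:>5} miles: >= {penalty:.2f} ms per commit, "
              f"<= {1000 / penalty:.0f} serial commits/s")
```

Even under these optimistic assumptions, the per-commit penalty grows linearly with distance, which is why a cross-continent DR site usually pushes people toward asynchronous replication.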

Solutions like GemStone, GigaSpaces, Coherence, etc., when used with "write-behind" caching, all basically apply the same idea as log shipping and have tradeoffs similar to a traditional DB's log shipper, though they go about it in different ways (e.g. GemStone basically *IS* a distributed database, whereas GigaSpaces & Coherence are distributed caches that delegate to an RDBMS or indexed file).
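
As a rough illustration of the write-behind idea and its tradeoff, here is a toy sketch. It is not how GemStone, GigaSpaces or Coherence are implemented; the class and method names are hypothetical, and a plain dict stands in for the RDBMS. Writes are acknowledged from memory and persisted later, so anything still pending at the time of a crash is lost, much like unshipped log records in asynchronous log shipping.

```python
from collections import OrderedDict


class WriteBehindCache:
    """Toy write-behind cache (hypothetical names throughout).

    Writes are acknowledged from memory and persisted later in a batch.
    `backing_store` is a dict standing in for an RDBMS.
    """

    def __init__(self, backing_store: dict):
        self._store = backing_store
        self._cache = {}
        self._dirty = OrderedDict()  # keys written but not yet persisted

    def put(self, key, value):
        self._cache[key] = value
        self._dirty[key] = value       # repeated writes to a key coalesce
        self._dirty.move_to_end(key)   # keep most recent write last

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        return self._store.get(key)

    def flush(self) -> int:
        """Persist pending writes; return how many keys were written."""
        flushed = len(self._dirty)
        self._store.update(self._dirty)
        self._dirty.clear()
        return flushed

    def pending(self) -> int:
        return len(self._dirty)


if __name__ == "__main__":
    db = {}
    cache = WriteBehindCache(db)
    cache.put("a", 1)
    cache.put("a", 2)
    cache.put("b", 3)
    # Nothing has reached the store yet: a crash here loses both keys.
    print("pending:", cache.pending(), "store:", db)
    cache.flush()
    print("pending:", cache.pending(), "store:", db)
```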

Cheers

Stu

On 6-Jun-08, at 8:04 AM, Alan Ho wrote:

There is good old Oracle with hot standby. It's not perfect, but our applications were able to fail over gracefully from one Oracle instance to another.

Regards,
Alan Ho

----- Original Message ----
From: Gavan Corr <gcorr@nyx.com>
To: cloud-computing@googlegroups.com
Sent: Friday, June 6, 2008 5:57:43 AM
Subject: Re: Reliability in of the cloud

There are a number of commercial data caching solutions in the market: GemStone, Coherence (now from Oracle) and GigaSpaces, and to a lesser extent Terracotta. Of those, GemStone is the only one I have seen successfully deployed in a large-scale multi-site environment to keep data consistent between multiple sites and to do reliable failover if a node or a center fails. Hadoop is gaining interest but is not there yet...
Gavan

On Jun 5, 2008, at 9:00 PM, Khazret Sapenov wrote:

Alan,
If you are talking about Hadoop, then high availability is not inherent in it yet (though that may have changed recently).
As far as I know, while there is a Secondary Name Node provided (which can reside in another data center), there's no guarantee of a real-time switch from the Job Tracker/Name Node/Task Tracker/Data Nodes of DC A to the Job Tracker/Name Node/Task Tracker/Data Nodes of DC B.

cheers

--
Khaz Sapenov,
Director of Research & Development
Enomaly Labs

US Phone: 212-461-4988 x5
Canada Phone: 416-848-6036 x5
E-mail: khaz@enomaly.net
Get Linked in> http://www.linkedin.com/in/sapenov
On Thu, Jun 5, 2008 at 8:41 PM, Alan Ho <karlunho@yahoo.ca> wrote:
I guessed that about Google App Engine too.

Things get really interesting when you need to make leader-election decisions across data centers. E.g. if you are running a big map-reduce task in one data center and it goes down, you want to finish the task in another data center.

How does one transfer the task? Is it even worth solving?

Alan Ho

From: Reuven Cohen <ruv@enomaly.com>
Sent: June 05, 2008 10:03 AM
To: cloud-computing@googlegroups.com
Subject: Re: The Business of Building Clouds

From what I've seen of Google App Engine, they distribute your Python code to dozens of servers and then use some kind of round robin to spread the load. Nothing groundbreaking.
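
For readers unfamiliar with the term, a generic round-robin dispatcher can be sketched as below. This is only an illustration of the technique; it says nothing about how App Engine actually routes requests, and the class name and server labels are made up.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order (illustrative only)."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one server")
        self._rotation = cycle(servers)

    def pick(self):
        """Return the next server in the rotation."""
        return next(self._rotation)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["s1", "s2", "s3"])
    for request_id in range(5):
        print(request_id, "->", balancer.pick())
```

Note that plain round robin ignores server health and load, so on its own it spreads traffic but does not provide failover.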

r/c

On Thu, Jun 5, 2008 at 12:59 PM, wyim wyim <wingmanyim@hotmail.com> wrote:

With regard to failover, does Google App Engine have some sort of LoadBalancer API?

thanks
Wayne Yim

From: stuartcharlton@gmail.com

To: cloud-computing@googlegroups.com
Subject: Re: The Business of Building Clouds
Date: Thu, 5 Jun 2008 09:07:12 -0700

On 5-Jun-08, at 8:35 AM, Alan Ho wrote:

Picking a provider that has data center failover is critical - but it does mean that you have to write your application in a way that can fail over gracefully. Cloud providers need to provide the base infrastructure to do so OR constrain the user to a particular programming paradigm (like the limitations of Google App Engine).

That's a very astute observation, Alan. Constraining an architecture to induce certain properties (guarantees?) is likely the right approach.

Though I wonder if AppEngine is a bit too "nanny-ish", in ways that limit its audience without really affecting the big-picture qualities.

For example, the choice of Python was easy because it was a standard Google language, but it doesn't seem to be inherently more suitable than, say, C#, Java or Ruby.

I expect in the future that cloud computing systems will provide the concept of "cloud events" in case of major datacenter failures. I just don't see any way round it.

I wonder if Google actually provides this sort of failover for AppEngine today. Certainly, they could, though they provide no such guarantees at the moment.

As for "cloud events" - yup. In the traditional data centre, it's likely SNMP or JMX traps. On the cloud, it's not entirely clear if/where SNMP would play. Or WS-Man. Or something newer (?).

Cheers
Stu

--

Reuven Cohen
Founder & Chief Technologist, Enomaly Inc.
www.enomaly.com :: 416 848 6036 x 1
skype: ruv.net // aol: ruv6

blog > www.elasticvapor.com
-
Get Linked in> http://linkedin.com/pub/0/b72/7b4


Sunday, June 8, 2008
