Cloud Computing, the Semantic Web and the Future of Database Technology – Specialised databases series – Part 5

In the final part of the specialised database series, we look at how cloud computing and the semantic web will affect the future of database technologies (Part 1, Part 2, Part 3, Part 4). There will be references to appendices, which can be found in the full text here.

“Internet based technologies are likely to have a profound impact on how we view database systems in the future; especially as communications media is deployed supporting ever-increasing bandwidths. Two example technologies are the semantic web and cloud computing, both of which are already showing commercial promise.”

With the above statement in mind, I conducted research in two parts and have summarised my argument below.

Cloud Computing

Cloud computing essentially provides the capability to use computing and storage as a remote service. Nicholas Carr (Carr, 2008) argues that computing will become much like electricity, purchased when needed, and refers to it as “utility computing”. In other words, you pay a company such as Amazon to provide the computing and storage you require, and you keep paying for as long as you need it; if you require more or fewer services, your payments adjust accordingly. Typically the provider will offer a Service Level Agreement (SLA), which it adheres to in providing that service. For example, you may want to back up all of your home videos, photos and MP3s in case of hardware failure, or simply because of the amount of space these files take up. Companies like humyo.com offer a cloud-based service that will store all this data for you at minimal cost.
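
To make the electricity analogy concrete, here is a minimal sketch of pay-as-you-go billing. The rates and the monthly_bill function are hypothetical and purely illustrative, not any provider's real pricing; the point is only that the bill tracks actual usage each month rather than an upfront hardware purchase.

```python
# Hypothetical rates for illustration only; real providers publish their own,
# usually tiered, price lists.
STORAGE_RATE_PER_GB_MONTH = 0.10  # assumed cost per GB stored for one month
COMPUTE_RATE_PER_HOUR = 0.05      # assumed cost per instance-hour of compute


def monthly_bill(storage_gb: float, compute_hours: float) -> float:
    """Charge only for what was used this month, like an electricity meter."""
    cost = storage_gb * STORAGE_RATE_PER_GB_MONTH + compute_hours * COMPUTE_RATE_PER_HOUR
    return round(cost, 2)


if __name__ == "__main__":
    # Usage rises and then falls; the bill follows it with no upfront hardware outlay.
    usage = [(50, 10), (200, 40), (80, 0)]  # (GB stored, instance-hours) per month
    for month, (gb, hours) in enumerate(usage, start=1):
        print(f"month {month}: {monthly_bill(gb, hours):.2f}")
```

Scaling down is as simple as scaling up: when usage drops in the third month, so does the bill.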

Having researched the architecture of cloud computing (Appendix 1 – Cloud Computing Architecture), we can turn to the question of how it might affect the future of database technology. The Claremont Report on Database Research (2008) (Appendix 3 – Claremont Report) stated that there is a “turning point in the history of the field [databases]” and attributed this to the explosion of data and usage scenarios as well as major shifts in computing hardware and platforms. Cloud computing is listed as part of this turning point. Companies are continually trying to reduce computing costs (Han, 2010), more so than ever in today’s economic climate, and cloud computing offers a solution that has already seen large companies such as Amazon and Yelp move to the cloud. Moving storage offsite and into the cloud could save large amounts of money on maintenance and support, hardware, licensing and storage space, which is attractive to any company. Aside from the cost savings, services are available on demand, so extra storage can be provisioned almost instantly. If the services run on a grid computing system, companies can also take advantage of the whole network’s processing power, which aligns with the Claremont report’s statement that “Cloud computing democratises access to parallel clusters of computers”.

An example of a company moving to the cloud is 37signals. According to an Amazon Web Services case study (Amazon, n.d.), 37signals originally used a Network File System (NFS) server for storage. When their data began to grow towards 1 terabyte, they needed an alternative solution. David Heinemeier Hansson, a partner at 37signals, is reported to have said:

“The cost and time associated with maintaining a 1 terabyte file server with full backups and zero downtime are significant when you’re living off managed hosting.”

37signals moved their products Basecamp, Highrise, Campfire, Backpack, Ta-da List, and Writeboard to the cloud, stating:

“I thought S3 was brilliant. I like low risk, outsourced services that give us room to grow with very little initial outlay”.

With all the positive attention around cloud computing, are there any drawbacks? The biggest concerns companies have with cloud computing are security and privacy, which for some organisations, such as government bodies, are of the utmost importance. Security concerns companies because they cannot directly see how secure their data is; however, it has been suggested that cloud providers must have reliable security measures in place, otherwise they would lose all of their clients. Privacy is a separate concern: because users can access data from anywhere, privacy can be compromised. One way to combat this has been to enforce authorisation and authorisation groups, so that users can only access certain resources. Industry professionals, law firms and universities have also been debating more philosophical questions, such as: who owns the data, the client or the storage provider? Can clients be prevented from accessing their own data?
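
Group-based authorisation can be sketched in a few lines. The AccessPolicy class, the group names and the resource names below are hypothetical; real cloud providers each have their own identity and access management services. The sketch only illustrates the principle: users are placed in groups, groups are granted resources, and anything not explicitly granted is denied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical policy: users belong to groups, and groups are granted resources."""
    user_groups: dict[str, set[str]] = field(default_factory=dict)
    group_grants: dict[str, set[str]] = field(default_factory=dict)

    def add_user(self, user: str, *groups: str) -> None:
        self.user_groups.setdefault(user, set()).update(groups)

    def grant(self, group: str, *resources: str) -> None:
        self.group_grants.setdefault(group, set()).update(resources)

    def can_access(self, user: str, resource: str) -> bool:
        # Deny by default: allow only if one of the user's groups holds a grant.
        return any(
            resource in self.group_grants.get(group, set())
            for group in self.user_groups.get(user, set())
        )


if __name__ == "__main__":
    policy = AccessPolicy()
    policy.grant("finance", "payroll_db")
    policy.grant("staff", "intranet")
    policy.add_user("alice", "finance", "staff")
    policy.add_user("bob", "staff")
    print(policy.can_access("alice", "payroll_db"))  # alice is in finance
    print(policy.can_access("bob", "payroll_db"))    # bob is only in staff
```

Granting access to groups rather than to individual users keeps the policy small and makes it easier to audit who can reach sensitive data.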

Although the Claremont report anticipates that future data-centric applications will leverage data services in the cloud, it highlights that more work is needed to understand the scaling realities of cloud computing. However, a more recent report by Salesforce (Appendix 4 – Salesforce Report) states that:

“They [cloud databases] transparently scale according to varying application workloads”

In summary, the cloud brings such large cost and time benefits that a number of companies will consider its advantages to far outweigh its disadvantages. However, in the database space, Database as a Service (DbaaS) is a relatively new concept, and issues with security, privacy and data ownership are likely to deter highly security-conscious companies. In the future we should expect to see evidence of improvements in security and privacy, as these are in the service providers’ best interests. As security and privacy improve, we are likely to see larger security-conscious companies move to using the cloud as a storage facility.

The Semantic Web

To understand the semantic web, we must first recognise that the web is a space of information currently used for human-to-human communication. Content on the web is currently unstructured, meaning that its structure is not evident to a robot browsing the web (Berners-Lee, 1998). The semantic web is about expressing content on the web in a language that machines can interpret, which means we can expose the information hidden in text or blobs of media (Wolf, 2010).

The semantic web has a layered architecture (Hall & O’Hara, 2009). To understand this architecture and how it is implemented, a number of terms must first be explained; these are covered in Appendix 2 – Semantic Web Architecture and Terminology.

In 2004, Gutierrez et al. stated that:

“Research on formal aspects of RDF data and query languages is made necessary by the new features that arise in querying RDF graphs as opposed to standard databases”

This highlights that the semantic web will not use database technology exactly as we know it today, since the traditional database does not translate directly to the RDF setting (Gutierrez et al., 2004). The same work does, however, refer to an RDF database that can be considered a standard relational database with a more suitable query system:

“An RDF graph can be considered a standard relational database: a relation of triples with the attributes Subject, Predicate, and Object. In what follows, an RDF database will be simply an RDF graph.”
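
To illustrate the quoted view, the sketch below stores an RDF graph as a single relational table with Subject, Predicate and Object columns, using Python’s built-in sqlite3 module. The ex: identifiers and the data are hypothetical, and the foaf: names are used only as labels. A graph pattern such as “whom does this person know, and what are their names?” becomes a self-join: each triple pattern is one copy of the table, and a shared variable becomes a join condition. This is a teaching sketch, not how any particular RDF store is implemented; production stores add indexes and other optimisations.

```python
from __future__ import annotations

import sqlite3

# An RDF graph as one relation of triples (Subject, Predicate, Object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:bob", "foaf:name", "Bob"),
        ("ex:bob", "foaf:knows", "ex:carol"),
        ("ex:carol", "foaf:name", "Carol"),
    ],
)


def names_known_by(person: str) -> list[str]:
    """Pattern {person foaf:knows ?x . ?x foaf:name ?name} as a self-join."""
    rows = conn.execute(
        """
        SELECT t2.object
        FROM triples AS t1
        JOIN triples AS t2 ON t2.subject = t1.object   -- shared variable ?x
        WHERE t1.subject = ? AND t1.predicate = 'foaf:knows'
          AND t2.predicate = 'foaf:name'
        ORDER BY t2.object
        """,
        (person,),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(names_known_by("ex:alice"))
```

The relational engine does all the work here, which is why the triple-table view is a natural bridge between RDF and existing database technology; the “more suitable querying system” is what translates graph patterns into joins like this one.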

Tim Berners-Lee appears to agree with this statement, saying:

“The semantic web data model is very directly connected with the model of relational databases” (Berners-Lee, 1998)
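
The connection can be shown with a simplified mapping: each row of a table becomes a node, each column becomes a property, and each non-NULL cell becomes a value. The IRI scheme used below (table/key=value and table#column) and the row_to_triples function are hypothetical simplifications; the W3C later standardised a more complete version of this idea as its Direct Mapping of relational data to RDF.

```python
from __future__ import annotations


def row_to_triples(table: str, key: str, row: dict[str, object]) -> list[tuple[str, str, object]]:
    """Map one relational row to subject-predicate-object triples.

    Simplified, hypothetical IRI scheme: the row becomes a node identified by
    its table and primary key, each column becomes a property, and each
    non-NULL cell becomes that property's value.
    """
    subject = f"{table}/{key}={row[key]}"
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if value is not None  # a NULL cell produces no triple
    ]


if __name__ == "__main__":
    employee = {"id": 7, "name": "Ada", "dept": "Cardiology", "phone": None}
    for triple in row_to_triples("employee", "id", employee):
        print(triple)
```

Going the other way, the resulting triples can be loaded into a single Subject, Predicate, Object table, so the relational and graph views of the same data remain closely connected.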

In 2007, Feigenbaum et al. wrote an article detailing applications of the semantic web in today’s world. One case study concerns drug discovery and how semantic capabilities can be leveraged to find the underlying genetic causes of cardiovascular disease. It describes how data was taken from databases in different departments of a hospital, translated into RDF format and stored in a “Semantic Web database”. Another example is a case study in health care: a system called SAPPHIRE integrates a wide range of data from local health care providers, hospitals, environmental protection agencies and scientific literature. SAPPHIRE was configured to help combat the spread of disease when public health officials became concerned after Hurricane Katrina.
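
One reason RDF suits this kind of integration is that graphs from separate sources can be combined without a schema migration. The sketch below is not the system Feigenbaum et al. describe; the departments, identifiers and variant names are all hypothetical. It assumes both departments already use the same identifiers for the same patients, and it uses plain set union, which is a correct RDF merge only when neither graph contains blank nodes (blank nodes from different graphs must be kept distinct).

```python
from __future__ import annotations

Triple = tuple[str, str, str]

# Two departments export their records as triples sharing the same patient IDs.
cardiology: set[Triple] = {
    ("ex:patient42", "ex:diagnosis", "ex:arrhythmia"),
    ("ex:patient17", "ex:diagnosis", "ex:hypertension"),
}
genetics: set[Triple] = {
    ("ex:patient42", "ex:carriesVariant", "ex:variantA"),
    ("ex:patient99", "ex:carriesVariant", "ex:variantA"),
}

# Merging is set union: every statement is already a self-describing triple.
merged = cardiology | genetics


def subjects_with(graph: set[Triple], predicate: str, obj: str) -> set[str]:
    """Return every subject that has the given predicate/object pair."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


# A cross-department question neither source could answer alone.
both = subjects_with(merged, "ex:diagnosis", "ex:arrhythmia") & subjects_with(
    merged, "ex:carriesVariant", "ex:variantA"
)

if __name__ == "__main__":
    print(both)
```

In practice the hard part is agreeing on shared identifiers and vocabularies across departments, which is exactly what the upper layers of the semantic web architecture aim to address.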

“SAPPHIRE succeeded in identifying gastrointestinal, respiratory and conjunctivitis outbreaks in survivors of the disaster much sooner than would have been possible before.” (Feigenbaum et al., 2007)

With any emerging technology there are always issues. In an article about the semantic web (Rapoza, 2007), Rapoza notes that it will be vulnerable to scammers, which could lead to security and therefore data protection problems. Rapoza gives the example of phishing sites: just as those imitate trusted sites, it could be possible to pass off an illegitimate source as legitimate. There is also a potential issue with access, especially when considering the commercial aspects, and Berners-Lee has said that this is an area of focus for the Semantic Web community. One of the key barriers to adoption named in Rapoza’s article is greed: businesses are reluctant to expose their data and actively develop custom formats to keep people using their products and sites.

In summary, the future of the semantic web is unclear; there are a number of hurdles to overcome before it goes mainstream. There is clearly a need for a more structured “Web of Data”, especially with information on the web growing at an alarming rate. Research suggests that we will see more not-for-profit organisations move towards using the semantic web before commercial businesses do, and if businesses do follow, the top tiers of the architecture will have to be clarified and standardised.

Summary

Research indicates that these technologies will have a large impact on the way we view database systems in the future. Database as a Service (DbaaS) is already a very real option for many companies; the cloud is already making its impact, and we are likely to see this impact grow rapidly. The semantic web has proved to be of great use, and although adoption is not yet widespread, there is a definite need for it. Research already describes the impact RDF has on databases, and as we move forward we may see a more specialised database or a highly functional hybrid.

At the time of this report, analysis of the research suggests that cloud technologies will have a bigger impact on database technologies than the semantic web over the next 5-10 years.

That’s the final part, I hope you’ve enjoyed the series!

Ping me with any questions.

Thanks,

Sara :)


References
