Z-Blog

New web Cambrian explosion

Posted by andraz, under zemanta on August 16th, 2008


Nodalities is one of the bright stars of semantic web reporting. It’s a magazine more concerned with the business than the academic aspects of the semantic web. In the last issue I had the pleasure of writing an article about my view of how semantic data is changing the web game. I am republishing it here, but don’t forget to read the rest of Nodalities!

———————————————————————————————


New web Cambrian explosion

Everyone is getting on the semantic web bandwagon, creating, curating and publishing both linked web data and annotated web data. Nearly every day, mash-ups using the semantic web and web services from big platform providers are born. By opening up the data, a new platform has been quietly established.

Definitions of the semantic web are as elusive as definitions of Web 2.0; it means different things to different people. However, to me it seems that the most obvious and direct consequence has been mostly neglected in the shadow of bigger visions. The semantic web turns services that were previously prohibitively expensive to create into cheap and feasible ones, be they big or small.

The data acquisition that services build upon is getting cheaper and cheaper because data is now (becoming) well organized, exchangeable, accessible and in some cases even free. Looking at things on the web scale, there are two big data sources that are now at the disposal of every developer, entrepreneur, artist and investor.

The first one is Wikipedia and its semantic derivatives such as DBpedia and Freebase. The second one is social network data, which is opening up gradually but surely. The first describes the topics of the world and our lives at large; the second describes our lives at the micro level. A third benefit is that the rest of the data sources will map their information to one of those two. Zigtag and Faviki are the first examples of services trying to map the Wikipedia view of the world to web content at large (and I strongly suggest trying them out).

Is this True Semantic Web?

For the true crusaders of the Semantic Web this is the “little semantics gets you a long way” approach. But it can also be seen as a bottom-up approach, which will end up using more detailed ontologies for specific purposes while still being connected to the web at large and leveraging its potential. Lately even Cyc maps some of its knowledge to Wikipedia.

Not all semantic web data will be free, but it seems that easy exchange, augmentation and repurposing of data will lower the barrier to competition and thus drive prices into the ground. As creating and maintaining complex datasets becomes easier and almost implicit, more people and companies will be doing it. Besides the encyclopedia market, there is another interesting case: microstock photo providers managed to undercut the prices of large photo providers with the help of both the web’s architecture and its social fabric, which made it cheap to aggregate supply from cheap sources.

One thing that I hope is yet to happen is social networks competing their way into opening up their data for repurposing by third-party applications (with appropriate mechanisms for privacy protection in place).

While still a walled garden, Facebook is a very interesting case. You might not have noticed it, but Facebook’s platform is just one step away from being the vision of automated agents come true. The data is there, the development platform is standardized, and developers with millions of ideas are there. The question is just how far Facebook will let applications act on their own.

Beyond Web 2.0

The second web Cambrian explosion (Web 2.0) came into existence largely because of the abundance of cheap infrastructure, both hardware and software. Suddenly startups didn’t need to buy large servers up front; they could rent them and scale when needed. Just as importantly, software infrastructure became cheap to get and build upon. Open source has created a huge, diverse and extremely cheap software stack that many startups are leveraging. Not only that, it is standardized enough that developers don’t need long lead-in times to start cranking out useful services. The trend is moving even further in the direction of “infrastructure as a service”: we are seeing companies like Amazon and Google offering not only their CPU cycles and storage, but whole database engines, queuing services and the like.

If Web 2.0 was enabled by cheap hardware and software infrastructure, this time the web will be enabled by cheap data. Cheap not only in the sense of not having to pay for it, but cheap to acquire, use, store and reason about.

Killer app

While it seems that universal access to data makes many things possible and cheap, it also makes things harder as far as business models are concerned, especially with the “free” mentality of the web lately. But I will not go into a discussion of the business models of web startups, since so much is being written about that already.

Except for artists, everyone is asking the tough question: what is going to be the “killer app” for the semantic web?

Unsurprisingly, answering “The semantic web is the killer app” does not satisfy developers and entrepreneurs, let alone investors. It might satisfy visionaries, but visions rarely put bread on the table.

For investors it is sometimes hard to accept the fact that the killer application is the distributed platform itself and that there seems to be no gatekeeper emerging that would cash in on the “semantic web” as a whole. Bets are placed all over the space, from search engines that ‘understand’ your questions to social networks using semantics, but most of the success currently seems to be in vertical search engines.

However, there is a problem that a lot of these companies have stumbled into. The semantic web is sold on the promise of computers understanding the human world and its affairs on one level or another. And when you tell that to ordinary users, it is unbelievably easy to completely disappoint them. Just look at the media coverage of some recently launched startups in that field.

Dreaming of apps

So are there any applications that might come into existence anyway and not disappoint? What if we manage to establish semantic web technologies as an ambient web fabric, which works best when you don’t have to pay much attention to it, and which really helps in many, though not necessarily all, cases?

Well, I can think of some, and I am sure you can too. My question is: why are we not seeing more work in that area?

Let’s start with social networking. A social network such as LinkedIn knows where I am, who I work for and whom I worked for. It knows my whole schooling and my career up to the present. With a bit more information it could easily know my hobbies, which conferences I go to, and my connections by type. This could be pulled from other networks such as Facebook. Why doesn’t LinkedIn let me state the direction I want for my career?

And then use that information to automatically discover who I should get to know to achieve my short- and long-term career goals? And while we are at it, why wouldn’t it look at my calendar and his or hers and schedule the time and place of a meeting? Oh, and since I am meeting someone I don’t know, it can offer a list of topics to break the ice with: matching hobbies, friends and maybe interesting facts about places where we both worked but did not know about each other. I’d just love to answer a simple yes/no questionnaire every week, choose between a few people and have a meeting set up automagically. The long-term benefits of this far outweigh the occasional mismatches that would happen. And not just meeting people already in a position to help, but also people who are expected to move into those positions in the future. Is anyone doing research on predicting career moves based on social network dynamics?
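To make the ice-breaker idea concrete, here is a minimal sketch in Python. It is only an illustration, not anything a real social network exposes: the `Profile` class, its fields and the `icebreakers` function are all hypothetical names, and the data is assumed to be already available as plain sets.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical, simplified view of one person's social network data."""
    name: str
    hobbies: set[str] = field(default_factory=set)
    friends: set[str] = field(default_factory=set)
    employers: set[str] = field(default_factory=set)


def icebreakers(me: Profile, other: Profile) -> dict[str, list[str]]:
    """Return what two people have in common, grouped by kind of topic."""
    return {
        "hobbies": sorted(me.hobbies & other.hobbies),
        "friends": sorted(me.friends & other.friends),
        "employers": sorted(me.employers & other.employers),
    }


if __name__ == "__main__":
    me = Profile("Ana", {"chess", "sailing"}, {"Bob"}, {"Acme"})
    other = Profile("Ben", {"sailing", "golf"}, {"Bob", "Carol"}, {"Acme", "Initech"})
    for kind, topics in icebreakers(me, other).items():
        print(kind, topics)
```

The hard parts (getting the data out of the networks, and deciding whom to meet in the first place) are exactly what this sketch leaves out; the set intersections only show how cheap the last step becomes once the data is structured.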

The general idea here is that social networks have semantic data that can be leveraged by machine learning to make great recommendations on how to develop one’s career and life. You can call it “automatic career steering” or “managed real-life social networking”.

If you want to get a bit wilder, think about your mobile phone. It follows you around and knows where you are and who you communicate with (discovering proximity via Bluetooth); it could even record all the audio on the phone and off the phone. Indexing audio is hard, but if you have good context (geographical, social network and cultural knowledge), the task becomes a tiny bit more tractable. Maybe it could work in some specific situations: maybe it could listen to your phone conversation and put the meeting you just verbally scheduled into your calendar, maybe it could provide instant recall of the book you know somebody mentioned a week ago, or maybe it could even create automatic meeting minutes for you. But all these are hard things for a computer to get right every time. That’s why every morning the computer would ask you a few questions about the conclusions it drew from processing all of yesterday’s data while you slept.

Getting real (with Zemanta)

The previous paragraph is quite a stretch of the imagination, but some things are already doable today.

Imagine you are an author writing an article, blog post or report. Right now your computer is a tool that lets you input text and add a bit of design to it. But why couldn’t the computer at least try to figure out what you are talking about? And just give you some additional material on the side, unobtrusive but possibly useful. It can make mistakes, but it has to bring real benefit often enough to be seen as useful. What could a computer suggest?

Well, it could establish relations between your writing and other semantic sources. We already mentioned two: Wikipedia as the world at large and your social network as your microworld. There is a high probability that parts of your writing will map to those two sources. Maybe you want to know something more about the term you just mentioned, or maybe you want to offer your readers a chance to read about it themselves and place a link there (if you are authoring web content). Natural language processing is capable of doing those things today.
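As a rough illustration of mapping text to a semantic source, here is a toy sketch in Python. It is not how Zemanta or any real natural language processing system works: the `TOPICS` table is a tiny hand-made stand-in for a real knowledge base, `suggest_links` is a hypothetical name, and plain phrase matching does none of the disambiguation a real system needs.

```python
import re

# Hypothetical, hand-made lookup table: lowercase phrase -> Wikipedia page.
TOPICS = {
    "semantic web": "https://en.wikipedia.org/wiki/Semantic_Web",
    "tenerife": "https://en.wikipedia.org/wiki/Tenerife",
    "wikipedia": "https://en.wikipedia.org/wiki/Wikipedia",
}


def suggest_links(text, topics=TOPICS):
    """Return (position, matched text, URL) for each known phrase in the text."""
    suggestions = []
    for phrase, url in topics.items():
        # Whole-word, case-insensitive match of the phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            suggestions.append((match.start(), match.group(0), url))
    # Order suggestions by where they appear in the text.
    return sorted(suggestions)


if __name__ == "__main__":
    draft = "I am on holiday on Tenerife and reading about it on Wikipedia."
    for position, phrase, url in suggest_links(draft):
        print(position, phrase, url)
```

The author stays in control: the function only suggests, and it is up to the writer to accept or ignore each link.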

Maybe you would like to be shown who else wrote what on that topic? Someday I’d like a computer to warn me that what I have just written actually agrees with John C. Dvorak for once.

The computer can automatically discover (free or commercial) images that would illustrate what you are saying in your text. And you can pick and include them with one click.

And in the case of web publishing, tags can be automatically suggested to make discovery of the content easier. Gradually even more types of suggestions can be implemented as the “understanding” of text gets better.

These are all things that we do at Zemanta. We are trying to hide all the complexity of the process of “understanding” the text and matching it with semantic sources. The user never has to know; (s)he just wants suggestions that make the writing process more efficient and the end product better. And when the user selects specific suggestions, maybe we can even sneak in (with his or her permission) semantic annotation that comes in handy later on when a semantically capable search crawler such as SearchMonkey comes along.

Currently this technology makes the most sense for bloggers, since they can use all those types of suggestions to make their blogging easier and in some cases more profitable. However, maybe other people want to do different things with it; that’s why the API was born. It is currently in testing and open to anyone who sends us an email. Maybe others have better ideas than we do. Why wouldn’t this technology come in handy in any CMS and in any application where you have to author text? Or in your online word processor, or maybe in an email program? When I type “Hi mum, I am on holiday on Tenerife” in an email, I want the computer to suggest a photo of Tenerife, where I am. Mum would love it!


———————————–END OF ARTICLE———————————————

Now the funny part: I was credited as “Andraz Tori is the CTO and Co-Founder of Zemanta, an applied-semantics startup aimed at blogging”. Does anyone know of any non-applied semantics startups? Maybe Nodalities still needs to be careful about the academic roots of the semantic web.
