Since I moved to San Francisco, I have been writing reports for an internal Zemanta audience on what I learn at different events. But then I realized, why not just post them to our tech blog?
So here’s the report from today’s ACM Data Mining Camp Silicon Valley. This is not exactly live blogging, but it is not deep analysis either, so do your own further research and form your own opinions.
Dr. DJ Patil, chief scientist at LinkedIn:
- a lot of LinkedIn runs on data mining, from connection recommendations to the ranking of groups, etc. It’s really deep and strategic
- you don’t need a PhD to work on LinkedIn’s data mining team; they need many different types of talents and skills (Google’s Dr. Rajan Patel: “I agree absolutely”)
- LinkedIn is open sourcing a lot of its tech in Project Voldemort, and more is coming, including the reporting layer (if I understood correctly)
- “data mining is moving from backend technology to frontend, becoming the product by itself”
- LinkedIn is hiring

I learned about Cascading and Bixo. Cascading is a data processing workflow manager for Hadoop and the like, while Bixo is a data mining toolkit that works with Cascading and Hadoop. When I talked with Ken Krugler from Bixo Labs, he mentioned that we (at Zemanta) are not the only ones missing a good meta-management solution for dealing with data workflows, triggers, metadata about processing, etc. Everybody is missing it. He mentioned a project that might see the light of day in the future (Krugler is actually the guy who created Krugle, the search engine for code, for those who remember it).
Greg Makowski (also one of the #dmcamp organizers) gave a presentation on the Netflix challenge algorithms. The interesting tidbit is that Netflix is planning another competition, but this time it will be time limited (18 months) rather than performance based (the last one had a 10% improvement target). It’s going to be fun to watch.

Apache Hadoop seems to be the talk of the day here. However, some claim Apache Mahout is not mature enough. I like what Ted Dunning (the man behind Mahout) is advocating: just do it. He says, “tell us what people need done and we’ll help them” and “if you have data to share that’s the fastest way to get people excited about the problem you have”.
The semantic web session wasn’t very popular when proposed, but some people still showed up. Not surprising. More surprisingly, at the session about open-sourced datasets (that could be used for machine learning), Freebase was something new to people.
All in all, ACM Data Mining Camp Silicon Valley was a pretty good event, although some presentations could use some work and it was pretty noisy in some rooms.
Internal memo: While writing all this I have been fighting with Zemanta’s insertion of the Ken Krugler image, which destroyed the HTML layout. We have some fixing to do.
