Quality assurance
Filed under: technologies — andraz @ 17:08 24. Nov, 2007
How do you test a complex system that is trying to mimic being smart?
You want to automate testing so that a quality meter is available for every little change you make. Unit tests help catch classical programming regressions, but the major part of the challenge is keeping the ‘smart part’ under control. Unfortunately, the only way to tell whether the system is doing a good job is to have a human check the results. The catch is that if you could fully automate that judgment, you would already have solved the hardest problem.
So what you basically do is generate a set of evaluation data manually, and build a system that works something like unit tests but gives you statistics instead of pass/fail results. You would think that solves the problem, but that is far from the truth.
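A minimal sketch of such a statistics-producing test run, in Python. Everything here is an assumption made for illustration: the `EvalCase` structure, the `evaluate` function, the cut-off `k`, and the choice of precision and recall as the reported numbers are hypothetical, not a description of our actual evaluation system.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One manually judged example (hypothetical structure)."""
    text: str       # input given to the system
    relevant: set   # suggestions a human judged as good for this input


def evaluate(suggest, cases, k=5):
    """Run suggest() on every case and return averaged statistics.

    Unlike a unit test, nothing passes or fails here; the caller
    compares the numbers between two versions of the system.
    """
    precisions, recalls = [], []
    for case in cases:
        suggested = list(suggest(case.text))[:k]
        hits = sum(1 for item in suggested if item in case.relevant)
        precisions.append(hits / len(suggested) if suggested else 0.0)
        recalls.append(hits / len(case.relevant) if case.relevant else 0.0)
    n = len(cases)
    return {
        "cases": n,
        "precision": sum(precisions) / n if n else 0.0,
        "recall": sum(recalls) / n if n else 0.0,
    }


if __name__ == "__main__":
    # Toy data and a toy suggester, only to show the call shape.
    demo_cases = [
        EvalCase("post about python", {"python", "programming"}),
        EvalCase("post about london", {"london"}),
    ]
    print(evaluate(lambda text: text.split()[-1:], demo_cases))
```

Running the same judged cases against two versions of `suggest` gives two sets of numbers to compare, which is the ‘quality meter’ idea in its simplest form.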
First, the dataset changes: when new content enters the system, you get completely new related stories, and a human has to go back and judge them. Second, the evaluation data grows: as you add new tests, you generally can’t run them through previous versions of your algorithms, since that would be prohibitively expensive. Third, the statistics hardly give you an overview of what exactly your changes caused, just a few final numbers. And then there is the problem of pipelined processing: even if you improve the first stage, the end results might get worse, because the second stage has already been adapted to the previous first stage. So you actually need to evaluate each part of the system in isolation and then all of them together.
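The pipelining problem can be sketched with a toy two-stage example in Python. The names `accuracy` and `evaluate_pipeline`, and the idea of keeping human-labelled intermediate results (`gold_mid`) next to the final ones (`gold_final`), are assumptions for illustration only.

```python
def accuracy(predicted, expected):
    """Fraction of items where the prediction equals the human-labelled answer."""
    if not expected:
        return 0.0
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def evaluate_pipeline(stage1, stage2, inputs, gold_mid, gold_final):
    """Score each stage in isolation and the two stages chained together."""
    mid = [stage1(x) for x in inputs]
    return {
        # Stage 1 judged against labelled intermediate results.
        "stage1_alone": accuracy(mid, gold_mid),
        # Stage 2 fed with perfect intermediate input, so stage 1 errors don't leak in.
        "stage2_alone": accuracy([stage2(m) for m in gold_mid], gold_final),
        # What the user actually sees.
        "end_to_end": accuracy([stage2(m) for m in mid], gold_final),
    }


if __name__ == "__main__":
    inputs, gold_mid, gold_final = ["Ab", "Cd"], ["ab", "cd"], ["AB", "CD"]

    def old_stage1(s):
        return s.lower() + "!"      # has a quirk: appends a stray character

    def new_stage1(s):
        return s.lower()            # the "improved" first stage

    def stage2(m):
        return m[:-1].upper()       # silently tuned to strip the old quirk

    print(evaluate_pipeline(old_stage1, stage2, inputs, gold_mid, gold_final))
    print(evaluate_pipeline(new_stage1, stage2, inputs, gold_mid, gold_final))
```

In this toy setup the second stage has quietly adapted to a flaw in the old first stage, so fixing the first stage raises its isolated score while the end-to-end score drops. Only the per-stage numbers reveal where the real problem sits.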
In the end you find that you spend a disproportionate amount of time evaluating even the smallest changes. So you are in danger of simply skipping the evaluation, which naturally you shouldn’t.
OK, so much for today. I think the evaluation run has just ended and I should be checking the results, again.
Optimization
Filed under: technologies — andraz @ 16:22 17. Nov, 2007
It is interesting to see how, during development, we move back and forth between ‘doing a better job of suggesting’ and ‘suggesting faster’. A cycle seems to last around two weeks. Currently we are in the optimization part of it. Just yesterday Tomaz had a breakthrough, getting one component that took 10 seconds per request (the biggest time spender) down to just under one second.
The interesting thing is that we were on the verge of falling into the premature optimization trap. We had moved the most time-critical part of the code from Python to C, which at first didn’t seem to help much. We almost went on to move even more code to C, but in a moment of doubt I fired up the amazing oprofile profiler.
Profiling showed that we were spending most of the time in an inner loop, measuring the length of strings with wcslen(), which is called from the Boost library that we use. Now we are back on track with our performance.
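This post does not go into how the hotspot was removed, so the following Python sketch only illustrates why a length scan inside an inner loop is costly. The function `scan_length` is a hypothetical stand-in for a wcslen()-style scan that has to walk the whole string, and the `work` counter exists purely to make the cost visible.

```python
def scan_length(text, work):
    """Stand-in for a wcslen()-style scan: visits every character to find the length."""
    n = 0
    for _ in text:
        n += 1
        work[0] += 1    # one unit of work per character visited
    return n


def count_spaces_rescanning(text, work):
    """Re-measures the string in the loop condition on every iteration."""
    count, i = 0, 0
    while i < scan_length(text, work):
        if text[i] == " ":
            count += 1
        i += 1
    return count


def count_spaces_hoisted(text, work):
    """Measures the string once, before the loop starts."""
    count = 0
    length = scan_length(text, work)
    for i in range(length):
        if text[i] == " ":
            count += 1
    return count


if __name__ == "__main__":
    sample = "one two three four five"
    slow_work, fast_work = [0], [0]
    count_spaces_rescanning(sample, slow_work)
    count_spaces_hoisted(sample, fast_work)
    print("character visits:", slow_work[0], "vs", fast_work[0])
```

Both functions return the same answer, but the rescanning version does work that grows with the square of the string length, while the hoisted one grows linearly. That kind of cost is easy to miss when the scan is hidden inside a library call, which is why the profiler was needed to find it.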
Pick any two
Filed under: technologies — andraz @ 2:29 3. Nov, 2007
Building web services is fun. You can have them fast, secure, reliable, scalable, maintainable and sexy. Pick any two.
As our first goal we had a much less ambitious requirement: “working”. A few nights this week were pretty intense, but we have done it: we moved the technology to the English language. At this point I’d really like to say that I am extremely lucky to be able to work with a team like this one. Everybody gave their best and sacrificed a lot of sleep. Luckily we had Jaffa Cakes to keep us awake and running.
Before one of the presentations this week I slept just over an hour and a half. But the joy of being able to demonstrate the product instead of just describing it (or showing the Slovenian version that no one would understand) made it worth the trouble.
Zemanta tech demo at MiniBar, London
Filed under: london, seedcamp, technologies, zemanta — jure @ 3:08 20. Oct, 2007
Today we held our first public demo at the MiniBar meetup. The audience was presented with a sneak-peek preview of the English version of our contextual engine. We defeated all the last-minute bugs and it worked flawlessly. A day earlier, at the presentation in front of Tom Glocer, we learned that you should never put your demo on weird non-80 ports.
Now we are moving forward with contextual technologies (and removing buttons from the interface). If you want to be notified when we have more stuff to show to the public, sign up for our Newsletter.

Photo by Peter Čuhalev
Natural language processing fun
Filed under: technologies — andraz @ 22:51 23. Aug, 2007
Our new demo gives a glimpse of what we are currently working on. For those who are following us, it might not seem like a big advance in functionality since Odprti kop. However, we have rewritten the whole backend, making it scalable and extensible, which in turn makes us much more flexible.
Part of the backend consists of Natural Language Processing, NLP for short. The fun thing is that no matter how many things you improve, there is always a vast amount of input data that causes wrong or at least not-exactly-right contextualization. However, I think we have made a lot of important advances.
By the way: NLP in Slovenian stands for Neznani Leteči Predmeti, literally Unknown Flying Objects, that is, UFOs. Which sometimes seems like quite an accurate description.
