tag:blogger.com,1999:blog-8592569895405724581.post8788475394061075914..comments2011-11-18T23:58:22.854-08:00Comments on Algorithm Analysis: BellKor Algorithm: Global EffectsLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comBlogger31125tag:blogger.com,1999:blog-8592569895405724581.post-74452544273876322792011-11-18T23:58:22.854-08:002011-11-18T23:58:22.854-08:00Thank you very much for your post. It is really he...Thank you very much for your post. It is really helpful. I have been working on this kNN algo for some days. This post helps me verify my understanding on the normalization.Trudyhttp://www.blogger.com/profile/08080717417304184238noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-16519928447696546622010-07-29T19:54:11.988-07:002010-07-29T19:54:11.988-07:00Hi Thanh,
This blog was a documentation of my eff...Hi Thanh,<br /><br />This blog was a documentation of my effort to understand the paper by BellKor. <br /><br />The formula I describe is the formula that worked for me. It is most likely not the optimal way to isolate an effect.<br /><br />If your formula works, I would suggest testing to see which formula results in the better RMSE.<br /><br />It's been a while since I worked on this blog entry so I can't really justify the formula that I posted versus the alternatives.<br /><br />If you would like to see how I came up with these formulas, I suggest that you go to the <a href="http://www.netflixprize.com//community/" rel="nofollow">Netflix Prize Forums</a> which has much of the discussions archived.<br /><br />Do a search for "Global Effects" to find the discussions.Larry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-518910861135979302010-07-29T19:08:56.320-07:002010-07-29T19:08:56.320-07:00Hi,
I have another question in the Movie x UserAv...Hi, <br />I have another question in the Movie x UserAverage effect. In my understanding, the formula should be xui = avg(user) - 3.6033, but your formula is xui = avg(user) - avg(movie). Could you please explain it?<br /><br />Thank you so much!<br />ThanhNH.Thanhhttp://www.blogger.com/profile/08723425075114542093noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-42810409249002117832010-07-23T09:05:37.715-07:002010-07-23T09:05:37.715-07:00Hi Thanh,
That's my understanding. As I unde...Hi Thanh,<br /><br />That's my understanding. As I understand it, the residual represents the "error" from the average.<br /><br />If there was no "error", then there would be no effect that needed to be identified.<br /><br />-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-81650370761470912462010-07-22T21:12:17.472-07:002010-07-22T21:12:17.472-07:00Hi,
From your analysis, effect=theta(u) x x(ui)
Ho...Hi,<br />From your analysis, effect=theta(u) x x(ui)<br />However, Koren's paper says that the model is r(ui) = theta(u) x x(ui) + error.<br />So, in my understanding, the residual would be the "error"?<br />There must be something that I misunderstand. Please explain it.<br />Thank you so much!Thanhhttp://www.blogger.com/profile/08723425075114542093noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-2592429174968829622009-11-17T08:38:05.107-08:002009-11-17T08:38:05.107-08:00Hi ThinkerFeeler,
You are right that it doesn'...Hi ThinkerFeeler,<br /><br />You are right that it doesn't improve anything after you have picked your neighbors.<br /><br />But if you remove the global effects before you choose the neighbors, it makes a big difference.<br /><br />Interestingly, it has less of an effect if you use it before SVD. In my own use, I got an improved result if I removed global effects after applying SVD.<br /><br />I encourage you to read the BellKor paper, which is linked at the bottom of the blog entry [I updated the link that was previously broken].<br /><br />-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-66112356433278573192009-11-17T08:27:29.242-08:002009-11-17T08:27:29.242-08:00Thanks for the explanation.
It perplexes me that...Thanks for the explanation. <br /><br />It perplexes me that removing global effects helps. Here's why. Suppose neighboring users really like an item. Well, then doesn't that make it more likely that our user u will like the item too? Isn't it the entire point of KNN to make predictions based on neighbors, under the assumption that similar users will have similar tastes?<br /><br />Similarly, if similar items are popular, then it's likely that our item i will be popular too. Removing the global effect would seem to defeat the entire purpose of nearest neighbors search.ThinkerFeelerhttp://www.blogger.com/profile/09555846438599465354noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-19212683530943090062009-11-17T00:25:02.674-08:002009-11-17T00:25:02.674-08:00Hi ThinkerFeeler,
I wrote that equation as part o...Hi ThinkerFeeler,<br /><br />I wrote that equation as part of an explanation of how errors are removed from estimates.<br /><br />I perhaps show my background by using syntax in that way. :-)<br /><br />While it may not be standard mathematics, it is standard in software.<br /><br />In Java, for example, it is written as x = x - avg or even more commonly x -= avg.<br /><br />Cheers,<br /><br />-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-5200140950202492322009-11-17T00:07:44.795-08:002009-11-17T00:07:44.795-08:00You write,
xui = xui - avg(xui for a given u.
...You write, <br /><br /> xui = xui - avg(xui for a given u.<br /><br />Is that a typo? Is it an assignment statement? It's certainly not math.<br />Perhaps you mean <br /><br /> xui = rui - avg(rui for a given u).<br /><br />ThanksThinkerFeelerhttp://www.blogger.com/profile/09555846438599465354noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-82214943956648152612009-05-03T01:14:00.000-07:002009-05-03T01:14:00.000-07:00A remark: you said "In general, cross-validation i...A remark: you said "In general, cross-validation is done simply by changing values". What people might understand is that one simply tries different values for \alpha and checks the RMSE on the probe, until the best value is found. This is incorrect! It is very important not to tweak our learning algorithm on the data we use to test it -- otherwise we unintentionally "mix" our test data into the training data!<br />Determining parameters by cross-validation is done by separating the ratings (without the probe) into, say, 10 equal pieces, then using 9 pieces for training and 1 for testing, rotating the test piece, and averaging the RMSE over all 10 folds.<br /><br />Thank you very much for what you do, it's extremely helpful!Olahttp://www.blogger.com/profile/07934206558589896619noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-45083507364143011032009-04-18T00:22:00.000-07:002009-04-18T00:22:00.000-07:00How do people avoid division by zero in all those ...How do people avoid division by zero in all those effects that calculate datediff(date,first)? Almost 10% of users did all their ratings on the same day. E.g. 
user 273808 rated 161 movies all on 15-Aug-2005.MikeMhttp://www.blogger.com/profile/05411310108810172734noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-71563996977105217442009-03-19T00:27:00.000-07:002009-03-19T00:27:00.000-07:00I found one bug that may be the reason why the res...I found one bug that may be the reason why the results are slightly off in between. You're using 550 in the following line:<BR/><BR/>update user_movie_average_effect set theta_u = (n_u*theta_u_hat)/(n_u + 550); <BR/><BR/>whereas cross-validation determined a value of 90. Although this leads to only a 0.001 difference in that particular step, maybe the errors this generates have adverse effects on the following steps.Gerard Toonstrahttp://www.blogger.com/profile/17067969645449987498noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-60546611467917823202008-12-07T11:42:00.000-08:002008-12-07T11:42:00.000-08:00http://www.netflixprize.com/community/viewtopic.ph...http://www.netflixprize.com/community/viewtopic.php?pid=7726#p7726danielharanhttp://danielharan.com/noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-34205373296533741172008-07-25T12:57:00.000-07:002008-07-25T12:57:00.000-07:00Gustavo,Thanks very much for the feedback! :-)I a...Gustavo,<BR/><BR/>Thanks very much for the feedback! :-)<BR/><BR/>I am reviewing the SQL code now so hopefully all the typos will soon be fixed.<BR/><BR/>-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-8197971571054378102008-07-25T12:15:00.000-07:002008-07-25T12:15:00.000-07:00Hello Larry,I just finished reproducing all effect...Hello Larry,<BR/><BR/>I just finished reproducing all effects, up to effect 11. I got exactly the same results as you. I found only trivial typos in your code (such as missing parentheses or errors in table and field names, as Hassan mentioned) but no fundamental errors in the logic. 
Thanks again,<BR/><BR/>GustavoGustavo Faigenbaumhttp://www.blogger.com/profile/09284625297857053595noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-91846163052448478132008-07-25T08:25:00.000-07:002008-07-25T08:25:00.000-07:00Hi Gustavo,I am glad to hear that you are having m...Hi Gustavo,<BR/><BR/>I am glad to hear that you are having more success in reproducing the results. :-)<BR/><BR/>Please let me know if there are any points that I can better clarify.<BR/><BR/>Cheers,<BR/><BR/>-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-55721675583774925012008-07-25T06:01:00.000-07:002008-07-25T06:01:00.000-07:00Thanks Larry. The problem I had faced on Movie x T...Thanks Larry. The problem I had faced on Movie x Time(movie)^(1/2) was my mistake. There was an error in my translation from your MySQL to my MS SQL. I have now corrected the error and managed to reproduce up to Global Effect 9. I am getting exactly the same RMSEs as you (rather than BellKor's). I have two more effects to go and I'll be done. Thanks for sharing... Gustavo.Gustavo Faigenbaumhttp://www.blogger.com/profile/09284625297857053595noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-85480292055240831152008-07-23T02:20:00.000-07:002008-07-23T02:20:00.000-07:00Hi Hassan,Thanks very much for confirming the issu...Hi Hassan,<BR/><BR/>Thanks very much for confirming the issue. I really appreciate that you both have pointed this out! :-)<BR/><BR/>Today, I started going through the SQL code, statement by statement.<BR/><BR/>I've cleaned up a bunch of code already. 
Still, it will take 2-3 more days for me to go through it completely.<BR/><BR/>I will post here when I have verified all the SQL and confirmed the RMSE numbers.<BR/><BR/>-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-46528251346847417752008-07-22T20:14:00.000-07:002008-07-22T20:14:00.000-07:00He is right. I experienced the same with your code...He is right. I experienced the same with your code.Hassanhttp://www.blogger.com/profile/17055360390575133907noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-65361378069570169182008-07-21T21:53:00.000-07:002008-07-21T21:53:00.000-07:00Hi Gustavo,If you would like to post your SQL as a...Hi Gustavo,<BR/><BR/>If you would like to post your SQL as a comment, I will take a look.<BR/><BR/>No worries about any consulting fees. This is a free blog. :-) Perhaps others are having the same problems as you.<BR/><BR/>Cheers,<BR/><BR/>-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-26442077782110560252008-07-21T21:33:00.000-07:002008-07-21T21:33:00.000-07:00Dear Larry,Thanks so much for sharing this effort....Dear Larry,<BR/><BR/>Thanks so much for sharing this effort. I have been trying to reproduce your code in MS SQL and it goes fine until Movie x Time(movie)^(1/2). After that my RMSE goes down and I cannot reproduce the BellKor results. I would really like to understand what's wrong with my code; it looks like I'm doing exactly the same as you, yet I'm not getting the right results. I was wondering if you would be interested in providing some (paid) consulting services to help me review my code and find the bugs. I don't have much money but I hope we can strike a deal that helps me correct my code and learn something new, and earns you a few bucks. Please let me know if you would be interested. 
Thanks,<BR/>GFGustavo Faigenbaumhttp://www.blogger.com/profile/09284625297857053595noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-27598609899555835292008-07-11T05:17:00.000-07:002008-07-11T05:17:00.000-07:00Thanks a lot!Thanks a lot!Hassanhttp://www.blogger.com/profile/17055360390575133907noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-25477084252470155162008-07-11T00:16:00.000-07:002008-07-11T00:16:00.000-07:00Hi Hassan,Very sorry to hear about your experience...Hi Hassan,<BR/><BR/>Very sorry to hear about your experience with the code. <BR/><BR/>Thanks for letting me know.<BR/><BR/>When I have time, I'll release my tested code.<BR/><BR/>I will post a comment when the tested code is ready.<BR/><BR/>-LarryLarry Freemanhttp://www.blogger.com/profile/06906614246430481533noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-92081287333141232812008-07-10T17:54:00.000-07:002008-07-10T17:54:00.000-07:00Hi Larry,I tried the code on this page. There are ...Hi Larry,<BR/><BR/>I tried the code on this page. There are way too many typos in it for it to be useful. I fixed many of them but in the end, I managed to delete all probe predictions and almost gave up. It would be great if you could post a SQL file containing the actual statements executed. Appreciate your help!Hassanhttp://www.blogger.com/profile/17055360390575133907noreply@blogger.comtag:blogger.com,1999:blog-8592569895405724581.post-23836179421538152132008-07-09T15:50:00.000-07:002008-07-09T15:50:00.000-07:00I can think of one reason why your results are not...I can think of one reason why your results are not exactly the same as BellKor's.<BR/><BR/>You are adjusting the probe predictions to stay between 1 and 5 after applying every effect. You may want to do it only once instead (i.e., after applying all effects). It is certainly possible that one effect can take a prediction below one and the next one can adjust it back. 
However, if you have changed it in between to 1, the adjusted prediction will not be the same.Hassanhttp://www.blogger.com/profile/17055360390575133907noreply@blogger.com
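
Hassan's point about clipping can be illustrated with a small sketch. It is written in Python rather than the SQL used in the post, and the function names and the numbers are hypothetical, chosen only to show how clipping to the 1 to 5 range after every effect can give a different prediction than clipping once after all effects have been applied. It is not the code from the blog entry.

```python
def clip(value, low=1.0, high=5.0):
    """Clamp a prediction to the valid rating range."""
    return max(low, min(high, value))


def apply_clipping_each_step(baseline, adjustments):
    """Clip the running prediction after every effect is applied."""
    prediction = clip(baseline)
    for delta in adjustments:
        prediction = clip(prediction + delta)
    return prediction


def apply_clipping_once(baseline, adjustments):
    """Apply every effect first, then clip a single time at the end."""
    return clip(baseline + sum(adjustments))


# Hypothetical numbers: the first effect pushes the prediction below 1,
# the second effect pushes it back up.
baseline = 1.5
adjustments = [-1.0, 0.75]

per_step = apply_clipping_each_step(baseline, adjustments)  # 1.75
at_end = apply_clipping_once(baseline, adjustments)         # 1.25
print(per_step, at_end)
```

In this made-up case the intermediate value 0.5 is raised to 1 before the second effect is added, so the per-step result ends up 0.5 higher than the result clipped only at the end.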