[Python-il] [pyweb-il:325] Re: Need help with django-website

Refael refack at gmail.com
Wed Jul 29 15:15:53 IDT 2009


I think I've hit the jackpot.

Running the Yahoo Term Extractor on 20 random articles, I get:
[(u'django', 14),
 (u'google', 4),
 (u'python', 3),
 (u'oxford', 2),
 (u'ruby on rails', 2),
 (u'migrations', 2),
 (u'apps', 2),
 (u'iteration', 2),
 (u'snippets', 2),
 (u'models', 2),
 (u'running', 2),
 (u'google maps', 2),
 (u'unit tests', 1),
 (u'geek night', 1),
 (u'ali', 1),
 (u'dba', 1),
 (u'celebrities', 1),
 (u'data models', 1),
 (u'vagas para', 1),
 (u'admin interface', 1),
 (u'nbsp', 1),
 (u'password xxx', 1),
 (u'internet explorer', 1),
 (u'volta', 1),
 (u's\xe3o paulo', 1),
 (u'long time', 1),
 (u'larson', 1),
 (u'staging', 1),
 (u'capabilities', 1),
 (u'blog', 1),
 (u'pra valer', 1),
 (u'dict', 1),
 (u'search software', 1),
 (u'advice', 1),
 (u'interactive map', 1),
 (u'crash', 1),
 (u'banco de dados', 1),
 (u'keyword arguments', 1),
 (u'export library', 1),
 (u'core management', 1),
 (u'fantasy sport', 1),
 (u'submission', 1),
 (u'foi', 1),
 (u'html javascript', 1),
 (u'last time', 1),
 (u'cms', 1),
 (u'database name', 1),
 (u'enthusiasts', 1),
 (u'map', 1),
 (u'cairo', 1),
 (u'creation', 1),
 (u'sync', 1),
 (u'meta', 1),
 (u'sem', 1),
 (u'inkscape', 1),
 (u'pylons', 1),
 (u'pdf export', 1),
 (u'abc', 1),
 (u'install software', 1),
 (u'exit 1', 1),
 (u'uma', 1),
 (u'irc', 1),
 (u'dias', 1),
 (u'exercise', 1),
 (u'best project', 1),
 (u'time one', 1),
 (u'reason', 1),
 (u'interface', 1),
 (u'webapp', 1),
 (u'bottom line', 1),
 (u'database engine', 1),
 (u'friends houses', 1),
 (u'looking at the environment', 1),
 (u'launch', 1),
 (u'content types', 1),
 (u'ajax', 1),
 (u'discussion groups', 1),
 (u'new game', 1),
 (u'new features', 1),
 (u'aptitude', 1),
 (u'para quem', 1),
 (u'fun parties', 1),
 (u'few days', 1),
 (u'jay graves', 1),
 (u'interact', 1),
 (u'private league', 1),
 (u'lot', 1),
 (u'hollywood', 1),
 (u'checkout', 1),
 (u'public presentation', 1),
 (u'game model', 1),
 (u'fun things', 1),
 (u'south project', 1),
 (u'slides', 1),
 (u'freelancer', 1),
 (u'object oriented', 1),
 (u'sphinx', 1),
 (u'insights', 1),
 (u'scratchpad', 1),
 (u'initial release', 1),
 (u'rio', 1),
 (u'super models', 1),
 (u'presentation program', 1),
 (u'browser', 1),
 (u'debugging', 1),
 (u'positive reaction', 1),
 (u'initial development', 1),
 (u'business logic', 1),
 (u'representative locator', 1),
 (u'traditional fantasy', 1),
 (u'implementations', 1),
 (u'raw', 1),
 (u'absolute url', 1),
 (u'o tempo', 1),
 (u'technology', 1),
 (u'greenpeace', 1),
 (u'html css', 1),
 (u'pdftk', 1),
 (u'line test', 1),
 (u'nas', 1),
 (u'functionality', 1),
 (u'import user', 1),
 (u'sqlite3', 1),
 (u'webdesigner', 1),
 (u'server os', 1),
 (u'record time', 1),
 (u'quote', 1),
 (u'first installment', 1),
 (u'test automation conference', 1),
 (u'sys', 1),
 (u'fantasy game', 1),
 (u'pool', 1),
 (u'first name last name', 1),
 (u'design patterns', 1),
 (u'modes', 1),
 (u'driven development', 1),
 (u'os system', 1),
 (u'databases', 1),
 (u'output variables', 1),
 (u'cookbook', 1)
]

On Jul 13, 7:49 pm, Imri Goldberg <lorgan... at gmail.com> wrote:
> My shneckel:
> 1. Have a simple cull list (take the 5 minutes to write it, and it will do
> 80% of the work)
> 2. Use TF/IDF
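
The two suggestions above could be combined roughly as follows. This is a
sketch under stated assumptions, not what anyone in this thread actually
ran: the cull list is just the obvious filler words from the top-10 counts
quoted below, the whitespace tokenizer is a placeholder, and the names
CULL, tokenize and tfidf_tags are made up for illustration. The score is
the plain textbook form, tf * log(N / df), with no smoothing.

```python
import math
from collections import Counter

# Hypothetical cull list: filler words taken from the quoted top-10 counts.
CULL = {"have", "your", "us", "some", "like"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop words on the cull list."""
    return [word for word in text.lower().split() if word not in CULL]


def tfidf_tags(docs, top_n=3):
    """Return the top_n highest-scoring words for each document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each word appears.
    df = Counter()
    for words in tokenized:
        df.update(set(words))

    tags = []
    for words in tokenized:
        tf = Counter(words)
        # A word found in every document gets log(1) = 0, so a term such
        # as "django" drops out without being on the cull list.
        scores = {
            word: (count / len(words)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Sort by descending score, then alphabetically to break ties.
        ranked = sorted(scores, key=lambda word: (-scores[word], word))
        tags.append(ranked[:top_n])
    return tags
```

The cull list handles generic filler, while the IDF factor handles words
that are common only in this particular corpus.
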
>
>
>
> On Mon, Jul 13, 2009 at 7:02 PM, Refael <ref... at gmail.com> wrote:
>
> > I've run the data through Whoosh, and now the hardest part is to cull
> > the words.
> > For example these are the top 10 word counts:
> > (u'django', 15051),
> > (u'have', 4066),
> > (u'your', 3770),
> > (u'us', 3311),
> > (u'python', 2738),
> > (u'some', 2713),
> > (u'site', 2501),
> > (u'code', 2359),
> > (u'like', 2335),
> > (u'project', 2327),
>
> > Any ideas how to sort out relevant tags?
>
> > On Jun 25, 4:36 pm, benny daon <bennyd... at gmail.com> wrote:
> > > Hi all,
> > > I've got a project going with the aim of improving djangoproject.com.
> > > So far I've forked the original code, cleaned it up, added buildout so
> > > installation will be a breeze, and added django-south so we can easily
> > > upgrade the database.
> > > Jacob KM sent me a link to a dump of the current database, which I
> > > included in the migration script so the code pulls the dump and uses it
> > > to create the database and add all the rows. There are almost 5000 rows
> > > in the model, pointing to Django-related posts. The next step is to
> > > extract common tags from the title and summary fields of the FeedItem.
> > > A friend recommended I use Solr or Lucene for this job, which makes
> > > sense. My issue is that I have never used them before. If you know what
> > > needs to be done and have some time, please assign this ticket -
> > > http://bitbucket.org/daonb/django-website/issue/3/ - to yourself, fork
> > > the code, do it, and send me a 'pull request'.
>
> > > Thanks,
>
> > > Benny.
>
> > > BTW - there's much more to do in this project. Please feel free to open
> > > tickets with suggestions/bugs or better yet - send code. Jacob said he
> > > will use it in the live site.
>
> --
> Imri Goldberg
> --------------------------------------
> www.algorithm.co.il/blogs/
> --------------------------------------
> -- insert signature here ----