[Python-il] [pyweb-il:326] Re: Need help with django-website

Udi h Bauman dibaunaumh at gmail.com
Wed Jul 29 15:34:56 IDT 2009


I was going to suggest YQL term extraction, which is quite good. But
be sure to catch up on today's news regarding Y! & MS, which makes any
use of Yahoo!'s APIs a very risky bet.


Udi
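
For reference, a tally like the one quoted below can be built from per-article
term lists with a few lines of Python. This is only a sketch: the function name
tally_terms and the sample data are made up for illustration, and it assumes
each term is counted once per article, which the thread does not actually state.

```python
from collections import Counter


def tally_terms(terms_per_article):
    """Count in how many articles each extracted term appears."""
    counts = Counter()
    for terms in terms_per_article:
        # Normalise and de-duplicate so a term counts once per article
        # (an assumption; the original tally may count every occurrence).
        counts.update({t.strip().lower() for t in terms if t.strip()})
    # most_common() returns (term, count) pairs, highest count first.
    return counts.most_common()


# Hypothetical extractor output for three articles (made-up sample data).
sample = [
    ["Django", "migrations", "unit tests"],
    ["django", "Google Maps"],
    ["Django", "google maps", "python"],
]
tally = tally_terms(sample)
print(tally[:2])  # [('django', 3), ('google maps', 2)]
```

The term extraction itself is left out on purpose, since it depends on whichever
service or library ends up being used.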


On 7/29/09, Refael <refack at gmail.com> wrote:
>
> I think I have a jackpot
>
> Using Yahoo Term extractor on a random 20 articles I get:
> [(u'django', 14),
>  (u'google', 4),
>  (u'python', 3),
>  (u'oxford', 2),
>  (u'ruby on rails', 2),
>  (u'migrations', 2),
>  (u'apps', 2),
>  (u'iteration', 2),
>  (u'snippets', 2),
>  (u'models', 2),
>  (u'running', 2),
>  (u'google maps', 2),
>  (u'unit tests', 1),
>  (u'geek night', 1),
>  (u'ali', 1),
>  (u'dba', 1),
>  (u'celebrities', 1),
>  (u'data models', 1),
>  (u'vagas para', 1),
>  (u'admin interface', 1),
>  (u'nbsp', 1),
>  (u'password xxx', 1),
>  (u'internet explorer', 1),
>  (u'volta', 1),
>  (u's\xe3o paulo', 1),
>  (u'long time', 1),
>  (u'larson', 1),
>  (u'staging', 1),
>  (u'capabilities', 1),
>  (u'blog', 1),
>  (u'pra valer', 1),
>  (u'dict', 1),
>  (u'search software', 1),
>  (u'advice', 1),
>  (u'interactive map', 1),
>  (u'crash', 1),
>  (u'banco de dados', 1),
>  (u'keyword arguments', 1),
>  (u'export library', 1),
>  (u'core management', 1),
>  (u'fantasy sport', 1),
>  (u'submission', 1),
>  (u'foi', 1),
>  (u'html javascript', 1),
>  (u'last time', 1),
>  (u'cms', 1),
>  (u'database name', 1),
>  (u'enthusiasts', 1),
>  (u'map', 1),
>  (u'cairo', 1),
>  (u'creation', 1),
>  (u'sync', 1),
>  (u'meta', 1),
>  (u'sem', 1),
>  (u'inkscape', 1),
>  (u'pylons', 1),
>  (u'pdf export', 1),
>  (u'abc', 1),
>  (u'install software', 1),
>  (u'exit 1', 1),
>  (u'uma', 1),
>  (u'irc', 1),
>  (u'dias', 1),
>  (u'exercise', 1),
>  (u'best project', 1),
>  (u'time one', 1),
>  (u'reason', 1),
>  (u'interface', 1),
>  (u'webapp', 1),
>  (u'bottom line', 1),
>  (u'database engine', 1),
>  (u'friends houses', 1),
>  (u'looking at the environment', 1),
>  (u'launch', 1),
>  (u'content types', 1),
>  (u'ajax', 1),
>  (u'discussion groups', 1),
>  (u'new game', 1),
>  (u'new features', 1),
>  (u'aptitude', 1),
>  (u'para quem', 1),
>  (u'fun parties', 1),
>  (u'few days', 1),
>  (u'jay graves', 1),
>  (u'interact', 1),
>  (u'private league', 1),
>  (u'lot', 1),
>  (u'hollywood', 1),
>  (u'checkout', 1),
>  (u'public presentation', 1),
>  (u'game model', 1),
>  (u'fun things', 1),
>  (u'south project', 1),
>  (u'slides', 1),
>  (u'freelancer', 1),
>  (u'object oriented', 1),
>  (u'sphinx', 1),
>  (u'insights', 1),
>  (u'scratchpad', 1),
>  (u'initial release', 1),
>  (u'rio', 1),
>  (u'super models', 1),
>  (u'presentation program', 1),
>  (u'browser', 1),
>  (u'debugging', 1),
>  (u'positive reaction', 1),
>  (u'initial development', 1),
>  (u'business logic', 1),
>  (u'representative locator', 1),
>  (u'traditional fantasy', 1),
>  (u'implementations', 1),
>  (u'raw', 1),
>  (u'absolute url', 1),
>  (u'o tempo', 1),
>  (u'technology', 1),
>  (u'greenpeace', 1),
>  (u'html css', 1),
>  (u'pdftk', 1),
>  (u'line test', 1),
>  (u'nas', 1),
>  (u'functionality', 1),
>  (u'import user', 1),
>  (u'sqlite3', 1),
>  (u'webdesigner', 1),
>  (u'server os', 1),
>  (u'record time', 1),
>  (u'quote', 1),
>  (u'first installment', 1),
>  (u'test automation conference', 1),
>  (u'sys', 1),
>  (u'fantasy game', 1),
>  (u'pool', 1),
>  (u'first name last name', 1),
>  (u'design patterns', 1),
>  (u'modes', 1),
>  (u'driven development', 1),
>  (u'os system', 1),
>  (u'databases', 1),
>  (u'output variables', 1),
>  (u'cookbook', 1)
> ]
>
> On Jul 13, 7:49 pm, Imri Goldberg <lorgan... at gmail.com> wrote:
>> My shneckel:
>> 1. Have a simple cull list (take the 5 minutes to write it, and it will do
>>    80% of the work)
>> 2. Use TF/IDF
>>
>>
>>
>> On Mon, Jul 13, 2009 at 7:02 PM, Refael <ref... at gmail.com> wrote:
>>
>> > I've run the data through Whoosh, and now the hardest part is to cull
>> > the words.
>> > For example these are the top 10 word counts:
>> > (u'django', 15051),
>> > (u'have', 4066),
>> > (u'your', 3770),
>> > (u'us', 3311),
>> > (u'python', 2738),
>> > (u'some', 2713),
>> > (u'site', 2501),
>> > (u'code', 2359),
>> > (u'like', 2335),
>> > (u'project', 2327),
>>
>> > Any ideas how to sort out relevant tags?
>>
>> > On Jun 25, 4:36 pm, benny daon <bennyd... at gmail.com> wrote:
>> > > Hi all, I've got a project going with the aim of improving
>> > > djangoproject.com.
>> > > So far I've forked the original code, cleaned it up, added buildout so
>> > > installation will be a breeze, and added django-south so we can easily
>> > > upgrade the database.
>> > > Jacob KM sent me a link to a dump of the current database, which I
>> > > included in the migration script so the code pulls the dump and uses
>> > > it to create the database and add all the rows. There are almost 5000
>> > > rows in the model, pointing to Django-related posts. The next step is
>> > > to extract common tags from the title and summary fields of the
>> > > FeedItem.
>> > > A friend recommended I use Solr or Lucene for this job, which makes
>> > > sense. My issue is that I have never used them before. If you know
>> > > what needs to be done and have some time, please assign this ticket -
>> > > http://bitbucket.org/daonb/django-website/issue/3/ - to yourself,
>> > > fork the code, do it, and send me a 'pull request'.
>>
>> > > Thanks,
>>
>> > > Benny.
>>
>> > > BTW - there's much more to do in this project. Please feel free to
>> > > open tickets with suggestions/bugs or better yet - send code. Jacob
>> > > said he will use it in the live site.
>>
>> --
>> Imri Goldberg
>> --------------------------------------
>> www.algorithm.co.il/blogs/
>> --------------------------------------
>> -- insert signature here ----
> >
>
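
The cull list plus TF/IDF suggestion quoted above could look roughly like the
sketch below. Everything named here (CULL, tokenize, tfidf_tags, the sample
titles) is hypothetical and uses only the standard library; it is not the code
anyone in the thread ran, and the cull list is deliberately tiny.

```python
import math
import re
from collections import Counter

# Hypothetical cull (stop-word) list; a real one would be longer.
CULL = {"have", "your", "us", "some", "like", "the", "a", "to",
        "and", "with", "for", "my", "in", "is"}


def tokenize(text):
    """Lowercase, split into words, and drop anything on the cull list."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in CULL]


def tfidf_tags(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    result = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times inverse document frequency; a term that
        # appears in every document gets log(1) == 0 and drops to the bottom.
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_n])
    return result


# Made-up titles standing in for the FeedItem title/summary fields.
docs = [
    "Django migrations with South: some tips for your Django project",
    "Unit tests for your Django models",
    "Deploying a Django site with Python and buildout",
]
tags = tfidf_tags(docs)
print(tags[0])
```

Because "django" shows up in every sample document, its inverse document
frequency is zero and it never makes the top tags, which is the effect the
TF/IDF suggestion is after for words like "have" and "your" that slip past
the cull list.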
