No subject


Thu Jan 10 11:35:52 IST 2013


I don't know enough NLTK but I work with django :)

From Asaf's description it looks like you have to change your architecture.
Apache - in its default configuration - is not efficient at handling heavy
processes because it creates a new process for each request. There are
better setups, such as Gunicorn or uWSGI, that load n workers and
distribute the work between them (usually n = number of cores x 2 + 1).
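
As a rough illustration of that rule of thumb, here is a minimal sketch of a
Gunicorn config file. The file name, bind address and timeout value are
assumptions for the example, not something from your setup; `workers`,
`bind`, `preload_app` and `timeout` are regular Gunicorn settings.

```python
# gunicorn.conf.py -- illustrative values only, adjust to your deployment
import multiprocessing

# Rule of thumb from above: n = number of cores x 2 + 1
workers = multiprocessing.cpu_count() * 2 + 1

# Assumed address; put whatever your front-end proxy expects here.
bind = "127.0.0.1:8000"

# Load the application in the master process before forking the workers,
# so a heavy import is done once instead of once per worker.
preload_app = True

# Seconds a worker may stay silent before it is killed and restarted
# (assumed value; slow NLTK requests may need more than the default).
timeout = 60
```

It would be started with something like `gunicorn -c gunicorn.conf.py myproject.wsgi`,
where `myproject` is a placeholder for the real django project name.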

A more robust and scalable setup would include separate workers that answer
the NLTK requests asynchronously, with django reaching these workers via
a message queue. This setup lets you put your NLTK workers even on a
separate machine, so that your web server does not compete with your NLTK
workers for limited resources (CPU and RAM).
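
To make the idea concrete, here is a small in-process sketch of the
queue-and-worker pattern using only the standard library. It is a stand-in,
not a Celery setup: in a real deployment the queue would be a message broker
and the worker a separate process or machine. `load_heavy_tokenizer`,
`worker` and `run_jobs` are hypothetical names, and `str.split` takes the
place of the real NLTK call.

```python
import queue
import threading


def load_heavy_tokenizer():
    """Hypothetical stand-in for the expensive one-time NLTK import."""
    # A real worker would do something like `import nltk` here, once.
    return str.split


def worker(requests, responses):
    """Pay the load cost once, then answer jobs taken from the queue."""
    tokenize = load_heavy_tokenizer()
    while True:
        job = requests.get()
        if job is None:  # sentinel value: stop the worker
            break
        job_id, text = job
        responses.put((job_id, tokenize(text)))


def run_jobs(texts):
    """Play the web side: enqueue texts, then collect results by job id."""
    requests = queue.Queue()
    responses = queue.Queue()
    thread = threading.Thread(target=worker, args=(requests, responses))
    thread.start()
    for job_id, text in enumerate(texts):
        requests.put((job_id, text))
    results = dict(responses.get() for _ in texts)
    requests.put(None)  # tell the worker to exit
    thread.join()
    return [results[job_id] for job_id in range(len(texts))]


if __name__ == "__main__":
    print(run_jobs(["NLTK is heavy", "load it once"]))
```

The point of the design is that the web side only puts messages on a queue
and never imports the heavy module itself, so the worker can be moved
elsewhere without touching the request-handling code.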

Even if you eventually find a way to configure Apache to load NLTK
without crashing, the URL that handles NLTK requests would be a perfect
point to attack your server and bring it into a DoS (denial of service)
situation using only a couple of strong machines hitting this URL.

I urge you to read a little bit about gevent and Celery to understand what
I'm talking about.

HTH

--
Emanuel




On Thu, Jan 31, 2013 at 7:30 PM, asaf greenberg <asafgreenberg at gmail.com> wrote:

>
> i don't know enough django, but i worked with nltk.
> NLTK is a very heavy module, lagging on import is expected, especially if
> you're using certain modules.
>
> AFAIK you should `import' it only once, on server (re)start, and it costs
> about 10-30 secs (did you optimize with *pyc or *pyo?). unless you're short
> on RAM... but i hope that's not the case.
>
> NLTK has also many sub-modules, which can and should be disabled, for
> performance.
>
> Does it hang elsewhere (apart from server startup)?
> does it have a longer delay than 20-30 secs.?
>
>
>
> On 1/31/2013 6:44 PM, Avishalom Shalit wrote:
>
>    As title.
>
>  It just silently hangs.
>
>  as far as i found on google, other people have ran into it,
>  but nobody posted a solution.
>
>  anybody overcame this before ?
>
>  thanks
>
>
>  -- vish
>
>
>
> _______________________________________________
> Python-il mailing list
> Python-il at hamakor.org.il
> http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il


