[Python-il] [python-il]location in file

Tal Einat taleinat at gmail.com
Mon May 24 19:57:38 IDT 2010


Perhaps regular expressions are overkill in this case? Regexps are versatile
but can be confusing and hard to read. I prefer to focus on the logic and
try to write it out in readable code instead of complex regexps.

with open("...") as f:  # the 'rU' mode is gone in Python 3.11+; text mode handles all newline styles
    lines = f.read().splitlines()

# get the data in a specific section
section_header_index = lines.index("MultiProgPage_Data at c2 - SECTION HEADER")
section_raw_data_index = (section_header_index +
    lines[section_header_index:].index("RAW DATA:"))
words = []
for line in lines[section_raw_data_index+1:]:
    line = line.strip() # ignore whitespace at beginning and end of lines
    if not line: # assume an empty line means the end of the raw data
        break
    words.extend(line.split())
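
One caveat with the snippet above: lines.index() only finds a line that is exactly equal to the header. If the header lines in the dump carry anything else (Shai's remark about the '49.' suggests they may start with a number), a substring test is more forgiving. Here is a minimal sketch under that assumption; the function name extract_raw_words and the sample text are made up for illustration and are not taken from the real dump:

```python
def extract_raw_words(lines, section_header, part_header="RAW DATA:"):
    """Return the whitespace-separated words that follow part_header
    inside the section whose header line contains section_header."""
    in_section = False   # True once the section header has been seen
    in_raw_data = False  # True once the part header has been seen
    words = []
    for line in lines:
        line = line.strip()
        if not in_section:
            in_section = section_header in line
        elif not in_raw_data:
            in_raw_data = part_header in line
        elif not line:
            break  # assume an empty line ends the raw data
        else:
            words.extend(line.split())
    return words


# Hypothetical dump contents, only to exercise the function.
sample = """\
48. other_Task at c2 - SECTION HEADER
RAW DATA:
dead beef

49. MultiProgPage_Data at c2 - SECTION HEADER
SIZE: 4
RAW DATA:
0001 00a2
ffff 1234

50. beeper_FW_bg_Task at c2 - SECTION HEADER
"""

words = extract_raw_words(sample.splitlines(),
                          "MultiProgPage_Data at c2 - SECTION HEADER")
print(words)
```

If the header is never found, the function simply returns an empty list instead of raising ValueError, which may or may not be what you want.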

Good Luck,
- Tal


On Mon, May 24, 2010 at 7:13 PM, Shai Berger <shai at platonix.com> wrote:

> I would ignore the number of words and focus on the headers. With the
> headers, we specify the part of the text we want; we use a capturing group
> to pick out only the interesting part.
>
> section_header = "MultiProgPage_Data at c2 - SECTION HEADER"
> next_section_header = "beeper_FW_bg_Task at c2 - SECTION HEADER"
> part_header = "RAW DATA:"
>
> pattern = "%s.*%s(.*)%s" % (section_header, part_header,
> next_section_header)
>
> Then, just extract your section
>
> import re
>
> match = re.search(pattern, txt, re.S)  # re.S (DOTALL) lets "." match newlines
> if match:
>     section = match.group(1)
>
>     words = section.split()
>     del words[-1]  # This is the '49.' of the next header
>
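
For anyone who wants to try the regex route end to end, here is a self-contained sketch. Two details matter: "." only crosses line boundaries under re.DOTALL (re.S), and re.escape() keeps any special characters in the headers from being read as regex syntax. It assumes, as Shai does, that the next section's header is known and that headers start with a number; the sample text is invented for illustration.

```python
import re

section_header = "MultiProgPage_Data at c2 - SECTION HEADER"
next_section_header = "beeper_FW_bg_Task at c2 - SECTION HEADER"
part_header = "RAW DATA:"

# Non-greedy ".*?" stops at the first possible match instead of the last.
pattern = re.compile(
    "%s.*?%s(.*?)%s" % (re.escape(section_header),
                        re.escape(part_header),
                        re.escape(next_section_header)),
    re.DOTALL,  # let "." match newlines too
)

# Hypothetical dump contents, only to exercise the pattern.
txt = """\
49. MultiProgPage_Data at c2 - SECTION HEADER
RAW DATA:
0001 00a2
ffff 1234

50. beeper_FW_bg_Task at c2 - SECTION HEADER
RAW DATA:
aaaa
"""

match = pattern.search(txt)
words = match.group(1).split() if match else []
if words:
    # Assumes the next header is preceded by a number such as "50.";
    # drop this line if the headers in the real file are not numbered.
    del words[-1]
print(words)
```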
>
> You might find a reading of http://docs.python.org/library/re.html, top to
> bottom, worthwhile.
>
> Have fun,
>         Shai.
>
>
> On Monday 24 May 2010 18:58:54 Yitzhak Wiener wrote:
> > Hi Shai,
> >
> > Thanks for the prompt reply.
> > As a real beginner, I think I partly understand your idea, but I don't
> > know how to do it. Can you help with this? Assuming I prefer the first
> > option, searching the entire file, I would start as follows:
> > txt = open(r"project_release.dump").read()
> > # Now I should find the next x hexadecimal words (x is known) that start
> > # after the string "RAW DATA:" in the section that starts with
> > # "MultiProgPage_Data at c2 - SECTION HEADER".
> > How do I do that?
> >
> >
> >
> > Thanks,
> > Yitzhak
> >
> >
> > -----Original Message-----
> > From: python-il-bounces at hamakor.org.il
> > [mailto:python-il-bounces at hamakor.org.il] On Behalf Of Shai Berger
> > Sent: Monday, May 24, 2010 6:34 PM
> > To: python-il at hamakor.org.il
> > Subject: Re: [Python-il] [python-il]location in file
> >
> > Hi Yitzhak,
> >
> > You said,
> >
> > > I am searching for data in a file. The file is a text file. I was
> > > using RE to find the location in the file that I was interested in.
> >
> > but in the code, you wrote,
> >
> > > for line in s:
> >
> > [...]
> >
> > >    if re.match(r".*RAW DATA.*", line):
> >
> > That is, instead of finding the location in the FILE, you found the
> > location in the LINE.
> >
> > What you should do instead is get a string that contains your whole
> > section; you can do this with regular expressions (applied to the whole
> > file, s, with re.S so that "." also matches newlines), or you can do this
> > by collecting the relevant lines after having split the file into lines.
> > Then, just use section.split() to get a list of the "words" (as separated
> > by whitespace) in the section.
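
Since the original question asks for a known number of hexadecimal words, the split() step can be followed by a slice and a conversion. A minimal sketch, assuming the words are plain hex digits (with or without a 0x prefix); the names section and x and all the values here are placeholders, not data from the real file:

```python
# Placeholder for the text of the section, however it was obtained.
section = """
0001 00a2 ffff
1234 0x10
"""
x = 4  # number of words wanted (assumed known, as in the question)

words = section.split()   # split on any whitespace, including newlines
wanted = words[:x]        # keep only the first x words
values = [int(w, 16) for w in wanted]  # base 16 also accepts a "0x" prefix
print(wanted, values)
```

Whether you need the integers or just the strings depends on what you do with the data next; words[:x] alone may be enough.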
> >
> > Have fun,
> >       Shai.
> > _______________________________________________
> > Python-il mailing list
> > Python-il at hamakor.org.il
> > http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il
> >
> > ______________________________________________________________________
> > DSP Group, Inc. automatically scans all emails and attachments using
> > MessageLabs Email Security System.
> > _____________________________________________________________________