Reports

A report is the ultimate thing we’re after: an abstraction over an entire file. For the library to work and be usable, the following assumptions must hold:

  • The file is fixed-width: each line has a fixed size
  • Each individual line can be interpreted as a record
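To make these two assumptions concrete, here is a small standalone sketch in plain Python. It is not pybankreader's API, and the column layout is hypothetical: hex color in columns 0-6 and color name from column 8 on. Each raw line becomes one record purely by slicing fixed positions.

```python
from collections import namedtuple

# Hypothetical fixed-position layout for a color line such as "#12aacc bluish".
# This only illustrates the idea; it is not how pybankreader defines records.
LAYOUT = (
    ("hex_color", slice(0, 7)),      # columns 0-6
    ("color_name", slice(8, None)),  # column 8 to the end of the line
)
ColorLine = namedtuple("ColorLine", [name for name, _ in LAYOUT])


def parse_line(line):
    """Turn one raw line into one record by slicing fixed positions."""
    line = line.rstrip("\r\n")
    return ColorLine(*(line[pos].strip() for _, pos in LAYOUT))


lines = ["#12aacc bluish\n", "#e50f2c redish\n"]
records = [parse_line(line) for line in lines]  # one record per line
print(records[0])  # ColorLine(hex_color='#12aacc', color_name='bluish')
```

If a line does not respect the layout, the slices silently pick up the wrong characters, which is why both assumptions matter.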

Also, take care of the encoding, since a wrong one may cause problems: if you misinterpret Unicode text as an ASCII string, for example, you may get ValidationError errors even though your positions look right.

Warning

Make sure that you load the file with the proper encoding!
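A quick way to see how a wrong encoding breaks fixed positions, using only the standard library. The line content and the name layout (columns 12-23) are hypothetical. The same UTF-8 bytes are decoded twice, once correctly and once as Latin-1, and in the second case every accented character turns into two characters, shifting all later columns.

```python
import io

# Bytes of a hypothetical export line, as stored on disk in UTF-8.
DATA = "140931123225tomáš.plesek\n".encode("utf-8")
NAME = slice(12, 24)  # hypothetical layout: name occupies columns 12-23


def read_first_line(encoding):
    # TextIOWrapper decodes bytes the same way open(path, encoding=...) does
    with io.TextIOWrapper(io.BytesIO(DATA), encoding=encoding) as fl:
        return fl.readline().rstrip("\n")


good = read_first_line("utf-8")    # the right encoding
bad = read_first_line("latin-1")   # the wrong one: each accented char becomes two

print(len(good), len(bad))  # 24 26
print(good[NAME] == "tomáš.plesek", bad[NAME] == "tomáš.plesek")  # True False
```

With a real file, the equivalent is passing the correct `encoding=` to `open()` (or `io.open()` on Python 2) before handing the file object to your report.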

A report is a custom class that uses Record classes for its assembly.

Basic Construction

Okay, so now you have an export file that you want to parse. Let’s use our example with the hex and header records from the previous chapter, and build a parser for this “file”:

140931123225tomas.plesek
#12aacc bluish

For this, we create a report:

from pybankreader import reports

class ColorReport(reports.Report):
    header = HexNameRecord()
    color = ColorRecord()

Make sure that you instantiate the records in the report definition (HexNameRecord(), not HexNameRecord)! Otherwise, all hell will break loose.

Now, when you’re ready, you can load your data. Let’s presume that our file is named color_file.txt:

with open('color_file.txt', 'r') as fl:
    report = ColorReport(fl)

    print(report.header.user_name)
    print(report.header.export_date)
    print(report.color.color_name)

... which gives you these values:

"tomas.plesek"
datetime.datetime(2014, 9, 31, 12, 32, 25)
"bluish"

Easy. Now, what if our original file looked like this?

140931123225tomas.plesek
#12aacc bluish
#e50f2c redish
#6ff660 greenish

Our report will still work, and you will get the same results as last time, except that report.color will hold the last color line (‘greenish’), since it’s the last one in the file. So how do we get around that? We use pybankreader.reports.CompoundRecord:

class ColorReport(reports.Report):
    header = HexNameRecord()
    data = reports.CompoundRecord(ColorRecord)

With this in place, you can access your data as follows:

with open('color_file.txt', 'r') as fl:
    report = ColorReport(fl)

    print(report.header.user_name)

    for color in report.data:
        print(color.color_name)

... which produces this output:

tomas.plesek
bluish
redish
greenish
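The difference between a plain record field and a CompoundRecord can be pictured with a toy model in plain Python (hypothetical, not the library's internals): a plain field acts like a single slot that every matching line overwrites, while a compound field acts like a list that every matching line is appended to. The column where the color name starts is also an assumption.

```python
lines = ["#12aacc bluish", "#e50f2c redish", "#6ff660 greenish"]

single_slot = None  # like `color = ColorRecord()`: holds one record
collected = []      # like `data = CompoundRecord(ColorRecord)`: holds all of them

for line in lines:
    name = line[8:]         # hypothetical: the color name starts at column 8
    single_slot = name      # every new line overwrites the previous value
    collected.append(name)  # every new line is kept

print(single_slot)  # greenish
print(collected)    # ['bluish', 'redish', 'greenish']
```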

In the same manner, you can handle situations where multiple types of records repeat:

140931123225tomas.plesek
#12aacc bluish
140931123225tomas.plesek
#e50f2c redish
#6ff660 greenish

In this case, the report would look like this:

class ColorReport(reports.Report):
    data = reports.CompoundRecord(ColorRecord, HexNameRecord)

Now the library is able to parse everything as a single sequence of records.
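Once header and color records come out as one mixed sequence, a common next step is to tell them apart while iterating. The sketch below uses hypothetical stand-in classes (`Header`, `Color`) instead of real HexNameRecord and ColorRecord instances, and a plain list instead of report.data; assuming report.data yields instances of your record classes, the same `isinstance` checks would apply.

```python
from collections import namedtuple

# Hypothetical stand-ins for HexNameRecord and ColorRecord instances.
Header = namedtuple("Header", "user_name")
Color = namedtuple("Color", "hex_color color_name")

# The mixed sequence in file order (a stand-in for iterating report.data).
data = [
    Header("tomas.plesek"),
    Color("#12aacc", "bluish"),
    Header("tomas.plesek"),
    Color("#e50f2c", "redish"),
    Color("#6ff660", "greenish"),
]

# Group each color under the header that precedes it.
groups = []
for record in data:
    if isinstance(record, Header):
        groups.append((record, []))
    else:
        groups[-1][1].append(record.color_name)

for header, colors in groups:
    print(header.user_name, colors)
# tomas.plesek ['bluish']
# tomas.plesek ['redish', 'greenish']
```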

Validation Errors

Sooner or later, you will hit a situation where either the data does not conform to your report definition, or you made a mistake when constructing the report or its individual records.

In such a case, an instance of pybankreader.exceptions.ValidationError is raised. To make debugging easier, the exception has a nice __str__ method that prints a wider context for you, like so:

ValidationError: header @ <0,3>: Value 'T26' does not match the regex pattern '079' for data: T263310  HEADER 0001.0000BBCSOB
[0] >>> AccountRecord/header @ <0,3>: Value 'T26' does not match the regex pattern '074'
[1] >>> ItemRecord/header @ <0,3>: Value 'T26' does not match the regex pattern '075'
[2] >>> ItemInfoRecord/header @ <0,3>: Value 'T26' does not match the regex pattern '076'
[3] >>> ItemRemittance1Record/header @ <0,3>: Value 'T26' does not match the regex pattern '078'
[4] >>> ItemRemittance2Record/header @ <0,3>: Value 'T26' does not match the regex pattern '079'

This should give you enough information to hunt down the problem. The first line is the last ValidationError that occurred. Read it as follows:

field_name @ <start_position,end_position>: 'Exception_message' for data: line_of_data_tried_to_be_loaded_into_a_record

As in our example, the first line may be followed by a numbered list of successive validation errors. This list helps you get to the underlying problem, because the system tries all record types in a report sequentially until it gives up. So, if the problem is in the first record, the system will still complain about the last one, since that’s where it finally decided it cannot parse the source.

This stack is reset after every successful parse of a record.

Note

The error in the first line is always the last entry in the numbered stack, so in the example case, it’s number 4.
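The "try every record type in turn, collect the failures, reset on success" behavior described above can be modeled in a few lines. This is a hypothetical simplification for intuition only: the record types are bare regex checks on the first three characters, and `ValidationError` here is a local class, not pybankreader.exceptions.ValidationError.

```python
import re


class ValidationError(Exception):
    """Local stand-in; not pybankreader.exceptions.ValidationError."""


# Hypothetical record types: (name, regex the first 3 characters must match)
RECORD_TYPES = [("AccountRecord", "074"), ("ItemRecord", "075")]


def parse(lines):
    parsed = []
    for line in lines:
        head = line[0:3]
        stack = []  # failures for this line only, so it starts empty every time
        for name, pattern in RECORD_TYPES:
            if re.fullmatch(pattern, head):
                parsed.append((name, line))
                break  # success: the collected failures are discarded
            stack.append("{}/header @ <0,3>: Value '{}' does not match the "
                         "regex pattern '{}'".format(name, head, pattern))
        else:
            # Every record type failed: last error first, then the whole stack
            details = "\n".join("[{}] >>> {}".format(i, msg)
                                for i, msg in enumerate(stack))
            raise ValidationError("{} for data: {}\n{}".format(
                stack[-1], line, details))
    return parsed


try:
    parse(["074 account", "075 item", "T26 header"])
except ValidationError as exc:
    print(exc)
```

Note that the failed AccountRecord attempt on the `075 item` line never shows up in the printout: the stack was reset as soon as ItemRecord parsed that line.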

Advancement Hinting

There are rather unfortunate situations where the library cannot tell that a line belongs to another record type. Imagine you have two records like this:

from pybankreader.records import Record
from pybankreader import fields

class CharRecord(Record):
    name = fields.CharField(length=10, required=True)


class FooterRecord(Record):
    footer = fields.RegexField(length=10, required=True, regex="AAAAZZAAAA")

Now you create a report out of these in the obvious way:

from pybankreader import reports

class MyReport(reports.Report):

    name = CharRecord()
    footer = FooterRecord()

And you try to read this file:

john
AAAAZZAAAA

What happens? You will have the string AAAAZZAAAA in the report.name.name field, and the footer will not have been loaded. Why? Because the footer line is parsed by the CharRecord, since it fits within its constraints. To work around this, you have two options: either make your records stricter, or use so-called “advancement hinting”.

Each report has a set of default methods named hint_<record> that always return True. So in our example, two methods are automatically defined for you:

  • hint_name(self, line)
  • hint_footer(self, line)

Whenever such a method returns False, it tells the library to stop processing the current line as the given record and to try the next one. Note that the method receives a single line parameter: the raw string read from the source file. In our example, we would solve the problem by overriding the hint_name method in MyReport, like this:

def hint_name(self, line):
    # Returning False makes the library skip CharRecord for the footer line
    return line != "AAAAZZAAAA"

And now the report will get parsed successfully.
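To see why the hint helps, here is a toy version of the problem in plain Python (hypothetical, not the library's implementation). A permissive "name" check accepts any non-empty line of up to 10 characters, so without a hint it swallows the footer. The hint functions mirror the hint_<record> convention: the default always allows the attempt, and returning False skips that record type for the given line.

```python
import re

# Hypothetical record checks: "name" accepts any non-empty line of up to
# 10 characters, "footer" accepts only the exact footer pattern.
RECORDS = [
    ("name", lambda line: 0 < len(line) <= 10),
    ("footer", lambda line: re.fullmatch("AAAAZZAAAA", line) is not None),
]


def always_true(line):
    # Mirrors the default hint_<record>: always allow the attempt
    return True


def parse(lines, hints):
    result = {}
    for line in lines:
        for name, fits in RECORDS:
            hint = hints.get(name, always_true)
            if not hint(line):
                continue  # hint said no: try the next record type
            if fits(line):
                result[name] = line
                break
    return result


lines = ["john", "AAAAZZAAAA"]
print(parse(lines, hints={}))
# {'name': 'AAAAZZAAAA'}
print(parse(lines, hints={"name": lambda line: line != "AAAAZZAAAA"}))
# {'name': 'john', 'footer': 'AAAAZZAAAA'}
```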

Custom Processing

The last nice feature of pybankreader is the ability to custom-process data as they’re being parsed. This way, you can build complex parsed structures in memory if you want to.

The best example would be a situation where your data is either hierarchical (yet presented in a linear fashion as multiple records) or multi-line. You will still represent each line “type” as an individual record, but you have the option to change how the data is saved.

First, in a similar vein as with advancement hinting, there is a set of default methods called process_<record>. By default, each of them takes a parsed record and returns it, nothing more. You are free to override those methods and change the behavior. You can do whatever you need with the processed record, and then either return an object (or the record itself) if you want it loaded into the report.<record> field, or return None, in which case nothing is saved in the report.
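The process_<record> contract described above (return a value to have it stored, return None to drop it) can be sketched with a hypothetical stand-in report class. The `getattr` lookup of `process_<name>` only imitates the naming convention, and storing results by appending to a list is an assumption made for the toy; this is not pybankreader's code.

```python
class ToyReport(object):
    """Hypothetical stand-in illustrating the process_<record> contract."""

    def __init__(self, parsed):
        self.data = []
        for name, record in parsed:
            # Use process_<name> if defined, else store the record unchanged
            process = getattr(self, "process_" + name, lambda rec: rec)
            result = process(record)
            if result is not None:  # None means: do not save anything
                getattr(self, name).append(result)

    def process_data(self, record):
        # Keep bluish colors (upper-cased) and drop all the others
        return record.upper() if "blu" in record else None


report = ToyReport([("data", "bluish"), ("data", "redish"), ("data", "light blue")])
print(report.data)  # ['BLUISH', 'LIGHT BLUE']
```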

So, to continue with our colors example, take a file like this:

140931123225tomas.plesek
#12aacc bluish
#e50f2c redish

Suppose now that those two colors are not single colors, but together they represent a gradient. How do we create a report for this?

from pybankreader import reports


class GradientReport(reports.Report):

    header = HexNameRecord()
    data = reports.CompoundRecord(ColorRecord)

    ticktock = True
    """
    This is a custom field. Since it's not a record, the library will
    leave it alone
    """

    def process_data(self, record):
        """
        Just a simple example of how to populate a custom class. Note
        that we're returning that custom class, not the ColorRecord!
        """
        if self.ticktock:
            # First color of a pair: start a new gradient and store it
            gradient = Gradient()
            gradient.start = record
            self.ticktock = False
            return gradient
        else:
            # Second color: complete the stored gradient, store nothing new
            self.data[-1].end = record
            self.ticktock = True
            return None


class Gradient(object):

    start = None
    end = None

    def __str__(self):
        # __str__ must return the string, not print it
        return "{} -> {}".format(self.start.hex_color, self.end.hex_color)

Okay, and now if you do this:

report = GradientReport(file_like)
for x in report.data:
    print(x)
    print(type(x))

You will get:

#12aacc -> #e50f2c
<class 'pybankreader.examples.Gradient'>

Neat, huh?