Repeatable Sections
Added ability to handle repeating sections.
Hi russ1985, I am going to be taking charge of approving pull requests. I'll be merging in my changes in the next couple of days, which might generate a couple of minor conflicts for you. Sorry, hope it's nothing too bad. Also, could you add a couple tests? We are using Rspec. This ensures that everything works, as well as provides everyone with examples of how to use your features.
Yea, after I did the pull request I just thought "shouldn't I have added some specs". Sorry about that, will add some. I added a section in the Readme on how to use the feature.
On 8/6/12 4:36 PM, Scott Berkley wrote:
Hi russ1985, I am going to be taking charge of approving pull requests. I'll be merging in my changes in the next couple of days, which might generate a couple of minor conflicts for you. Sorry, hope it's nothing too bad. Also, could you add a couple tests? We are using Rspec. This ensures that everything works, as well as provides everyone with examples of how to use your features.
— Reply to this email directly or view it on GitHub https://github.com/ryanwood/slither/pull/10#issuecomment-7534983.
I guess I'm also a bit confused about what the purpose of this change is. The code currently allows repeated sections. Is it important for your application that they get put into separate namespaces in the resulting parse object?
I must have missed that, from what I saw it did not support repeating sections. How do you specify repeating sections?
On 8/7/12 1:07 PM, Scott Berkley wrote:
I guess I'm also a bit confused about what the purpose of this change is. The code currently allows repeated sections. Is it important for your application that they get put into separate namespaces in the resulting parse object?
— Reply to this email directly or view it on GitHub https://github.com/ryanwood/slither/pull/10#issuecomment-7559950.
parser.rb line 8
# This may be used in the future for non-linear or repeating sections
@mode = :linear
No special specification is needed. Sections are assigned by the 'trap' statement. So I can define my header section as a line that starts with 'HEAD'. Now if I have this file:
HEAD1234
body 747
more 787
HEAD5678
I will have two items in the header section, even though they don't come sequentially. They just get added to the array for that section, the same way that body lines do.
I'm not really sure what the intention of that mode comment is. Give it a try and let me know if it doesn't do what you need it to.
I might be confused then.. from what I see in the code it reads the sections linearly so if you have a definition of
# Create a Slither::Defintion to describe a file format
Slither.define :simple do |d|
d.header do |header|
header.trap { |line| line[0,1] == 1 }
header.column :header_begin, 1, :type => :integer
header.column :batch_number, 3, :type => :integer
end
d.data do |data|
data.trap { |line| line[0,1] == 2 }
data.column :data_begin, 1, :type => :integer
data.column :record_number, 3, :type => :integer
data.column :record_number_plus_batch, 6, :type => :integer
data.column :id, 2, :type => :integer
data.column :name, 10, :align => :left
end
d.tail_record do |tail_record|
tail_record.trap { |line| line[0,1] == 3 }
tail_record.column :tail_record_begin, 1, :type => :integer
tail_record.column :num_records, 3, :type => :integer
end
d.footer do |footer|
footer.trap { |line| line[0,1] == 4 }
footer.column :footer_record_begin, 1, :type => :integer
footer.column :total_record_count, 3, :type => :integer
footer.column :batch_number, 3, :type => :integer
end
end
and a file of
1001
200100100101Russell
200200200102Jim
200300300103Bill
303
200100100104Tim
200200200105Josh
3002
4005001
It would raise
Slither::RequiredSectionNotFoundError
because it would find another data record instead of a footer record.
make sense? I hope I am not confusing you more. I only added my repeatable sections because I ran into the problem above and kept getting
Slither::RequiredSectionNotFoundError
It might seem odd, but I have some requirements for something like that.
Oh, you're right, sorry. It turns out that we added repeatable sections in my change without my realizing it. We only deal with one section, so these issues are not on the top of my mind. My coworker changed how lines get added to sections to support parsing from IO objects, then I changed how the required section validation works because I saw that it didn't make sense any longer. It now parses all lines into sections, then checks to make sure all required sections have at least one row at the end. So the order of definition doesn't matter.
I did the merge on my stuff, so try pulling the latest and see if it works for you. If you have sections of different lengths like in your example, you'll have to change the definition to Slither.define :simple, :by_bytes => false do |d|. Sorry for the confusion.
I gave your example a try and might have fixed some things that were causing it to fail.
Slither.define :repeat, :by_bytes => false do |d|
d.header do |header|
header.trap { |line| line[0] == '1' }
header.column :header_begin, 1, :type => :integer
header.column :batch_number, 3, :type => :integer
end
d.data do |data|
data.trap { |line| line[0] == '2' }
data.column :data_begin, 1, :type => :integer
data.column :record_number, 3, :type => :integer
data.column :record_number_plus_batch, 6, :type => :integer
data.column :id, 2, :type => :integer
data.column :name, 10, :align => :left
end
d.tail_record do |tail_record|
tail_record.trap { |line| line[0] == '3' }
tail_record.column :tail_record_begin, 1, :type => :integer
tail_record.column :num_records, 3, :type => :integer
end
d.footer do |footer|
footer.trap { |line| line[0] == '4' }
footer.column :footer_record_begin, 1, :type => :integer
footer.column :total_record_count, 3, :type => :integer
footer.column :batch_number, 3, :type => :integer
end
end
Gives me
{:header=>[{:header_begin=>1, :batch_number=>1}], :data=>[{:data_begin=>2, :record_number=>1, :record_number_plus_batch=>1001, :id=>1, :name=>"Russell"}, {:data_begin=>2, :record_number=>2, :record_number_plus_batch=>2001, :id=>2, :name=>"Jim"}, {:data_begin=>2, :record_number=>3, :record_number_plus_batch=>3001, :id=>3, :name=>"Bill"}, {:data_begin=>2, :record_number=>1, :record_number_plus_batch=>1001, :id=>4, :name=>"Tim"}, {:data_begin=>2, :record_number=>2, :record_number_plus_batch=>2001, :id=>5, :name=>"Josh"}], :tail_record=>[{:tail_record_begin=>3, :num_records=>3}, {:tail_record_begin=>3, :num_records=>2}], :footer=>[{:footer_record_begin=>4, :total_record_count=>5, :batch_number=>1}]}
Note that we also added length validation, so make sure each line is padded with spaces to make it the right length. Does this output work for you?
That looks good to me, one question though, how will it write the file out? Wit it write the data pieces all together? Or will it separate them like in my example?
Thanks!
On 8/8/12 10:05 AM, Scott Berkley wrote:
I gave your example a try and might have fixed some things that were causing it to fail.
|Slither.define :repeat, :by_bytes => false do |d| d.header do |header| header.trap { |line| line[0] == '1' } header.column :header_begin, 1, :type => :integer header.column :batch_number, 3, :type => :integer end d.data do |data| data.trap { |line| line[0] == '2' } data.column :data_begin, 1, :type => :integer data.column :record_number, 3, :type => :integer data.column :record_number_plus_batch, 6, :type => :integer data.column :id, 2, :type => :integer data.column :name, 10, :align => :left end d.tail_record do |tail_record| tail_record.trap { |line| line[0] == '3' } tail_record.column :tail_record_begin, 1, :type => :integer tail_record.column :num_records, 3, :type => :integer end d.footer do |footer| footer.trap { |line| line[0] == '4' } footer.column :footer_record_begin, 1, :type => :integer footer.column :total_record_count, 3, :type => :integer footer.column :batch_number, 3, :type => :integer end end |
Gives me
{:header=>[{:header_begin=>1, :batch_number=>1}], :data=>[{:data_begin=>2, :record_number=>1, :record_number_plus_batch=>1001, :id=>1, :name=>"Russell"}, {:data_begin=>2, :record_number=>2, :record_number_plus_batch=>2001, :id=>2, :name=>"Jim"}, {:data_begin=>2, :record_number=>3, :record_number_plus_batch=>3001, :id=>3, :name=>"Bill"}, {:data_begin=>2, :record_number=>1, :record_number_plus_batch=>1001, :id=>4, :name=>"Tim"}, {:data_begin=>2, :record_number=>2, :record_number_plus_batch=>2001, :id=>5, :name=>"Josh"}], :tail_record=>[{:tail_record_begin=>3, :num_records=>3}, {:tail_record_begin=>3, :num_records=>2}], :footer=>[{:footer_record_begin=>4, :total_record_count=>5, :batch_number=>1}]}
Note that we also added length validation, so make sure each line is padded with spaces to make it the right length. Does this output work for you?
— Reply to this email directly or view it on GitHub https://github.com/ryanwood/slither/pull/10#issuecomment-7584434.
I'll let you test that out. I would guess it will write all headers, all data, all tails, and then all footers, so probably not what you want. If that's the case, we can figure out a solution.
I wrote some specs to test my changes, you can see them in my commit. I will merge your changes into mine and see if I can get it working with your changes.
thanks!
On Wed, Aug 8, 2012 at 2:22 PM, Scott Berkley [email protected]:
I'll let you test that out. I would guess it will write all headers, all data, all tails, and then all footers, so probably not what you want. If that's the case, we can figure out a solution.
— Reply to this email directly or view it on GitHubhttps://github.com/ryanwood/slither/pull/10#issuecomment-7592455.