
Can I use this library to convert .numbers files?

[Open] egorbarkovsky opened this issue 4 years ago · 14 comments

Hey! Good job, I like it! But how can I use it to convert a .numbers file?

egorbarkovsky, Nov 01 '20 21:11

@dunhamsteve @adrianwojdat

egorbarkovsky, Nov 02 '20 11:11

@egorbarkovsky Unfortunately, this converter only works with .pages files.

adrianwojdat, Nov 02 '20 14:11

It's been a while, but I recall that most of the internal structures were similar (Pages tables were essentially Numbers tables), so I think this could be adapted fairly easily. I don't have a lot of spare time, but I can take a look.

dunhamsteve, Nov 02 '20 15:11

Thank you very much for your answer! It would be very kind of you, but if you don't have time, could you at least suggest a direction for adapting this library to Numbers? A simple conversion to .json would be enough; it doesn't have to be .html.

egorbarkovsky, Nov 03 '20 14:11

@dunhamsteve

egorbarkovsky, Nov 04 '20 11:11

I haven't been in this code in six years, so I needed to find my way around, but it was worth the time because I recently started a JavaScript project to pick apart these files.

There is a pages.go; an equivalent numbers.go needs to be generated with something like:

go run codegen/codegen.go codegen/Numbers.json > index/numbers.go

Rename decode to decodeNumbers and wire up index so you can choose between decodePages and decodeNumbers. There are a bunch of issues in the generated protobuf code, where names like TN1 need to be renamed to TN (just remove the numbers; a global search/replace helps).
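The TN1-to-TN cleanup can also be scripted instead of done by hand. Below is a minimal Go sketch, assuming the stray numbers only ever appear as digit suffixes on package prefixes that you list explicitly. The function name stripPrefixNumbers and the sample prefixes are hypothetical, not part of this repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// stripPrefixNumbers rewrites identifiers such as "TN1" or "TSP2" back to their
// bare prefix ("TN", "TSP"). Only the prefixes passed in are touched, so other
// names that happen to contain digits are left alone.
func stripPrefixNumbers(src string, prefixes []string) string {
	for _, p := range prefixes {
		// The leading \b keeps "xTN1" from matching; \d+ consumes every trailing digit.
		re := regexp.MustCompile(`\b` + regexp.QuoteMeta(p) + `\d+`)
		src = re.ReplaceAllString(src, p)
	}
	return src
}

func main() {
	line := "var doc TN1.DocumentArchive // refers to TSP2.Reference"
	fmt.Println(stripPrefixNumbers(line, []string{"TN", "TSP"}))
}
```

Listing the prefixes explicitly is a deliberate choice: a blanket "strip digits after capitals" rule would also mangle legitimate names.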

This work is in the numbersSupport branch.

At this point you should get a TN.DocumentArchive back for Numbers files. You'll need to walk through that and pull out the spreadsheet. The existing table code should be reusable as-is if you're generating HTML, or serve as a guide if you're generating JSON. I see a note I left in there saying that some tweaks are needed for spreadsheets with multiple tiles, so there may be a little discovery work to do there.
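To illustrate what the multiple-tiles tweak might involve, here is a minimal Go sketch that stitches several tiles back into one grid. It assumes each tile holds a contiguous run of rows starting at some absolute row index; the Tile type and mergeTiles function are hypothetical simplifications, not the real table-storage protobuf messages, whose layout still needs the discovery work mentioned above.

```go
package main

import "fmt"

// Tile is a hypothetical, simplified stand-in for one tile of a table:
// a block of consecutive rows beginning at StartRow.
type Tile struct {
	StartRow int
	Rows     [][]string // Rows[i] holds the cells of table row StartRow+i
}

// mergeTiles places every tile's rows at their absolute row index, producing
// one grid of numRows rows. Rows that no tile covers stay nil.
func mergeTiles(tiles []Tile, numRows int) ([][]string, error) {
	grid := make([][]string, numRows)
	seen := make([]bool, numRows)
	for _, t := range tiles {
		for i, row := range t.Rows {
			r := t.StartRow + i // absolute row index in the table
			if r < 0 || r >= numRows {
				return nil, fmt.Errorf("tile row %d out of range [0,%d)", r, numRows)
			}
			if seen[r] {
				return nil, fmt.Errorf("row %d appears in more than one tile", r)
			}
			seen[r] = true
			grid[r] = row
		}
	}
	return grid, nil
}

func main() {
	// Tiles do not have to arrive in row order.
	tiles := []Tile{
		{StartRow: 2, Rows: [][]string{{"c"}, {"d"}}},
		{StartRow: 0, Rows: [][]string{{"a"}, {"b"}}},
	}
	grid, err := mergeTiles(tiles, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(grid) // [[a] [b] [c] [d]]
}
```

Rejecting overlapping tiles makes an incorrect assumption about the tile layout fail loudly instead of silently overwriting rows.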

dunhamsteve, Nov 04 '20 16:11

I spent almost a week trying to figure it out in Go (I'm a Java developer myself), but alas, without much success :( I don't understand how the system works as a whole, but this project is very important to me. Perhaps you need some software written in Java that I could build for you in exchange for implementing the .numbers -> .json converter, or is there some other way to get it done?

Thank you anyway.

@dunhamsteve

egorbarkovsky, Nov 11 '20 20:11

I played around with it a little tonight and found that some of the internal format of CellStorageBuffer had changed since iWork '13 (even Pages tables were broken). I've got some basic translation from Numbers to HTML working, but it's still preliminary. I suspect that additional work would be needed for large tables, and I didn't code support for boolean cells (or any interpretation of styles or formatting).

dunhamsteve, Nov 12 '20 06:11

Well, I understand. If you need anything, any help in any format, just message me. I'll try to convert Numbers files now.

egorbarkovsky, Nov 12 '20 20:11

I've almost finished dealing with protobuf. There is only one question left: where can I get the latest versions of files like KNArchives.proto, TNArchives.proto, etc., so I can run them through protoc in the required form? @dunhamsteve

egorbarkovsky, Dec 07 '20 02:12

I recently started to update that stuff, pulling out the proto files again. Then I got lost in the weeds trying to update the mapping file (from integers to protobuf message types) and got sidetracked. The newer proto files are here: proto.zip. They haven't been incorporated into this repository yet.
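The mapping file mentioned here ties each integer message-type ID stored in the archive to a protobuf message type. A minimal Go sketch of that lookup follows. The Registry and Decoder names are hypothetical, and the ID used in main is a placeholder, not a real iWork type ID. In real code each decoder would presumably call proto.Unmarshal into the matching generated message.

```go
package main

import "fmt"

// Decoder turns the raw bytes of one archived object into a Go value.
type Decoder func(payload []byte) (any, error)

// Registry maps integer message-type IDs to decoders.
type Registry map[uint32]Decoder

// Decode looks up the decoder for typeID and applies it to payload.
func (r Registry) Decode(typeID uint32, payload []byte) (any, error) {
	dec, ok := r[typeID]
	if !ok {
		return nil, fmt.Errorf("no decoder registered for message type %d", typeID)
	}
	return dec(payload)
}

func main() {
	// Placeholder ID and dummy decoder; the real table must come from the iWork apps.
	reg := Registry{
		1: func(p []byte) (any, error) { return fmt.Sprintf("placeholder message, %d bytes", len(p)), nil },
	}
	v, err := reg.Decode(1, []byte{0x08, 0x01})
	fmt.Println(v, err)
	if _, err := reg.Decode(99, nil); err != nil {
		fmt.Println(err)
	}
}
```

Keeping one registry per app would be one way to preserve the decodePages/decodeNumbers split, since the same integer may mean different messages in different apps.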

I'll eventually update this repository and finish figuring out the details of the binary cell data.

The files in the attached zip archive were generated by some JavaScript code I wrote (or rather ported from some Python code that I wrote). It scans the executable and Frameworks in the iWork apps for embedded protobuf data encoding the schema, deserializes it into objects, and then walks through the schema, writing .proto files.

I believe the original proto files that I used came from iWorkFileInspector, or at least I used the proto-dump utility to generate them. The ones in the attached zip file have been tweaked a little to avoid a circular dependency issue in the generated Go code. If the attached proto files don't work, you might try those, but they're from a pretty old version of iWork.

dunhamsteve, Dec 07 '20 04:12

OK, got it, thanks a lot! Can you also tell me which protobuf version the files in proto.zip use?

egorbarkovsky, Dec 08 '20 01:12

@dunhamsteve sorry for bothering you :)

egorbarkovsky, Dec 08 '20 15:12

The files declare their syntax as proto2. They should work with a recent version of protoc. The older proto files that I had lying around generated code that wouldn't compile when I ran them through a modern protoc.

dunhamsteve, Dec 09 '20 15:12