MarkersExtractor icon indicating copy to clipboard operation
MarkersExtractor copied to clipboard

Profiles & Formats (Excel Spreadsheet with Images)

Open IAmVigneswaran opened this issue 1 year ago • 16 comments

Request from Marc Bach -

He asks if it would be possible to generate an excel file (.xlsx) with images.

Found couple of excel related libraries

https://github.com/damuellen/xlsxwriter.swift https://github.com/CoreOffice/CoreXLSX

IAmVigneswaran avatar May 21 '24 02:05 IAmVigneswaran

Probably. Although Excel already natively imports CSV and TSV. Is an actual xlsx file crucial to the workflow?

CoreXLSX provides read-only support. xlsxwriter.swift could have potential.

orchetect avatar May 21 '24 02:05 orchetect

Is an actual xlsx file crucial to the workflow?

It was Marc Bach's request. I don't use excel in particular. Maybe he can reply to this thread when he is available.

IAmVigneswaran avatar May 21 '24 02:05 IAmVigneswaran

Excel is always a funny thing. It's not really a data interchange format, but over the years it's somewhat become one - mostly because it's familiar and some platforms have started supporting it (Google Sheets? stuff like that). So I get it. It might be worth looking into if it's fairly easy to implement.

orchetect avatar May 21 '24 03:05 orchetect

I checked out some open-source libraries and have a basic proof-of-concept, so it can definitely work.

I can probably throw together an Excel profile without much trouble.

orchetect avatar May 21 '24 03:05 orchetect

I can probably throw together an Excel profile without much trouble.

I believe embedding of images to each respective row cell would be a challenge?

But as a basic option, we can provide .xlsx format. Improve and refine it further in future versions if it is feasible.

One a side note -

It is much easier to embed images in Apple Numbers as opposed to Excel.

Apple Numbers

IAmVigneswaran avatar May 21 '24 03:05 IAmVigneswaran

New Excel profile is added. Will be in 0.3.8.

  • At this point, it's basically a carbon copy of the CSV content with no special formatting other than using bold text for the header row.
  • All cells are of "General" Excel cell format type, which is basically raw text values.
  • Generates thumbnail files on disk same as CSV profile.

Not sure if it's possible with the library I'm using to resize (or auto-resize) columns, or include images. I'll see if it's possible.

xlsx

orchetect avatar May 21 '24 04:05 orchetect

There's no way to auto-size columns to their contents with the XLSX library. (And in fact AFAIK there's no way to write to the XLSX format, regardless of library, to tell Excel to auto-size columns - that's an operation the user must perform within Excel).

Column widths can either be left as default (the narrow uniform widths in the screenshot above) or set explicit width(s) for columns.

I was able to come up with a calculation that analyzes the table's data and sets column widths to be a little larger than their contents, up to a sensible maximum width.

col-size

orchetect avatar May 21 '24 04:05 orchetect

Embedded image support may not be viable. There's no support in the XLSX library for any image-related operations. And images are considered floating objects on a worksheet -- it's not really possible to embed an image in a single cell.

orchetect avatar May 21 '24 05:05 orchetect

Embedded image support may not be viable. There's no support in the XLSX library for any image-related operations. And images are considered floating objects on a worksheet -- it's not really possible to embed an image in a single cell.

Noted! Basic excel file is a good as it gets.

If users want to want to embed images, they can easily use Apple Numbers App.

IAmVigneswaran avatar May 21 '24 05:05 IAmVigneswaran

Agreed.

There isn't much more I would do to improve the Excel output at this stage.

  • None of the fields are number or time types that would make sense, so all values are General text. Excel doesn't understand timecode so you definitely don't want to have cells formatted as Date/Time - would result in data loss at worst, or confusion at best.
  • No point doing any fancy formatting font/style wise, as users will not care or would want to apply their own formatting any way.

As far as Numbers files, I'm not sure if the file format is open or even something we could output. And it's a lot less in-demand than Excel. (But yeah, for a lot of things I do love using Numbers more than Excel.)

orchetect avatar May 21 '24 05:05 orchetect

FWIW - I'm pretty sure there's some build-in macOS Terminal commands for converting a CSV file to an Excel Doc, Numbers Doc, etc. so you might just be able to output a CSV and convert into another format?

latenitefilms avatar May 21 '24 06:05 latenitefilms

built-in macOS Terminal commands

News to me if so! If we ever find them, they could be listed in the README for reference.

But Excel output is now implemented with a static library and it works at least. And might be less brittle over time if it's not relying on an external system tool that isn't under our control and Apple could remove or mangle at any time.

orchetect avatar May 21 '24 18:05 orchetect

The XLSX request comes from the necessity of creating One Line Continuity reports. Those are spreadsheets where you have a representative frame grab of the scene with the scene name a synopsis of it that should fit in a line (or 2). It's used in scripted and documentary editing to quickly get a grasp of the flow of the scenes and check for any continuity conflicts (like "it's supposed to be daytime", "clothes don't match" or "I can't have this scene before that one because I'm missing info"). CSV doesn't have a solid image integration so using XLSX was a possible solution. (I could be interesting to create a Notion template for them).

Marcplanb avatar May 22 '24 15:05 Marcplanb

The XLSX request comes from the necessity of creating One Line Continuity reports.

Interesting - that's useful info.

Incorporating images in the Excel sheet is a level of complexity I'm not sure is easily achievable at this time. I'd need to find an open-source Swift XLSX file write library that has image operations - and from early searches there's not much out there. Excel is a very archaic application with how it structures its documents (in order to maintain decades of backwards compatibility with millions of systems out there) so it's not fun to deal with.

I'm almost wondering if it's worth investigating if a VBScript within Excel could be used to take each row's image filename and then import/resize/position each image into the document, as a one-time operation.

If Excel is not critical and it's not important for the data to be mutable, other file formats could be considered. I would suggest PDF and/or HTML output as an additional export profile for MarkersExtractor. That way it's easy to integrate tables and images into a single document. Initially the layout and formatting may be dictated by the tool, but eventually custom formatting templates might be possible.

orchetect avatar May 22 '24 18:05 orchetect

PDF and HTML could work fine.

Marcplanb avatar May 23 '24 00:05 Marcplanb

I'm almost wondering if it's worth investigating if a VBScript within Excel could be used to take each row's image filename and then import/resize/position each image into the document, as a one-time operation.

@Marcplanb

https://github.com/TheAcharya/MarkersExtractor/assets/118706051/c9c6d960-6ae7-496f-adc3-150f90e2de87

With the aid of ChatGPT, I have written a VBScript to automate importing images row by row based on Image Filename column.

Here is the script -

Sub InsertImages()
    Dim ws As Worksheet
    Dim imgFilename As String
    Dim imgFullPath As String
    Dim img As Shape
    Dim cell As Range
    Dim lastRow As Long
    Dim originalWidth As Single
    Dim originalHeight As Single
    Dim newWidth As Single
    Dim newHeight As Single
    Dim maxWidth As Single
    Dim maxHeight As Single
    Dim ratio As Single

    ' Set the maximum width and height for the inserted images
    maxWidth = 100 ' Adjust as needed
    maxHeight = 100 ' Adjust as needed

    ' Set the worksheet to the first sheet in the workbook
    Set ws = ThisWorkbook.Sheets(1)

    ' Get the last row with data in the Image Filename column
    lastRow = ws.Cells(ws.Rows.Count, "V").End(xlUp).Row ' Assuming column V is the Image Filename column

    ' Loop through each cell in the Image Filename column
    For Each cell In ws.Range("V2:V" & lastRow) ' Adjust the range if your data starts on a different row
        imgFilename = cell.Value
        imgFullPath = ThisWorkbook.Path & "/" & imgFilename
        
        ' Check if the image file exists
        If Dir(imgFullPath) <> "" Then
            ' Insert image using Shapes.AddPicture
            Set img = ws.Shapes.AddPicture(imgFullPath, _
                                           msoFalse, msoCTrue, _
                                           cell.Offset(0, 2).Left, cell.Top, -1, -1)
            
            ' Get the original dimensions of the image
            originalWidth = img.Width
            originalHeight = img.Height
            
            ' Calculate the new dimensions while maintaining aspect ratio
            If originalWidth > originalHeight Then
                ratio = originalHeight / originalWidth
                newWidth = maxWidth
                newHeight = maxWidth * ratio
            Else
                ratio = originalWidth / originalHeight
                newHeight = maxHeight
                newWidth = maxHeight * ratio
            End If

            ' Resize the image
            img.LockAspectRatio = msoTrue
            img.Width = newWidth
            img.Height = newHeight
            
            ' Adjust the row height to fit the image
            cell.EntireRow.RowHeight = newHeight
        Else
            ' If the file does not exist, you could add an error message or handle it differently
            cell.Offset(0, 2).Value = "File not found"
        End If
    Next cell

    MsgBox "Images have been inserted and row heights adjusted.", vbInformation
End Sub

Feel free to expand and tweak the script to your liking.

IAmVigneswaran avatar May 24 '24 01:05 IAmVigneswaran

@orchetect I think we can safely release 0.3.8 with Excel Profile. 😀🙏

IAmVigneswaran avatar May 26 '24 00:05 IAmVigneswaran

With the aid of ChatGPT, I have written a VBScript to automate importing images

Clever. AI's taking our jobs for sure.

can safely release 0.3.8

Done!

orchetect avatar May 28 '24 01:05 orchetect

The XLSX library is kind of a house of cards. It works but, in short, we have to fork two libraries in order to make it compile.

This wasn't evident at the start because of how Xcode decides to throw compiler errors.

Basic explanation:

  • The XLSX library (A) is a Swift wrapper around another dependency (B).
    • A references B using a branch name because B is not using proper SemVer repo tags. So Swift Package Manager can't understand its version numbers. This means A is forced to reference B by branch name.
    • This compiles "ok" in-place because all dependencies are using branch names (unstable references).
  • While developing the MarkersExtractor package in isolation, no compiler issues presented because Xcode is building the package as-is, without reference to a version or branch of itself. It's just using the files on disk to compile.
  • However, when an app or package attempts to use MarkersExtractor as a dependency, we almost always use a stable version. ie: "from 0.3.8". And not a branch name. Xcode refuses to compile this because there is a mix of stable and unstable dependency references. MarkersExtractor is referenced at 0.3.8 (stable) but internally, MarkersExtractor is referencing the XLSX library by branch (unstable).

"packages which use branch-based dependency requirements can't be added as dependencies to packages that use version-based dependency requirements; you should remove branch-based dependency requirements before publishing a version of your package."

Apple docs

Solutions:

  • The cleanest way to a solution is to have both libraries A and B use proper SemVer tags on their repos, and the Package.swift files all reference stable version numbers. However we don't have control over those repos and convincing the repo maintainers to do so may not be possible.
  • The next best way to a solution is to just fork both libraries and make the needed changes. The downside is that any time either of those libraries have new versions, some work is required to update our forks manually.

For the time being, I've opted for the manual forking approach in 0.3.9.

orchetect avatar May 28 '24 23:05 orchetect

Thanks for the detail explanation!

IAmVigneswaran avatar May 29 '24 00:05 IAmVigneswaran

Found another XLSX library. Although it is targeted for iOS I am not sure if there is any benefit in utilising this library. The repo seems to be somewhat regularly maintained. If it is possible, we can switch to this library as opposed to using the fork two libraries.

It also has support for adding images to cell.

https://github.com/3973770/SwiftXLSX

IAmVigneswaran avatar May 29 '24 21:05 IAmVigneswaran

Yeah, I'm open to alternative libraries. Just wanted to get it stable for the time being because I may not have a chance to come back to this for a few weeks.

orchetect avatar May 29 '24 21:05 orchetect