CodableCSV icon indicating copy to clipboard operation
CodableCSV copied to clipboard

Option to escape Excel-unfriendly strings

Open MrTemple opened this issue 3 years ago • 5 comments

Is your feature request related to a problem?

When a csv file contains a string with a leading -, +, or =, Excel will treat it as a formula field and throw an error.

Describe the solution you'd like

Love an option to auto-detect these leading characters and escape them properly with a single leading single-quote '.

MrTemple avatar Jun 25 '21 21:06 MrTemple

Hi @MrTemple,

Thank you for the suggestion. That is quite a neat feature. I have added tho the backlog. Currently, I am a bit busy with regular work, so I would encourage you to give it a try and implement it yourself, since I don't know when I can get to it, sorry.

In any case, thank you for the feedback.

dehesa avatar Jun 28 '21 05:06 dehesa

Thanks @dehesa.

Where do you suggest I start? Maybe in _lowlevelWrite(field: String), somewhere either in 3.A or between 3.A and 3.B? (This is used for the Encoder too, right?) Using an option created in Writer.Configuration and flowing through CSVWriter.settings?

And do you think I'd need to add another char to the result.reserveCapacity?

Also, I haven't contributed to an open source project yet. Is this the general process to follow? https://github.com/firstcontributions/first-contributions

MrTemple avatar Jun 29 '21 17:06 MrTemple

@MrTemple did you ever implement this? thank you

aehlke avatar Mar 01 '24 01:03 aehlke

@aehlke No, I didn't have time to dig under the hood and make the change the way I'd have liked. For my purposes it was faster to just extend String with an excelSafe property.

This has all the escaping that seems necessary for excel. I'm pulling many tens of thousands of very long, complexly-formatted, code-containing issue descriptions from various systems (Jira, Github, etc). These seem to have pretty much the ugliest content you're likely to run into, and this escaping logic seems to wrangle them properly without spilling out of the cell. (@dehesa, if you've got an idea where I can put this in CodableCSV, I could give it a whirl.)

Note about the character length, I settled on 1000 kind of randomly because it worked on my version of excel, and I didn't happen to need the extra info in the fields I was pulling from. You may want to experiment with a higher limit.

Also the excelNumSafeHashPrefixed is just an uber hack to prevent excel monkeying with long numbers. When you open a csv there's no way to prevent excel from turning a serial number or barcode value like 64342363424634 into 6.43e13. If you remember, you can reformat the cells (custom format of 0 is the only way), but if you accidentally save the csv, excel squashes the number forever. 🤦‍♂️ I do a lot of opening and saving and reimporting of the csv that I export. I use this to force it to write out the field as #64342363424634 and then I have to remember to remove it with excelNumSafeHashPrefixRemoved. Ugly, but we're dealing with software written by boomers, for boomers here. 🤣


extension String {
    
        /// Trims whitespaces and newlines, and wraps any unsafe strings for CSV import to excel using ="stringContents", while escaping any double-quotes within the field.
    public var excelSafe: String {
            // Trim whitespaces and newlines.
        let string = self.trimmingCharacters(in: .whitespacesAndNewlines)
        
            // Any double-quotes or newlines must be escaped.
        let forbiddenCharacters = CharacterSet(charactersIn: "\"\r\n")
        
            // If the first character is an equal, minus, or plus sign, excel will treat it like a formula and display "#NAME". If the string contains double-quotes or newlines it will spill out of the cell.
        guard
            !string.hasPrefix("="),
            !string.hasPrefix("-"),
            !string.hasPrefix("+"),
            string.rangeOfCharacter(from: forbiddenCharacters) == nil
        else {
                // Note: Different versions of Excel dislike long formulae (255, 8192 characters?), but will still import and display them (with the ="..." visible in the field.
            let escapedString = string.replacingOccurrences(of: "\"", with: "\"\"")
            let truncatedString = String(escapedString.prefix(1000))
            return "=\"\(truncatedString)\""
        }
        
        return string
    }
    
    public var excelSafeOneLine: String {
        return self.replacingOccurrences(of: "\n", with: " ")
            .replacingOccurrences(of: "\r", with: " ")
            .replacingOccurrences(of: "\t", with: " ")
            .excelSafe
    }
    
    public var excelNumSafeHashPrefixed: String {
        return "#\(self)"
    }
    
    public var excelNumSafeHashPrefixRemoved: String {
        var string = self
        if string.hasPrefix("#") {
            string.removeFirst()
        }
        return string
    }
}

MrTemple avatar Mar 01 '24 01:03 MrTemple

Thanks for the detailed info! I ended up using a different library but it also has this issue so I appreciate this

aehlke avatar Mar 01 '24 02:03 aehlke