Example A not reproducible

Open • cklat opened this issue 4 years ago • 4 comments

Hi!

I'm trying to reproduce the examples shown in the README, but I'm not able to get the same results.

I tried to convert example_a.csv with the following code snippet:

import hone

optional_arguments = {
  "delimiters": [" ", ",", ";"]
}
Hone = hone.Hone(**optional_arguments)
schema = Hone.get_schema('./example_a.csv')

and the expected output should be:

[
  {
    "adopted": "TRUE",
    "adopted_since": "2012",
    "age (years)": "5",
    "birth": {
      "day": "11",
      "month": "April",
      "year": "2011"
    },
    "name": "Tommy",
    "weight (kg)": "3.6"
  },
  {
    "adopted": "FALSE",
    "adopted_since": "N/A",
    "age (years)": "2",
    "birth": {
      "day": "6",
      "month": "May",
      "year": "2015"
    },
    "name": "Clara",
    "weight (kg)": "8.2"
  },
  {
    "adopted": "TRUE",
    "adopted_since": "2017",
    "age (years)": "6",
    "birth": {
      "day": "21",
      "month": "August",
      "year": "2011"
    },
    "name": "Catnip",
    "weight (kg)": "3.3"
  },
  {
    "adopted": "TRUE",
    "adopted_since": "2018",
    "age (years)": "3",
    "birth": {
      "day": "18",
      "month": "January",
      "year": "2015"
    },
    "name": "Ciel",
    "weight (kg)": "3.1"
  }
]

but what I get is the following:

{'adopted_since': 'adopted_since',
 'adopted': 'adopted',
 'birth': {'year': 'birth year', 'month': 'birth month', 'day': 'birth day'},
 'weight (kg)': 'weight (kg)',
 'age (years)': 'age (years)',
 'name': 'name'}

So basically the cell values are not inserted into the dictionary; only the column names appear.

Did I do something wrong or miss something in the code snippet?

Thanks for the help!

cklat • Jun 19 '20 14:06

Hi there,

The schema that you get from Hone.get_schema only shows how hone will nest your data. It's there to give you fine-grained control over the schema in case you aren't satisfied with the automatically generated one. To actually convert the CSV file to JSON, you have to apply the schema like this:

result = Hone.convert('./example_a.csv', schema=schema)

If you add that line you should get the result you're looking for. Let me know if you run into any other issues!
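To make the distinction concrete: the leaves of the schema are CSV column names (e.g. "birth year"), and converting means replacing each leaf with that column's value, once per row. The sketch below mimics that step with Python's csv module only. It illustrates the idea and is not hone's implementation; `apply_schema` and `csv_to_nested` are hypothetical names. The CSV text is rebuilt from two rows of the expected output above, so its column order is an assumption.

```python
import csv
import io


def apply_schema(schema, row):
    """Walk a nesting schema, replacing each leaf column name with row[column]."""
    return {
        key: apply_schema(value, row) if isinstance(value, dict) else row[value]
        for key, value in schema.items()
    }


def csv_to_nested(csv_text, schema):
    """Apply the schema to every data row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [apply_schema(schema, row) for row in reader]


# The schema printed in the issue: its leaves are column names, not cell values.
schema = {
    "adopted_since": "adopted_since",
    "adopted": "adopted",
    "birth": {"year": "birth year", "month": "birth month", "day": "birth day"},
    "weight (kg)": "weight (kg)",
    "age (years)": "age (years)",
    "name": "name",
}

# Hypothetical CSV rebuilt from the expected output; the real example_a.csv
# may order its columns differently.
csv_text = (
    "name,age (years),weight (kg),birth day,birth month,birth year,adopted,adopted_since\n"
    "Tommy,5,3.6,11,April,2011,TRUE,2012\n"
    "Clara,2,8.2,6,May,2015,FALSE,N/A\n"
)

records = csv_to_nested(csv_text, schema)
print(records[0]["birth"])  # {'year': '2011', 'month': 'April', 'day': '11'}
```

get_schema corresponds roughly to producing `schema`; the values only show up once a conversion step like `csv_to_nested` walks the rows.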

chamkank • Jun 19 '20 20:06

Oh sorry, my bad. I should have run the complete code snippet. Now it works.

However, I ran into another problem: when I edit a .csv file with MS Excel on my Mac, save it, and then read it with hone, I get a \ufeff character at the beginning of the first column. If I re-save the CSV file with a text editor first, the character disappears. I read that this is because Excel writes an encoding signature (a byte order mark) at the beginning of the file. Is there a way to remove it by reading the file differently? I'd rather not have to work around it with .replace calls.

Thanks for your help!

cklat • Jun 20 '20 09:06

Oops, that shouldn't be happening. Hone used to handle \ufeff, but it looks like changing the encoding from utf-8-sig to utf-8 broke that. I'll fix it in the next release.
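The difference between the two codecs can be seen with Python's standard library alone. This is not hone's code; it assumes the Excel-saved file starts with the UTF-8 byte order mark (bytes EF BB BF, which decode to \ufeff) and shows that decoding with utf-8 keeps that character glued to the first header, while utf-8-sig strips it.

```python
import csv
import io

# Assumed input: CSV bytes that begin with a UTF-8 BOM.
raw = "\ufeffname,age (years)\nTommy,5\n".encode("utf-8")


def first_row(raw_bytes, encoding):
    """Decode the bytes with the given codec and return the CSV header row."""
    return next(csv.reader(io.StringIO(raw_bytes.decode(encoding))))


print(first_row(raw, "utf-8"))      # ['\ufeffname', 'age (years)']
print(first_row(raw, "utf-8-sig"))  # ['name', 'age (years)']
```

Since utf-8-sig decodes BOM-less input exactly like utf-8, it is a safe default for reading CSV files that may or may not come from Excel.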

chamkank • Jun 20 '20 22:06

@cklat v0.2.1 should address the issue. Let me know how it goes!
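If upgrading to v0.2.1 is not an option, one stopgap is to strip the BOM from the file itself before passing it to hone. A minimal stdlib sketch; `strip_utf8_bom` is a hypothetical helper, not part of hone, and it rewrites the file in place, so keep a copy of anything important.

```python
import os
import tempfile


def strip_utf8_bom(path):
    """Rewrite `path` in place as UTF-8 without a leading byte order mark."""
    # utf-8-sig drops a leading BOM if present and otherwise decodes like utf-8.
    # newline="" leaves the original line endings untouched on read and write.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Usage: build a small BOM-prefixed CSV in a temporary file and clean it.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbfname,age (years)\r\nTommy,5\r\n")
strip_utf8_bom(path)
with open(path, "rb") as f:
    print(f.read())  # b'name,age (years)\r\nTommy,5\r\n'
os.remove(path)
```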

chamkank • Jun 20 '20 23:06