africa_poverty

data preprocessing

Open iokanyalcin opened this issue 3 years ago • 1 comments

Hello, I am trying to run the project, but I have encountered several issues, especially in the preprocessing part.

After finishing all the download steps and exporting the Earth Engine data to Google Cloud Storage, I moved on to the process_tfrecords notebook. The main issue is that my exported Earth Engine files are named like this: {country_name}{year_range}.tfrecord.gz. But the notebook process_tfrecords_dhs.ipynb expects names of this form: /lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz
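A quick way to see which files the notebook would look for is to build the expected names and compare them with what is actually in the bucket. This is only a minimal sketch: it assumes the naming pattern quoted above, the helper names `expected_export_name` and `missing_exports` are hypothetical (they are not part of the repo), and the bucket listing is passed in as a plain list of file names.

```python
from typing import Iterable, List, Tuple


def expected_export_name(country: str, year_range: str) -> str:
    """Build the file name in the pattern quoted from process_tfrecords_dhs.ipynb.

    Assumption: the pattern is lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz
    """
    return f"lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz"


def missing_exports(
    surveys: Iterable[Tuple[str, str]], available: Iterable[str]
) -> List[str]:
    """Return the expected file names that are absent from the listing.

    surveys: (country, year_range) pairs; available: file names found in storage.
    """
    present = set(available)
    expected = [expected_export_name(c, y) for c, y in surveys]
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Hypothetical listing, shaped like the file names described in this post.
    listing = ["angola2011_xx.tfrecord.gz", "angola2015_xx.tfrecord.gz"]
    for name in missing_exports([("angola", "2009-11")], listing):
        print("missing:", name)
```

If every expected name shows up as missing, the export step most likely used a different file prefix than the one the notebook expects.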

I changed the file names to that format and moved on, but in the last part (Process TFRecords) none of the processing functions work. I am getting errors like:

  • list index out of range
  • There is no such file:

For instance, Angola's data is stored as angola2011_xx.tfrecord.gz through angola2015_xx.tfrecord.gz in Cloud Storage, but the notebook tries to find angola2009-11.tfrecord.gz.

  • Cluster index not found in TFRecord file: the REQUIRED_KEYS list contains "cluster index", but some of my TFRecord files do not include it.

I couldn't figure out where the mistake is, or whether I missed a step that creates lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz. Can you please explain and help with this issue? Thanks

Edit: I am inspecting the code. Most probably the issues are caused by missing cluster indexes in the TFRecord files. Maybe I should also concatenate the TFRecord files.

iokanyalcin May 07 '21 15:05

Hi, repo author here. I apologize for these data preprocessing issues, which are known. I am working on creating an updated data preprocessing pipeline. See the chrisyeh96/africa_poverty_clean repo for the latest preprocessing pipeline, which should resolve your issue.

Once chrisyeh96/africa_poverty_clean is fully ready, I will merge these two repos. Hopefully I will have time to do this over the next couple of months.

chrisyeh96 May 09 '21 08:05