africa_poverty
data preprocessing
Hello, I am trying to run the project, but I have encountered several issues, especially in the preprocessing part.
After finishing all the download steps and exporting the Earth Engine data to Google Cloud Storage, I moved on to the process_tfrecords notebook. The main issue is that my exported Earth Engine files have names in this format: {country_name}{year_range}.tfrecord.gz. But the notebook process_tfrecords_dhs.ipynb expects names in this format: /lx_median{year_range}_{country}_dhslocs_ee_export.tfrecord.gz
I changed the file names to that format and moved on, but in the last part (Process TFRecords) none of the processing functions work. I am getting errors like:
- list index out of range
- There is no such file:
For instance, Angola's data is stored as angola2011_xx.tfrecord.gz through angola2015_xx.tfrecord.gz in Cloud Storage, but the notebook tries to find angola2009-11.tfrecord.gz.
- Cluster index not found in the TFRecord file: the REQUIRED_KEYS list contains "cluster index", but some of my TFRecord files do not include it.
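To narrow down the "no such file" errors above, it can help to list which country/year shards actually exist in the bucket before running the notebook. The following is only a minimal sketch: the regular expression assumes shard names shaped like angola2011_xx.tfrecord.gz (with an optional underscore before the year), and `group_exports` is a hypothetical helper, not part of the repo.

```python
import posixpath
import re
from collections import defaultdict

# ASSUMPTION: exported shards look like "<country><year>_<shard>.tfrecord.gz",
# e.g. "angola2011_00.tfrecord.gz". Adjust the pattern to your real names.
EXPORT_RE = re.compile(
    r"^(?P<country>[A-Za-z_]+?)_?(?P<year>\d{4})_(?P<shard>[^.]+)\.tfrecord\.gz$"
)


def group_exports(names):
    """Group shard names by (country, year); also return names that do not match."""
    groups = defaultdict(list)
    unmatched = []
    for name in names:
        match = EXPORT_RE.match(posixpath.basename(name))
        if match is None:
            unmatched.append(name)
            continue
        groups[(match["country"], int(match["year"]))].append(name)
    return dict(groups), unmatched


if __name__ == "__main__":
    # Hypothetical listing; in practice this would come from the bucket.
    listing = [
        "angola2011_00.tfrecord.gz",
        "angola2015_00.tfrecord.gz",
        "angola2009-11.tfrecord.gz",
    ]
    groups, unmatched = group_exports(listing)
    for key in sorted(groups):
        print(key, len(groups[key]), "shard(s)")
    print("unmatched:", unmatched)
```

Comparing the grouped keys against the names the notebook requests shows quickly whether the problem is a naming mismatch or a missing export.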
I couldn't figure out where the mistake is, or whether I missed a step needed to create lx_median_{year_range}_{country}_dhslocs_ee_export.tfrecord.gz. Can you please explain and help with this issue? Thanks.
Edit: I am inspecting the code. Most probably the issues are caused by the missing cluster indexes in the TFRecord files. Maybe I should also concatenate the TFRecord files.
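If concatenating the shards turns out to be necessary, one way to do it without TensorFlow is sketched below. It assumes the shards are GZIP-compressed TFRecord files; since a TFRecord file is just a sequence of length-prefixed records, joining the decompressed streams should give a valid combined file. `concat_gzip_shards` is a hypothetical helper name, and this does not add a missing cluster index feature; it only merges the files.

```python
import gzip
import shutil


def concat_gzip_shards(shard_paths, out_path):
    """Decompress each gzip shard in order and write one combined gzip file."""
    with gzip.open(out_path, "wb") as out:
        for path in shard_paths:
            with gzip.open(path, "rb") as src:
                # Stream the bytes across so large shards are not loaded into memory.
                shutil.copyfileobj(src, out)
```

Called, for example, as concat_gzip_shards(sorted(paths), "combined.tfrecord.gz"), where the paths and output name are placeholders. The order of the shards determines the order of the records in the output.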
Hi, repo author here. I apologize for these data preprocessing issues, which are known. I am working on creating an updated data preprocessing pipeline. See the chrisyeh96/africa_poverty_clean repo for the latest preprocessing pipeline, which should resolve your issue.
Once chrisyeh96/africa_poverty_clean is fully ready, I will merge these two repos. Hopefully I will have time to do this over the next couple of months.