ECLSK data produces NaNs and/or no correlation in HMA

Open: awesomeisfree opened this issue 1 year ago • 1 comment

Environment details

If you are already running SDV, please indicate the following details about the environment in which you are running it:

  • SDV version: 1.15
  • Python version: 3.10.12
  • Operating System: Windows 10 (Google Colab)

Problem description

When attempting to fit and synthesize data from the public ECLSK dataset (attached, and available here: https://nces.ed.gov/ecls/), several strange outcomes occur. Most notably, the synthesized OUTCOME column either comes out as a single constant value or is filled with NaNs. There does not appear to be anything unusual about that column in the real data. Please advise.
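One way to make the symptom above concrete is to summarize the OUTCOME column in the real data and in the synthetic output, then compare the two summaries. The following is a minimal, dependency-free sketch; the helper name `summarize_column` is hypothetical (it is not part of SDV), and it only treats `None` and float NaN as missing, so other missing-value markers would need extra handling.

```python
import math
from collections import Counter


def summarize_column(values):
    """Summarize one column: row count, missing count, distinct values.

    Hypothetical helper for illustration; not an SDV function.
    Only None and float NaN are counted as missing.
    """
    values = list(values)

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Keep only the values that are actually present.
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)

    n_rows = len(values)
    n_missing = n_rows - len(present)
    return {
        "n_rows": n_rows,
        "n_missing": n_missing,
        "n_distinct": len(counts),
        # True when every row in a non-empty column is missing.
        "all_missing": n_rows > 0 and n_missing == n_rows,
        # True when all present values are identical.
        "is_constant": len(counts) == 1,
    }
```

Because a pandas Series is iterable, this could be called as `summarize_column(real_df["OUTCOME"])` and `summarize_column(synthetic_df["OUTCOME"])`, where `real_df` and `synthetic_df` are placeholder names for the real and sampled tables. A synthetic summary with `all_missing` or `is_constant` set while the real one has neither would match the behavior described here.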

What I already tried

Adjusting column dtypes, culling the dataset to fewer columns

link to colab: https://colab.research.google.com/drive/1pT81wxCReMNxam3ZP-6u3IM74R_0Czgh#scrollTo=YN16L5Ywcbou

Attachments: children (1).csv, ECLSKdata (1).csv, schools (1).csv

awesomeisfree, Aug 08 '24 15:08

Hi there @awesomeisfree, I apologize for the late reply here. Thank you for sharing your code and datasets; I was able to reproduce the issue on my end when using HMA Synthesizer. Unfortunately, we haven't yet determined the cause, so we will leave this issue open until we find out more.

While I know this isn't immediately helpful for you because you're using SDV Community, I will mention that this problem doesn't occur when using HSA Synthesizer (which is available in SDV Enterprise). If you want to learn more about SDV Enterprise while we investigate the issue with HMA Synthesizer, you can reach out to us here.

[Screenshot 2024-11-15 at 4 29 45 PM]

srinify, Nov 15 '24 21:11