Harmony data pre-processing and time connection
Hi,
When I load the data with Harmony using: counts = harmony.utils.load_from_csvs(csv_files, sample_names)
I found that values greater than 32767 were converted to negative values. I noticed that the default dtype is int16 (so the maximum allowed value is 32767). Should a larger dtype be used instead?
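For reference, the wrap-around can be reproduced with plain NumPy, independent of Harmony. This is only a minimal sketch with made-up count values; it does not touch `load_from_csvs` itself, and casting to `int32` is shown as one possible workaround, not as a confirmed Harmony option.

```python
import numpy as np

# Made-up counts for illustration; 40000 exceeds the int16 maximum.
raw_counts = np.array([100, 32767, 40000], dtype=np.int64)

# int16 can only hold values from -32768 to 32767.
int16_max = np.iinfo(np.int16).max

# Casting with astype does not raise; out-of-range values wrap around silently.
wrapped = raw_counts.astype(np.int16)

# A wider integer type keeps the original values intact.
safe = raw_counts.astype(np.int32)

print(int16_max)         # 32767
print(wrapped.tolist())  # [100, 32767, -25536]
print(safe.tolist())     # [100, 32767, 40000]
```

The value 40000 becomes 40000 - 65536 = -25536, which matches the negative values described above.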
And if I have multiple time points, should the time connection data frame look like this?
| _ | 0 | 1 |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 3 |
| 3 | 3 | 4 |
| 4 | 4 | 5 |
| 5 | 5 | 6 |
| 6 | 6 | 7 |
Hi @pengyu1608 - Did you ever figure out the appropriate way to code the time connections for experiments with more than two time points? Thanks!
Of course, just after asking, I was able to figure it out myself with some help from one of the other submitted issues.
The following code:
import numpy as np
import pandas as pd
timepoints = ['6w','7w','16w','17w','18w','3mo']
# Pair each time point with the one that follows it
timepoint_connections = pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)
timepoint_connections
Here, timepoints is a list of all your sample time points in order. This generates a data frame timepoint_connections of the following form:
| _ | 0 | 1 |
|---|---|---|
| 0 | 6w | 7w |
| 1 | 7w | 16w |
| 2 | 16w | 17w |
| 3 | 17w | 18w |
| 4 | 18w | 3mo |
A data frame of this form will work in the call: aug_aff, aff = harmony.core.augmented_affinity_matrix(norm_df, tp, timepoint_connections)
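If it helps, the same pairing can be wrapped in a small helper so it works for any ordered list of time points. This is just a sketch: `build_timepoint_connections` is a hypothetical name of my own, not part of Harmony, and it assumes consecutive time points are the only connections you want.

```python
import pandas as pd

def build_timepoint_connections(timepoints):
    """Hypothetical helper: link each time point to the next one in the list."""
    # zip pairs element i with element i + 1, giving len(timepoints) - 1 rows.
    pairs = list(zip(timepoints[:-1], timepoints[1:]))
    return pd.DataFrame(pairs)

timepoints = ['6w', '7w', '16w', '17w', '18w', '3mo']
timepoint_connections = build_timepoint_connections(timepoints)
print(timepoint_connections)
```

For the six time points above this gives five rows, the first being 6w to 7w and the last 18w to 3mo, the same as the table shown earlier.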