Harmony data pre-processing and time connection

Open pengyu1608 opened this issue 5 years ago • 2 comments

Hi,

When I use harmony to load the data with: counts = harmony.utils.load_from_csvs(csv_files, sample_names)

I found that values greater than 32767 were converted to negative values. I noticed that the default dtype is int16, so the maximum allowed value is 32767. Should the default dtype be changed to a wider integer type?
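For reference, the wraparound described above can be reproduced with NumPy alone. This is a minimal sketch that does not call harmony; the array contents and the helper name smallest_safe_int_dtype are made up for illustration, and whether load_from_csvs lets you override its dtype is not confirmed here.

```python
import numpy as np

# Hypothetical counts; two of them exceed the int16 maximum of 32767.
counts = np.array([100, 32767, 32768, 40000], dtype=np.int64)

# Casting to int16 wraps out-of-range values silently instead of raising.
wrapped = counts.astype(np.int16)
print(wrapped.tolist())  # [100, 32767, -32768, -25536]


def smallest_safe_int_dtype(values):
    """Return the narrowest signed integer dtype that holds every value."""
    for dtype in (np.int16, np.int32, np.int64):
        info = np.iinfo(dtype)
        if values.min() >= info.min and values.max() <= info.max:
            return dtype
    raise OverflowError("values do not fit in a signed 64-bit integer")


# Pick a dtype from the data before casting, so nothing wraps.
safe_dtype = smallest_safe_int_dtype(counts)
widened = counts.astype(safe_dtype)
print(safe_dtype.__name__, widened.tolist())
```

Checking the maximum count before casting is cheap, and it catches the problem before any negative values reach the downstream analysis.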

And if I have multiple time points, should the time connection data frame look like this?

_ 0 1
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
6 6 7

pengyu1608 avatar Jul 06 '20 09:07 pengyu1608

Hi @pengyu1608 - Did you ever figure out the appropriate way to code the time connections for experiments with more than two time points? Thanks!

BenSolomon avatar Jun 01 '21 05:06 BenSolomon

Of course, just after asking, I was able to figure it out myself with some help from one of the other submitted issues.

The following code:

import numpy as np
import pandas as pd
timepoints = ['6w','7w','16w','17w','18w','3mo']
# Pair each time point with the one that follows it
timepoint_connections = pd.DataFrame(np.array([timepoints[:-1], timepoints[1:]]).T)
timepoint_connections

Here, timepoints is a list of all your sample time points in order. This will generate a matrix timepoint_connections of the following form:

_ 0 1
0 6w 7w
1 7w 16w
2 16w 17w
3 17w 18w
4 18w 3mo

A matrix of this form will work in the call to aug_aff, aff = harmony.core.augmented_affinity_matrix(norm_df, tp, timepoint_connections)
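The same pairing can be wrapped in a small helper so it works for any ordered list of time points. This is only a sketch: consecutive_connections is a hypothetical name, not part of harmony, and it assumes each time point connects only to the next one in the list.

```python
import pandas as pd


def consecutive_connections(timepoints):
    """Pair each time point with its successor, one pair per row."""
    if len(timepoints) < 2:
        raise ValueError("need at least two time points to form a connection")
    # zip stops at the shorter sequence, giving len(timepoints) - 1 pairs.
    pairs = list(zip(timepoints[:-1], timepoints[1:]))
    return pd.DataFrame(pairs)


# String labels, as in the example above.
weeks = consecutive_connections(['6w', '7w', '16w', '17w', '18w', '3mo'])
print(weeks)

# Integer labels 0 through 7 give seven rows: (0, 1), (1, 2), ..., (6, 7).
numbered = consecutive_connections(list(range(8)))
print(numbered)
```

With integer labels 0 through 7, the result has the same rows as the table in the original question. The labels presumably need to match the ones used in tp.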

BenSolomon avatar Jun 01 '21 05:06 BenSolomon