SUPPA
Problems on multipleFieldSelection.py
Hi, thank you so much for sharing this helpful script for merging expression files of different samples. However, I encountered some problems when using it.
Is it possible to use more than one common field as the identifier? For instance, in my case, I have the counts of reads mapping to different junctions for each sample. The columns are `chrom`, `start`, `end` and `counts`. I'd like to merge the files of all samples together, which requires the first three columns as the identifier. Is this possible with this script?
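To make the request concrete, here is a minimal sketch of the kind of merge described above. It is not taken from multipleFieldSelection.py; the function name `merge_counts`, the assumption of tab-separated input without a header line, and the choice to fill missing junctions with "0" are all illustrative.

```python
import io

def merge_counts(samples, key_cols=(0, 1, 2), value_col=3, fill="0"):
    """Merge per-sample count files on a multi-column key.

    samples: dict mapping a sample name to an iterable of tab-separated
             lines (an open file works). Assumes there is no header line.
    Returns (names, table), where table maps (chrom, start, end) to a list
    holding one count string per sample, in the order of names.
    """
    names = list(samples)
    table = {}
    for idx, name in enumerate(names):
        for raw in samples[name]:
            fields = raw.rstrip("\n").split("\t")
            if len(fields) <= value_col:
                continue  # skip blank or truncated lines
            key = tuple(fields[c] for c in key_cols)
            # A junction seen for the first time starts as all-fill.
            row = table.setdefault(key, [fill] * len(names))
            row[idx] = fields[value_col]
    return names, table

def write_merged(names, table, out):
    """Write one row per junction; every field is already a string."""
    out.write("\t".join(["chrom", "start", "end"] + names) + "\n")
    for key, counts in table.items():
        out.write("\t".join(list(key) + counts) + "\n")

# Hypothetical two-sample example, using in-memory files.
a = io.StringIO("chr1\t100\t200\t5\nchr1\t300\t400\t2\n")
b = io.StringIO("chr1\t100\t200\t7\nchr2\t50\t80\t1\n")
names, table = merge_counts({"A": a, "B": b})
merged = io.StringIO()
write_merged(names, table, merged)
```

Each file is read once and every junction lives in a single dictionary, so the cost grows with the total number of lines rather than with repeated pairwise joins.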
Furthermore, I always get the error below when merging files whose identifiers differ. For example, different samples contain different junctions, and I would like to keep every junction and set the count to 0 for samples that lack it.
INFO: Writing output to merge.1.txt
Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 125, in
    f.write("\t".join(line) + "\n")
TypeError: sequence item 2: expected str instance, int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 130, in
    print("ERROR: %s" % err)
NameError: name 'err' is not defined
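A minimal reproduction of the first error, with one possible fix. This is a guess at the cause rather than the script's actual code: it assumes a missing count is filled with the integer 0 instead of the string "0", and the `row` list is hypothetical. The second error suggests that the `except` clause never binds the exception to the name `err`; writing `except ... as err` would avoid the NameError.

```python
# Hypothetical row: a missing count was filled with the int 0.
row = ["chr1", "100", 0]

try:
    "\t".join(row)  # str.join accepts only strings
    message = None
except TypeError as err:  # "as err" binds the exception so it can be reported
    message = str(err)

# Converting every field to str before joining avoids the TypeError.
fixed = "\t".join(str(field) for field in row)
```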
I have attached some files here for testing. TWPID9206_20170313.txt TWPID9206_20110217.txt TWPID9206_20140812.txt
By the way, I have tried the `csvtk join` command and `merge()` in R, but both take too much time with ~1000 samples. I would really appreciate it if this script could do the job in a shorter time. Or would you recommend any other tools for this problem? Thank you so much.
All the best, Meng