methplotlib icon indicating copy to clipboard operation
methplotlib copied to clipboard

bgzip, tabix and sorting

Open carolinehey opened this issue 3 years ago • 5 comments

Hi @wdecoster

I have 2 nanopolish output tsv files (calls and frequencies). I would like to use them as input for methplotlib. I have tried to sort as you wrote in an example but when i want to use tabix it gives an unsorted error.

cat <(head -n1 sample8_methylation_calls_2.tsv) <(tail -n +2 sample8_methylation_calls_2.tsv | sort -k2,2 -k3,3) | bgzip > sample8_sort_calls.tsv.gz

tabix -S1 -s1 -b3 -e4 sample8_methylation_calls_2.tsv.gz

Looking at the sorted and decompressed tsv file i see that the problem is between the plus and minus strand: image

Should it only be sorted by the 3 column and how can this be done?

carolinehey avatar Jun 01 '22 08:06 carolinehey

When i sort like this and use tabix afterwards the file does not work in methplotlib cat <(head -n1 sample8_methylation_calls_2.tsv) <(tail -n +2 sample8_methylation_calls_2.tsv | sort -k 3) | bgzip > sample8_sort_calls.tsv.gz

(methplotlib_caroline_python3) root@srvodeappgsq02:/longread/G128-2021-Imprinting-disorders/20220315_1228_1G_PAI63326_dd13a3c5# methplotlib -m sample8_sort2_calls.tsv.gz -n calls -w chr15:24,500,000-25,500,000 -f /longread/resources/reference_genomes/GRCh38_Hg38/hg38.fa Traceback (most recent call last): File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc return self._engine.get_loc(casted_key) File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 142, in pandas._libs.index.IndexEngine.get_loc TypeError: '85285 False 85303 True 85322 True 85341 True 85358 True ... 13851 True 13873 True 13895 False 13917 False 13939 True Name: log_lik_ratio, Length: 166259, dtype: bool' is an invalid key

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/series.py", line 1086, in setitem self._set_with_engine(key, value) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/series.py", line 1147, in _set_with_engine loc = self.index.get_loc(key) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3628, in get_loc self._check_indexing_error(key) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 5637, in _check_indexing_error raise InvalidIndexError(key) pandas.errors.InvalidIndexError: 85285 False 85303 True 85322 True 85341 True 85358 True ... 13851 True 13873 True 13895 False 13917 False 13939 True Name: log_lik_ratio, Length: 166259, dtype: bool

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/root/miniconda3/envs/methplotlib_caroline_python3/bin/methplotlib", line 8, in sys.exit(main()) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 26, in main meth_browser(meth_data=meth_data, File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/methplotlib/methplotlib.py", line 53, in meth_browser meth_traces = plots.methylation(meth_data, dotsize=dotsize, binary=binary, minqual=minqual) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/methplotlib/plots.py", line 106, in methylation make_per_read_meth_traces_llr(table=meth.table, File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/methplotlib/plots.py", line 183, in make_per_read_meth_traces_llr table.loc[:, "llr_scaled"] = rescale_log_likelihood_ratio(table["log_lik_ratio"].copy()) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/methplotlib/plots.py", line 269, in rescale_log_likelihood_ratio llr[llr > 0] = scaler.fit_transform(llr[llr > 0].values.reshape(-1, 1)).tolist() File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/series.py", line 1128, in setitem self._set_values(indexer, value) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/series.py", line 1192, in _set_values self._mgr = self._mgr.setitem(indexer=key, value=value) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 337, in setitem return self.apply("setitem", indexer=indexer, value=value) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 304, in apply applied = getattr(b, f)(**kwargs) File "/root/miniconda3/envs/methplotlib_caroline_python3/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 943, in setitem values[indexer] = value ValueError: shape mismatch: value array of shape (96245,1) could not be broadcast to indexing result of shape (96245,)

carolinehey avatar Jun 01 '22 08:06 carolinehey

For the calls file you should sort on the first (chromosome), third (start), and fourth (end) columns.

wdecoster avatar Jun 01 '22 09:06 wdecoster

Sorry for my ignorance, but how is this done?

And should it also be done on the frequency file?

carolinehey avatar Jun 01 '22 10:06 carolinehey

I think

cat <(head -n1 sample8_methylation_calls_2.tsv) <(tail -n +2 sample8_methylation_calls_2.tsv | sort -k1,1n -k3,3 -k4,4) | bgzip > sample8_sort_calls.tsv.gz

The frequency file has chromosome, start and end in columns 1,2,3 so that would probably look like

cat <(head -n1 sample8_methylation_freqs.tsv) <(tail -n +2 sample8_methylation_freqs.tsv | sort -k1,1n -k2,2 -k3,3) | bgzip > sample8_sort_freqs.tsv.gz

wdecoster avatar Jun 01 '22 10:06 wdecoster

Please create a separate issue for this and also provide the version of methplotlib that you are using.

wdecoster avatar Sep 01 '22 12:09 wdecoster