CHEUI: issue with the signal rejoining script and a 4-dimensional input error in model 2

Hi CHEUI team,

I have a few questions about the preprocessing step.

I have finished preprocessing the nanopolish files with the C++ script, which produced 26 output files. I tried to rejoin the signal files using combine_split_files.py, but I get an error message:

"combine_split_files: error: unrecognised arguments:"

I tried passing both the directory and a single file, and both gave the same message. Only the first split file seems to produce no error. I can confirm that I added headers to the split files during preprocessing, and the individual signal files work when I put them into model 1.

Bw, Dominic

Oct 05 '23 14:10 Domikohlh

Hi,

I want to give an update on the signal rejoining script. I sorted out the arguments, but the script took very long to run, so I skipped it because I have limited runtime. I ran the signal chunks straight through model 1 and combined the results afterwards. Does this affect downstream performance?

Second, I get a dimension error from model 2 in the m5C predictions. My input file is 122G; I cut it down to the first 50000 lines for processing and it still does not work. I have checked the format and there are 3 columns: the transcript and cytosine site, a series of probabilities, and the replicate tag. I double-checked that there are no extra columns and no misaligned data. The model ran smoothly for the first few minutes, but then it stopped midway and reported the dimension error:

"W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at conv_ops_fused_impl.h:761 : INVALID_ARGUMENT: convolution input must be 4-dimensional: [1,32,99]"

I have tried adding a fourth column of '1's to fulfil the requirement, and removing the replicate tag column since the test data only has 2 columns, but then a wrong-format error comes up. Could you help resolve it, please?

Dom

Oct 14 '23 02:10 Domikohlh

Hi,

Sorry about the run time issue. We are working on a faster version of the preprocessing and will have an update soon. I think it's OK to pass the chunks to model 1 and combine the output at the end, as you mentioned. Sort the combined output before passing it to model 2.
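As a rough illustration of the "combine, then sort" step, here is a minimal sketch. It assumes the model 1 chunk outputs are tab-separated text files without header lines, that the first column identifies the site, and that each chunk fits in memory on its own; `combine_and_sort` and `site_key` are hypothetical helper names and not part of CHEUI.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def site_key(line):
    # Assumption: the first tab-separated column identifies the site.
    return line.split("\t", 1)[0]


def combine_and_sort(chunk_paths, out_path):
    """Sort each chunk in memory, then stream-merge all chunks into out_path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        sorted_paths = []
        for index, path in enumerate(chunk_paths):
            with open(path) as handle:
                # Drop blank lines and make sure every line ends with a newline.
                lines = [ln.rstrip("\n") + "\n" for ln in handle if ln.strip()]
            lines.sort(key=site_key)
            sorted_path = os.path.join(tmp_dir, f"chunk_{index}.sorted")
            with open(sorted_path, "w") as handle:
                handle.writelines(lines)
            sorted_paths.append(sorted_path)

        # heapq.merge streams the already-sorted chunks, so the merge itself
        # does not need to load the combined file into memory.
        with ExitStack() as stack:
            handles = [stack.enter_context(open(p)) for p in sorted_paths]
            with open(out_path, "w") as out:
                out.writelines(heapq.merge(*handles, key=site_key))
```

The merge keeps rows for the same site next to each other in the combined file. If the chunks contain header lines, they would need to be removed first.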

I think the dimension error you are getting is caused by the TensorFlow version. Which version are you using?

Either install the recommended version, TF 2.4.1,

or

try replacing the line below in CHEUI_predict_model2.py:

lr_probs = model.predict(prob_vectors)

with:

lr_probs = model.predict(prob_vectors.reshape(prob_vectors.shape[0], 99, 1))

It should be at lines 199 and 220 of CHEUI_predict_model2.py.
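To make the effect of the reshape concrete, here is a small NumPy-only sketch. It adds a trailing axis of length 1, so a batch of shape (n, 99) becomes (n, 99, 1). That this matches a (batch, steps, channels) input expected by a 1D convolution layer is an assumption inferred from the error message, and `add_channel_axis` is a hypothetical helper, not a function in CHEUI.

```python
import numpy as np


def add_channel_axis(prob_vectors, width=99):
    """Reshape (n_rows, width) probabilities to (n_rows, width, 1)."""
    prob_vectors = np.asarray(prob_vectors, dtype=np.float32)
    # Fail early if a row does not have the expected number of probabilities.
    if prob_vectors.ndim != 2 or prob_vectors.shape[1] != width:
        raise ValueError(f"expected shape (n, {width}), got {prob_vectors.shape}")
    return prob_vectors.reshape(prob_vectors.shape[0], width, 1)


batch = np.zeros((32, 99))  # stand-in for a batch of 32 probability vectors
print(add_channel_axis(batch).shape)  # (32, 99, 1)
```

The values are unchanged; only the array shape differs, so the predictions themselves should not be affected by the extra axis.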

Please let us know if it helps.

Thanks, Akanksha

Oct 14 '23 04:10 Akanksha2511

Hi Akanksha,

Thank you so much for your reply.

It is absolutely fine and totally understandable that developing the new preprocessing script takes time, and I will keep my fingers crossed for the CHEUI team.

That is great! I just wanted to make sure the results are not affected even though I am not using the combine script. Regarding model 2, I added the reshape to the lr_probs line. I think it is working properly, and it generates much longer results now.

Bw, Dominic

Oct 14 '23 07:10 Domikohlh