PyVCF model.py: add_field function added in Call class, for adding or updating...

I think it could be a good improvement to allow the PyVCF module to be able to add or update information fields in Call objects, that is, in the FORMAT fields of the .vcf files. This feature could make possible to perform other interesting functions like merging of .vcf files coming from different variant callers, etc. I will be glad to include some of these functionalities in future releases.

Changes performed: model.py: add_field function added in Call class, for adding or updating a field of a call object, that is, it allow to modify calls

Dec 27 '13 10:12 gitanoqevaporelmundoentero

Thanks @gitanoqevaporelmundoentero for the pull request. I have also seen your earlier comment, but haven't had time to respond to it. Let me just link to #82 as a reference for now so the information here is complete.

Dec 27 '13 11:12 martijnvermaat

Thanks, this looks good but I'm not really willing to merge it without any tests.

Can you add some, please?

Also, idea is normalising whitespace which is a bit irritating. But it has moved comments to the wrong indentation which is a bit annoying!

Feb 06 '14 15:02 jamescasbon

Hi, @jamescasbon! I'd like to come back to this pull-request. Regarding your request on tests, I'm thinking of adding 2 vcf files from different sources (GATK and VarScan, with different INFO and FORMAT fields) and include a short script for performing the merge of these 2 files using the PyVCF library. I would include the template files and script in the "scripts" folder. Is this ok?

Please let me know your opinion.

Mar 17 '14 09:03 gitanoqevaporelmundoentero

Added test files (vct/test/gatk_sample.vcf, vcf/test/varscan_sample.vcf, also gzipped and tabix indexed). Created script for merging files (scripts/vcf_merge.py). Included example of merge command (scripts/merge_test.txt).

Mar 17 '14 10:03 gitanoqevaporelmundoentero

Hi, thanks for adding tests. I will check this out now with a view to merging. ccing @martijnvermaat in case he gets there first.

Mar 18 '14 07:03 jamescasbon

Yes, we should try to merge this.

It would take some time for me to go through it, but just skimming I see a number of issues I would like to see fixed before a merge:

Whitespace/indentation irregularities and some very long lines.
If the info in scripts/merge_test.txt is really needed, can we think of another place for it?
Use of deprecated optparse module (our other scripts already use argparse).
Can we replace the use of global variables with function arguments?
What are the sender and receivers variables for?
Many unused imports.
I think the --verbose and --debug arguments are nice during development, but probably don't belong in the resulting script.
The LICENSE: This script belongs to Sistemas Genomicos S.L. line should at least be reworded, since I don't think it's compatible with our license (or any Open Source license). I suggest to just state copyright for you and the company.

All in all this PR needs some work in my opinion.

Sorry for not giving some guidance on this earlier. We do want the feature in PyVCF and I really appreciate all the work!

Mar 18 '14 09:03 martijnvermaat

Some workflow related points:

I don't think you should have merged with current master during your work. Since we don't have any guidelines on this, I really cannot blame you (I'd rather blame Git), but I much prefer rebasing for this (or leave it to the maintainers).
The PR now consists of 24 commits, many of which could be squashed to make things more understandable for others. If you want to give it a shot, you could try Git's interactive rebase:
```
git rebase -i b8c0af7e      # this was your original master
git push origin master -f   # since you're rewriting history, you need -f
```

Perhaps we should add some contribution guidelines to our documentation, especially on the recommended Git workflow. Or just link to a good external document for this.

In the end it is not all that important though, since we could always just squash the entire PR into one commit when merging.

Mar 18 '14 09:03 martijnvermaat

Sorry for having submitted all my changes & tests without a prior exhaustive revision. I'm checking all the points you listed in your last 2 comments, and will come back with new checked changes as soon as possible.

Mar 18 '14 10:03 gitanoqevaporelmundoentero

I think I'm ready with the changes for the moment. A brief list of the changes:

I've formatted the scripts and checked them with 'pep8 --max-line-length=100'
I've removed file scripts/merge_test.txt, and included the example command in the vcf_merge.py script
I've removed not necessary elements from script, and added necessary functions
I've rebased to the original commit, I hope successfully, I'm not very used to "rebase" with git...

I will be waiting for further suggestions of changes for the PR. Thanks, Oscar

Mar 18 '14 12:03 gitanoqevaporelmundoentero

"Print" statements have been substituted to "sys.stderr.write" for compatibility with python 3

Mar 21 '14 12:03 gitanoqevaporelmundoentero

Since I haven't received any more feedback, I've decided to go through my pull request. I've made more consistent changes, regarding private methods being called from outside propietary classes and simplifying the code. To summarize:

In model.py: Record class: changes in merge function and add_call function added
In parser.py: Reader class: merge method added. Method merge_info_from_vcf_files added in parser.py I hope to get these changes included soon. Please let me know if I can collaborate in any other way.

Mar 28 '14 15:03 gitanoqevaporelmundoentero

New changes included:

parser.py: Writer class: add_attr method created, for adding infos, formats, etc.
parser.py: Writer class: init method: now this method does not write header until first record will be written, since further information may be added. Changes in _write_header and write_record were therefore needed.

Hope to hear from you soon again...

Apr 30 '14 09:04 gitanoqevaporelmundoentero

Sorry for not getting here earlier. I think there are still some issues to be resolved before this can be merged, but a full review will take me a lot of time. Let's see how we can best get there.

The main problem is the size of the pull request. I think it can be split into several smaller logically self-contained changes, which will be much easier to discuss and get merged.

Particularly, I think the main subject of this PR (adding and editing call data) can be isolated from for example the VCF merge script. The latter depends on the first, but not the other way around.

I noted two more things:

Adding and editing call data also still needs some unit tests, directly on the call data mechanics. The current test(data) in this PR works via the VCF merge script and is therefore more an integration test and too high level (but also useful).
Another thing that's not clear to me is how (or if) this approach differs from the quick mockup James gave in this comment.

I guess it will again take some work, but separating concerns will help focus the discussion. Also, I have to thank you for the updates, the code has already improved a lot.

May 14 '14 18:05 martijnvermaat

Hello, are you still interested in merging this PR? I was trying to bring some light into my changes with an git-interactive rebase, but I'm not coming through. If you are still interested, I would prefer to start from scratch in a new fork, and include there my changes. Is it ok?

Nov 25 '14 16:11 gitanoqevaporelmundoentero

Anyway I've finally performed a review of the code included in my PR, and I hope that my changes look a bit better (I've made some rebasing on them). If still interested in merging this PR, please let me know how should we continue. Thanks!

Nov 26 '14 13:11 gitanoqevaporelmundoentero

Finally I've completely rebuilt merge utilities, by creating a new class in parser: MergeReader. This class allows to parse multiple sorted vcf files at a time, returning merged records whenever it's possible. Merge script (vcf_merge.py) and parser functions get quite simplified and consistent. I've also included some unit tests for af (allelic frequencies) property. And I've also tried to rebase my commit history (although I think something is still wrong with it, any advices are welcome).

Please let me know how should I go further with this PR.

Thanks!

Nov 28 '14 16:11 gitanoqevaporelmundoentero

I've read the discussion above but am still unclear -- is there currently a way to add new fields to the genotype?

Mar 05 '15 02:03 daniellieber-claritas

Is this dead on the vine?

Oct 01 '24 13:10 MikeWLloyd

PyVCF PyVCF copied to clipboard

model.py: add_field function added in Call class, for adding or updating...

PyVCF
PyVCF copied to clipboard