cohorts
Invalid biotype error when loading effects from TCGA-BLCA cohort
I get the following error when trying to load variant effects from the TCGA-BLCA cohort, e.g. if I use missense_snv_count instead of snv_count in this notebook.
This looks to me like a varcode issue, but I'm noting it here since it does affect cohorts usability.
Here is the full traceback:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-4d7b83c3ab2e> in <module>()
----> 1 extra_cols, df = blca_cohort2.as_dataframe(on=[cohorts.functions.snv_count, missense_snv_count])
2 print(extra_cols)
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in as_dataframe(self, on, col, join_with, join_how, rename_cols, keep_paren_contents, **kwargs)
335 for i, elem in enumerate(on):
336 col = elem.__name__ if not is_lambda(elem) else "column_%d" % i
--> 337 col, df = apply_func(on=elem, col=col, df=df)
338 cols.append(col)
339
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in apply_func(on, col, df)
311 else:
312 func = lambda row: on(row=row, cohort=self, **kwargs)
--> 313 df[col] = df.apply(func, axis=1)
314 return (col, df)
315
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4059 if reduce is None:
4060 reduce = True
-> 4061 return self._apply_standard(f, axis, reduce=reduce)
4062 else:
4063 return self._apply_broadcast(f, axis)
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
4155 try:
4156 for i, v in enumerate(series_gen):
-> 4157 results[i] = func(v)
4158 keys.append(v.name)
4159 except Exception as e:
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in <lambda>(row)
310 func = lambda row: on(row=row, **kwargs)
311 else:
--> 312 func = lambda row: on(row=row, cohort=self, **kwargs)
313 df[col] = df.apply(func, axis=1)
314 return (col, df)
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/functions.pyc in missense_snv_count(row, cohort, filter_fn, normalized_per_mb, **kwargs)
70 patients=[cohort.patient_from_id(patient_id)],
71 filter_fn=missense_filter_fn,
---> 72 **kwargs)
73 if patient_id in patient_missense_effects:
74 count = len(patient_missense_effects[patient_id])
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in load_effects(self, patients, only_nonsynonymous, filter_fn)
615 for patient in self.iter_patients(patients):
616 effects = self._load_single_patient_effects(
--> 617 patient, only_nonsynonymous, filter_fn)
618 if effects is not None:
619 patient_effects[patient.id] = effects
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/cohorts/cohort.pyc in _load_single_patient_effects(self, patient, only_nonsynonymous, filter_fn)
639 filter_fn=filter_fn)
640
--> 641 effects = variants.effects()
642
643 # Always take the top priority effect per variant so we end up with a single
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/varcode/variant_collection.pyc in effects(self, raise_on_error)
73 effect
74 for variant in self
---> 75 for effect in variant.effects(raise_on_error=raise_on_error)
76 ])
77
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/varcode/variant.pyc in effects(self, raise_on_error)
405 # group transcripts by their gene ID
406 transcripts_grouped_by_gene = groupby_field(
--> 407 self.transcripts,
408 'gene_id')
409
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/memoized_property.pyc in fget_memoized(self)
38 def fget_memoized(self):
39 if not hasattr(self, attr_name):
---> 40 setattr(self, attr_name, fget(self))
41 return getattr(self, attr_name)
42
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/varcode/variant.pyc in transcripts(self)
324 def transcripts(self):
325 return self.ensembl.transcripts_at_locus(
--> 326 self.contig, self.start, self.end)
327
328 @property
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pyensembl/genome.pyc in transcripts_at_locus(self, contig, position, end, strand)
464 return [
465 self.transcript_by_id(transcript_id)
--> 466 for transcript_id in transcript_ids
467 ]
468
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pyensembl/genome.pyc in transcript_by_id(self, transcript_id)
830 gene_id=gene_id,
831 genome=self,
--> 832 require_valid_biotype=("transcript_biotype" in field_names))
833
834 return self._transcripts[transcript_id]
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/pyensembl/transcript.pyc in __init__(self, transcript_id, transcript_name, contig, start, end, strand, biotype, gene_id, genome, require_valid_biotype)
51 raise ValueError(
52 "Invalid biotype '%s' for transcript with ID=%s, name=%s" % (
---> 53 biotype, transcript_id, transcript_name))
54
55 def __str__(self):
ValueError: (u"Invalid biotype '3prime_overlapping_ncRNA' for transcript with ID=ENST00000505579, name=AC009487.5-001", u'occurred at index 0')
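The traceback also shows how as_dataframe evaluates each entry of on: it names the column after the function (or column_%d for a lambda) and fills it with df.apply(func, axis=1), calling the function once per patient row. The pandas version in this traceback then appends "occurred at index 0" to whatever exception escapes, which is why both errors in this thread end that way. Below is a plain-Python sketch of that loop, without pandas. add_columns, is_lambda and the dictionary rows are hypothetical illustrations, not cohorts' code, and the sketch always passes cohort, whereas cohorts also has a branch that does not.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def is_lambda(fn: Callable) -> bool:
    # Lambdas have no real name; Python reports "<lambda>".
    return getattr(fn, "__name__", "") == "<lambda>"


def add_columns(rows: List[Row], on: List[Callable], cohort: Any = None
                ) -> Tuple[List[str], List[Row]]:
    """Add one column per function in `on`, evaluated once per row (patient)."""
    cols = []
    for i, fn in enumerate(on):
        # Naming rule seen in the traceback: __name__, or column_<i> for lambdas.
        col = fn.__name__ if not is_lambda(fn) else "column_%d" % i
        for row in rows:
            # No try/except: an error for any single row propagates and aborts
            # the whole call, just as the pyensembl ValueError does here.
            row[col] = fn(row=row, cohort=cohort)
        cols.append(col)
    return cols, rows
```

The point of the sketch is the missing error handling: one transcript with an unrecognized biotype in one patient is enough to make the whole missense_snv_count column fail.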
I've seen this as well with ensembl85 in varcode. @iskandr mentioned that this is fixed in the latest pyensembl, but we need to upgrade to the new varcode and pyensembl interfaces in cohorts. I've done that in #143, so hopefully this will go away after that.
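The last frames of the traceback show where the error comes from: pyensembl's transcript_by_id constructs a Transcript with require_valid_biotype set whenever a transcript_biotype field is present, and Transcript.__init__ raises ValueError for a biotype it does not accept. The most plausible reading is that the installed pyensembl validated against a fixed set of biotypes that did not include '3prime_overlapping_ncRNA', which the newer Ensembl annotation uses. The toy class below illustrates that mechanism; KNOWN_BIOTYPES is a hypothetical, deliberately short list, not pyensembl's actual one.

```python
# Hypothetical, deliberately short stand-in for the set of biotypes an older
# pyensembl release accepted; the real list is longer and may differ.
KNOWN_BIOTYPES = frozenset({
    "protein_coding", "lincRNA", "miRNA", "processed_transcript",
})


class Transcript:
    """Toy transcript that validates its biotype the way the traceback suggests."""

    def __init__(self, transcript_id, transcript_name, biotype,
                 require_valid_biotype=True):
        if require_valid_biotype and biotype not in KNOWN_BIOTYPES:
            # Same message format as the error in the traceback above.
            raise ValueError(
                "Invalid biotype '%s' for transcript with ID=%s, name=%s" % (
                    biotype, transcript_id, transcript_name))
        self.transcript_id = transcript_id
        self.transcript_name = transcript_name
        self.biotype = biotype
```

Constructing Transcript("ENST00000505579", "AC009487.5-001", "3prime_overlapping_ncRNA") raises the same kind of ValueError, while passing require_valid_biotype=False accepts it. The flag only shows where the check lives; the actual fix discussed here is upgrading pyensembl and varcode and adapting cohorts to their new interfaces.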
@jburos is this fixed now?
It's been running; not confirmed yet.
Here's the truncated output, after removing the cache:
/Users/jacquelineburos/anaconda3/envs/python27/lib/python2.7/site-packages/serializable/helpers.pyc in to_dict(obj)
188 raise ValueError(
189 "Cannot convert %s : %s to dictionary" % (
--> 190 obj, type(obj)))
191
192 @return_primitive
ValueError: ("Cannot convert SMFEPDLDHIDDGPL : <class 'Bio.Seq.Seq'> to dictionary", u'occurred at index 0')
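The message suggests that serializable's to_dict reached a Bio.Seq.Seq value (SMFEPDLDHIDDGPL looks like a protein sequence, presumably part of an effect being written to the cache) and had no rule for converting it. The sketch below shows that general failure mode with a recursive serializer that only accepts primitives, lists and dicts, plus one way around it: turning sequence objects into plain strings first. It is not serializable's implementation and not the fix that went into varcode; FakeSeq is a hypothetical stand-in for Bio.Seq.Seq so the example runs without Biopython.

```python
PRIMITIVES = (str, int, float, bool, type(None))


class FakeSeq:
    """Hypothetical stand-in for Bio.Seq.Seq: wraps a sequence, str() returns it."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data


def to_dict_strict(obj):
    """Recursively serialize primitives, lists/tuples and dicts; reject anything else."""
    if isinstance(obj, PRIMITIVES):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_dict_strict(x) for x in obj]
    if isinstance(obj, dict):
        return {k: to_dict_strict(v) for k, v in obj.items()}
    raise ValueError("Cannot convert %s : %s to dictionary" % (obj, type(obj)))


def to_dict_lenient(obj, stringify=(FakeSeq,)):
    """Like to_dict_strict, but first converts instances of `stringify` to str."""
    if isinstance(obj, stringify):
        return str(obj)
    if isinstance(obj, (list, tuple)):
        return [to_dict_lenient(x, stringify) for x in obj]
    if isinstance(obj, dict):
        return {k: to_dict_lenient(v, stringify) for k, v in obj.items()}
    return to_dict_strict(obj)
```

With Biopython installed, the same idea would use stringify=(Bio.Seq.Seq,), since str() of a Seq returns the underlying sequence.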
Looks like a new bug to me. Should we mark this as closed?
I think @tavinathanson fixed this in Varcode.
@jburos yup, should be fixed with the latest varcode.
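Before re-running, it can help to confirm which release pip actually installed. A minimal sketch, assuming Python 3.8+ for importlib.metadata (the environment in this thread is Python 2.7, where that module does not exist); installed_version, version_tuple and at_least are hypothetical helpers, and the comparison looks only at the leading numeric parts of a version string. packaging.version is the more robust choice if that third-party package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def installed_version(package):
    """Return the installed version string of `package`, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(v):
    """Leading numeric part of a version string, e.g. '1.10.2rc1' -> (1, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(x) for x in match.group(0).split(".")) if match else ()


def at_least(package, minimum):
    """True if `package` is installed at version >= `minimum` (numeric parts only)."""
    current = installed_version(package)
    if current is None:
        return False
    a, b = version_tuple(current), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.4" and "0.4.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```

For example, installed_version("varcode") reports the installed release; at_least("varcode", minimum) needs the release that contains the fix, which this thread does not state.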
Thanks. I updated varcode and now I see the following error (posting a screenshot since the output is too large and crashes my computer if I try to select it).
I'm running this again and piping the output to a text file, but wanted to note this here in the meantime.
That being said, LMK if I should close this and create a new issue elsewhere. Thanks for your help!
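When the output is too large to select, it can also be sent to a file from inside Python. A minimal sketch, assuming the noisy call can be wrapped in a function; run_logged is a hypothetical helper built on the standard contextlib.redirect_stdout, contextlib.redirect_stderr and traceback.print_exc.

```python
import contextlib
import traceback


def run_logged(fn, log_path):
    """Call fn() with stdout/stderr redirected to log_path.

    If fn raises, its traceback is written to the same file and None is
    returned; otherwise fn's return value is returned.
    """
    with open(log_path, "w") as log:
        with contextlib.redirect_stdout(log), contextlib.redirect_stderr(log):
            try:
                return fn()
            except Exception:
                # Write the full traceback to the log instead of the notebook.
                traceback.print_exc(file=log)
                return None
```

For instance, run_logged(lambda: blca_cohort2.as_dataframe(on=[cohorts.functions.snv_count, missense_snv_count]), "as_dataframe.log") keeps the traceback out of the notebook. Only Python-level writes to sys.stdout and sys.stderr are captured; anything written directly to the process's file descriptors (C extensions, subprocesses) still goes to the original output.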
@jburos that .pyc code looks old; can you delete all the .pyc files and see if that helps? Are you using 0.4.0 there?
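Deleting the compiled files can be done by hand or with a few lines of Python. A minimal sketch, assuming you point it at an installed package directory such as Path(varcode.__file__).parent; remove_pyc is a hypothetical helper. Python 2.7 writes .pyc files next to the .py sources (as in the site-packages paths in the traceback) while Python 3 puts them in __pycache__, and a recursive glob covers both. Python recompiles them from the .py sources on the next import.

```python
from pathlib import Path


def remove_pyc(root, dry_run=False):
    """Delete all *.pyc files under `root`, recursively; return their paths.

    With dry_run=True nothing is deleted; the paths are only listed.
    """
    # Materialize the list first so nothing is deleted while the
    # directory walk is still in progress.
    pyc_files = sorted(Path(root).rglob("*.pyc"))
    if not dry_run:
        for path in pyc_files:
            path.unlink()
    return pyc_files
```

Running it once with dry_run=True is a cheap way to check that the path really is the package directory before deleting anything.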