
How to read tsv?

Open Hoeze opened this issue 3 years ago • 2 comments

I'm trying to run:

glow.transform(
    "pipe",
    input_df.limit(10),
    cmd=json.dumps(shlex.split(cmd)),
    inputFormatter='vcf',
    inVcfHeader='infer',
    outputFormatter='csv',
    out_delimiter="\t"
)

but I still only get a single column. How can I read tsv output?

Hoeze avatar Apr 25 '21 01:04 Hoeze

Hi @Hoeze! I just ran a quick test locally, and the CSV piper should still work for TSVs. However, the CSV datasource exposes many options, including whether there is a header (out_header) or whether there are comments (out_comment). Can you tell me more about the command you're running?

karenfeng avatar Apr 27 '21 17:04 karenfeng

@karenfeng The issue that I had was that the option should be called outDelimiter instead of out_delimiter.
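
For anyone hitting the same thing: going only by what is reported in this thread, the option names that took effect were the camelCase ones (outDelimiter, outHeader), not the snake_case ones. A minimal sketch of a helper that normalizes keyword names before the call is below. The names to_camel_case and camel_case_options are hypothetical and not part of Glow, and whether a given Glow version needs this at all is an assumption.

```python
def to_camel_case(name: str) -> str:
    """Convert a snake_case option name such as 'out_delimiter' to 'outDelimiter'."""
    head, *rest = name.split("_")
    return head + "".join(part[:1].upper() + part[1:] for part in rest)


def camel_case_options(**options):
    """Return a copy of the keyword arguments with camelCase keys (hypothetical helper)."""
    return {to_camel_case(key): value for key, value in options.items()}


# Names that are already camelCase pass through unchanged.
opts = camel_case_options(out_delimiter="\t", out_header=True, outputFormatter="csv")
print(opts)

# The result could then be splatted into the call from this thread, e.g.
# glow.transform("pipe", input_df, cmd=..., **opts)
```

The helper only renames keys; it does not check that the resulting option exists.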

However, I'm having another issue now. For some reason, the options outNullValue and outEmptyValue do not work:

import json
import shlex

vep_transformed_df = glow.transform(
    "pipe",
    input_df.limit(10).distinct(),
#     cmd=json.dumps(shlex.split("cat | grep -v '^##'")),
    cmd=json.dumps(shlex.split(vep_cmd)),
    inputFormatter='vcf',
    inVcfHeader='infer',
    outputFormatter='csv',
#     outQuote="##",
    outHeader=True,
    outDelimiter="\t",
    outNullValue="-",
    outEmptyValue="-",
)
vep_transformed_df.toPandas()["cDNA_position"].iloc[0]
# returns '-' instead of a null

Is there again some difference in naming?
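
To make the expected behaviour concrete, here is a small stdlib-only sketch of what I'd expect a tab delimiter, a header row, and "-" as the null value to produce. It does not use Glow or Spark, the read_tsv helper is hypothetical, and the sample text is made up rather than real VEP output.

```python
import csv
import io


def read_tsv(text, null_token="-"):
    """Parse tab-delimited text with a header row, mapping null_token to None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {key: (None if value == null_token else value) for key, value in row.items()}
        for row in reader
    ]


# Made-up sample with one placeholder value in cDNA_position.
sample = "Location\tcDNA_position\n1:100\t-\n1:200\t42\n"
rows = read_tsv(sample)
print(rows[0]["cDNA_position"])  # None, not '-'
```

This is only a reference for the result I'm after; it does not explain why the piper ignores the two options.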

Hoeze avatar Oct 18 '21 11:10 Hoeze

Closing, since we now support only the text piper and the to/from CSV functions in Spark.

henrydavidge avatar Mar 22 '24 03:03 henrydavidge