CSV to XML conversion in Databricks when the CSV has some blank values

Question

I am converting CSV data to XML, and the CSV has blank values in a few columns.

For example, suppose the CSV has 4 columns and, for one row (record), 1 column value is blank. In the XML output, the node for that blank column is missing: as you can see, only 3 XML nodes were created inside the XML.
User's image

If all 4 columns are present in the CSV, the proper XML is created with 4 XML nodes.

User's image

My requirement is that if there is a blank value for any column, all 4 XML nodes should still be created, and the value of the missing XML node should be blank.

Please let me know how we can achieve this. I am using this code for the above implementation:

User's image

Regards
Manoj

Answer

Hi @Manoj ,

A quick Google search turned up an answer by rluta:

If you want to output null tags, you need to provide a default nullValue that will appear in the tag:

df.write.format("xml")
    .mode("overwrite")
    .option("nullValue", "")
    .option("rowTag", "ROW")
    .save("myxml")
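
For reference, here is a minimal PySpark sketch of the same idea end to end. It assumes the spark-xml library is attached to the cluster and that the CSV has a header row. The function name `csv_to_xml` and the paths in the usage comment are hypothetical, and `spark` is the session that a Databricks notebook provides. The write chain is wrapped in parentheses because Python, unlike Scala, does not otherwise allow a line to start with a dot.

```python
def csv_to_xml(spark, csv_path, xml_path, row_tag="ROW"):
    """Read a CSV file and write it as XML, keeping tags for blank values.

    Assumptions: the spark-xml data source ("xml") is available, and the
    CSV has a header row. Blank CSV fields typically arrive as nulls in
    the DataFrame, which is why the tags go missing without nullValue.
    """
    # Read the CSV; the header row supplies the column (tag) names.
    df = spark.read.option("header", "true").csv(csv_path)

    # nullValue="" asks the writer to emit an empty tag for each null
    # instead of dropping the tag from the row.
    (
        df.write.format("xml")
        .mode("overwrite")
        .option("nullValue", "")
        .option("rowTag", row_tag)
        .save(xml_path)
    )
    return df


# Hypothetical usage in a Databricks notebook:
# csv_to_xml(spark, "/mnt/input/data.csv", "/mnt/output/myxml")
```

With this in place, a row whose column is blank should still produce a tag for that column, with an empty value.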

Original article: https://stackoverflow.com/questions/57810921/spark-xml-tags-are-missing-when-null-values-are-coming

Kind Regards,
Wilko

Please do not forget to "Accept the answer" wherever the information provided helps you; this can be beneficial to other community members. If you have extra questions about this answer, please click "Comment".
