CSV to XML conversion in Databricks when the CSV contains blank values

Manoj 0 Reputation points
2024-05-16T08:34:02.7366667+00:00

I am converting CSV data to XML in Databricks, and the CSV has blank values in a few columns.

For example, suppose the CSV has 4 columns and one column value is blank for a given row (record). In the XML output, the node for that column is missing: as you can see in the XML, only 3 XML nodes were created.
User's image

If all 4 columns are present in the CSV, the XML is created correctly with 4 XML nodes.

User's image

My requirement is that even when a column value is blank, all 4 XML nodes should still be created, and the node for the missing value should be empty.
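
To make the requirement concrete, here is a small plain-Python sketch of the output shape being asked for. It does not use Spark; the column names, sample rows, and the `ROWS`/`ROW` tag names are made up for illustration only.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second row has a blank "city" value.
CSV_TEXT = "id,name,city,country\n1,Asha,Pune,IN\n2,Ravi,,IN\n"


def csv_to_xml(csv_text, root_tag="ROWS", row_tag="ROW"):
    """Build XML in which every CSV column gets an element, even if blank."""
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            # Always create the element; a blank value becomes empty text.
            ET.SubElement(row, column).text = value or ""
    # short_empty_elements=False renders <city></city> instead of <city />.
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


xml_out = csv_to_xml(CSV_TEXT)
print(xml_out)
```

In this sketch the second record still has four child nodes, with an empty `city` node.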

Please let me know how we can achieve this. I am using the following code for the above implementation:

User's image

Regards
Manoj

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Wilko van de Velde 2,226 Reputation points
    2024-05-17T06:20:35.8533333+00:00

    Hi @Manoj ,

    A quick Google search turned up an answer by rluta on Stack Overflow:

    If you want to output null tags, you need to provide a default nullValue that will appear in the tag:

    df.write.format("xml")
        .mode("overwrite")
        .option("nullValue", "")
        .option("rowTag", "ROW")
        .save("myxml")
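
For a Python notebook, the same write call can be wrapped in parentheses so the chained methods parse correctly. This is a minimal sketch, not a definitive implementation: the helper name `write_xml_keep_empty_tags` and the paths in the comments are hypothetical, and it assumes the `xml` data source is available on the cluster (for example through the spark-xml library). Blank CSV fields are typically read by Spark as nulls, which is why the `nullValue` option matters at write time.

```python
def write_xml_keep_empty_tags(df, path, row_tag="ROW", root_tag="ROWS"):
    """Write a Spark DataFrame as XML, emitting an empty element for nulls."""
    (
        df.write.format("xml")
        .mode("overwrite")
        .option("nullValue", "")  # write nulls as empty elements, not omitted
        .option("rowTag", row_tag)
        .option("rootTag", root_tag)
        .save(path)
    )


# Usage on a cluster (assumes a `spark` session; the paths are hypothetical):
# df = spark.read.option("header", True).csv("/tmp/input.csv")
# write_xml_keep_empty_tags(df, "/tmp/output_xml")
```

Keeping the options in one helper makes it easy to reuse the same null handling for every export.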
    

    Original article: https://stackoverflow.com/questions/57810921/spark-xml-tags-are-missing-when-null-values-are-coming

    Kind Regards,

    Wilko


    Please do not forget to "Accept the answer" wherever the information provided helps you; this can be beneficial to other community members. If you have extra questions about this answer, please click "Comment".