Discrepancy in Table Extraction between Azure Document Intelligence GUI and Python SDK for Local Images

Sanam Khalili 0 Reputation points
2024-05-08T21:06:07.5233333+00:00

I’m using Azure Document Intelligence to analyze images that contain tables. When I use the GUI, the results correctly include the tables for all images. However, when I run the same images from my local machine through the Python SDK, the tables are missing from the results for some of the images.

In the SDK, I made one modification to the code. Instead of using the begin_analyze_document_from_url method, I used the begin_analyze_document method to read local files. Here’s the change I made:

Original SDK code:

poller = document_analysis_client.begin_analyze_document_from_url("prebuilt-layout", formUrl)

Modified code:

poller = document_analysis_client.begin_analyze_document("prebuilt-layout", data)

Could this change be causing the discrepancy in the results? If so, how can I modify my code to correctly extract tables from local images?
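For reference, here is a minimal sketch of the local-file variant. It assumes the `azure-ai-formrecognizer` `DocumentAnalysisClient` (the client used in the question), and `read_document_bytes` and `analyze_local_layout` are hypothetical helper names. The sketch opens the file in binary mode and checks that the whole file was read before sending it. A truncated or text-mode read is one way the local call could end up differing from the URL call.

```python
import os


def read_document_bytes(path):
    """Read a local image/PDF in binary mode and confirm every byte was read."""
    # "rb" matters: text mode would try to decode the image bytes and fail or corrupt them.
    with open(path, "rb") as f:
        data = f.read()
    expected = os.path.getsize(path)
    if len(data) != expected:
        raise IOError(f"read {len(data)} of {expected} bytes from {path}")
    return data


def analyze_local_layout(client, path):
    """Run prebuilt-layout on a local file with a DocumentAnalysisClient-like client."""
    data = read_document_bytes(path)
    # Same call as the question's modified code; `data` is the complete file content.
    poller = client.begin_analyze_document("prebuilt-layout", data)
    return poller.result()
```

With a real client, `len(analyze_local_layout(client, "table.png").tables)` would give the number of tables the service detected in that file.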

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.


  1. romungi-MSFT 42,761 Reputation points Microsoft Employee
    2024-05-09T04:21:32.4666667+00:00

    @Sanam Khalili Do you see the expected result if you use the begin_analyze_document_from_url() method?

    If so, the complete file data might not be reaching the service when you pass the local file to begin_analyze_document().
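    One way to answer that question is to analyze the same document both ways and compare what comes back. The sketch below assumes a `DocumentAnalysisClient`-style client. `table_shapes` and `compare_url_and_local` are hypothetical names, and `form_url` is assumed to point at a publicly reachable copy of the same file as `path`.

```python
def table_shapes(result):
    """List (row_count, column_count) for each table in an analyze result."""
    # `result.tables` may be None or empty when no tables were detected.
    return [(t.row_count, t.column_count) for t in (result.tables or [])]


def compare_url_and_local(client, form_url, path):
    """Analyze one document via URL and via local bytes; return both table shapes."""
    url_result = client.begin_analyze_document_from_url("prebuilt-layout", form_url).result()
    # Keep the file open until the service call has completed.
    with open(path, "rb") as f:
        local_result = client.begin_analyze_document("prebuilt-layout", f).result()
    return table_shapes(url_result), table_shapes(local_result)
```

    If both lists match, the switch to a local file is probably not the cause. If only the URL call finds the tables, the local upload is the place to look.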

    Based on my experience, though, that is probably not the cause. Since the GUI (Document Intelligence Studio) shows the expected result, the issue might be the stringIndexType value the SDK sends. By default, the REST API call from Studio uses stringIndexType=utf16CodeUnit, whereas the SDK uses stringIndexType=textElements, and this might cause the discrepancy for some documents depending on their format. Try passing stringIndexType with the value utf16CodeUnit to begin_analyze_document() and see if it helps. Here is the reference to this method.
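    Whether begin_analyze_document() accepts a string-index keyword may depend on the SDK package and version. A version-independent way to try this suggestion is to call the REST API directly with `stringIndexType=utf16CodeUnit`. The following is a minimal standard-library sketch. It assumes the v3.1 GA REST endpoint (`api-version=2023-07-31`) with key authentication, and the helper names are hypothetical. Other API versions may use a different path prefix, so check the REST reference for the version your resource targets.

```python
import urllib.request
from urllib.parse import urlencode


def build_layout_request(endpoint, key, string_index_type="utf16CodeUnit",
                         api_version="2023-07-31"):
    """Build URL and headers for a REST prebuilt-layout analyze call on raw file bytes."""
    query = urlencode({"api-version": api_version, "stringIndexType": string_index_type})
    url = f"{endpoint.rstrip('/')}/formrecognizer/documentModels/prebuilt-layout:analyze?{query}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # The local image/PDF bytes are sent as the raw request body.
        "Content-Type": "application/octet-stream",
    }
    return url, headers


def submit_layout(endpoint, key, path, string_index_type="utf16CodeUnit"):
    """POST a local file; the service replies 202 with an Operation-Location to poll."""
    url, headers = build_layout_request(endpoint, key, string_index_type)
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Operation-Location"]
```

    To get the results, poll the returned Operation-Location with a GET request that sends the same key header, until `status` is `succeeded`. The response then contains `analyzeResult.tables`, which can be compared with the SDK output for the same image.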

    If you still see an issue with the SDK response, could you open an issue on the Azure SDK for Python repository with details of the request?
