SQL Server - Question marks inserted with Hebrew collation

Natalie Sh 26 Reputation points
2021-06-22T16:29:10.737+00:00

SQL Server 2019, Developer edition, Windows 10. The client is SSMS 18.

When database is created with Hebrew collation, for example, Hebrew_CI_AI, there is no problem, Hebrew strings are inserted and being read as they are.

When database is created with Latin collation, let's say AdwentureWorks2019 with SQL_Latin1_General_CP1_CI_AS, there presumably should be possibility to create tables with columns with Hebrew collation and still Insert and Select normal Hebrew strings, right?
For example,

create table dbo.try (id int not null, name varchar(30) COLLATE Hebrew_CI_AI NOT NULL)

insert into dbo.try VALUES (1, 'שלום')

select * from dbo.try

shows question marks.

If we use UNICODE, we could Insert and Select Hebrew strings again:

create table dbo.try2 (id int not null, name varchar(30) collate Hebrew_100_CI_AI_SC_UTF8 NOT NULL)

insert into dbo.try2 VALUES (2, N'שלום')

select * from dbo.try2

But why this does not work with non-UNICODE collation? (Again, database with non-UNICODE Hebrew collation works as expected)

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,951 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,566 questions
{count} vote

Accepted answer
  1. Tom Cooper 8,466 Reputation points
    2021-06-22T19:21:31.13+00:00

    The reason the code

    Insert into dbo.try VALUES (1, 'שלום')

    works when the database has a Hebrew collation, but does not work when the database has a non-Hebrew collation, but the column has a Hebrew collation is that string literals always have the collation of the database. See https://learn.microsoft.com/en-us/sql/t-sql/statements/collations?view=sql-server-ver15 and scroll down to the REMARKS section in the paragraph numbered 3.

    So if the database collation is not a Hebrew collation, the literal value cannot contain Hebrew characters. So 'שלום' is converted to '????' in the storage for that literal. Then when the literal value is placed the the column with a Hebrew collation, the Hebrew characters are gone and the value is '????'.

    The fix is to make the literal an nvarchar literal, that is

    Insert into dbo.try VALUES (1, N'שלום')

    Tom

    2 people found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Yitzhak Khabinsky 25,211 Reputation points
    2021-06-22T17:35:03.477+00:00

    Hi @Natalie Sh ,

    Overall, there are two working scenarios:

    (1) While providing Hebrew string for a NVARCHAR() data type there is a need to prefix it with N'...'

    (2) SQL Server 2019 introduced new UTF-8 collation. It allows to use VARCHAR() data type for Hebrew.

    SQL

    DECLARE @tbl table (id INT IDENTITY, [name] NVARCHAR(30));  
    INSERT INTO @tbl ([name]) VALUES  
    (N'שלום');  
      
    SELECT * FROM @tbl;  
    

    Output

    108292-hebrew-in-ssms.png