Wednesday, 3 November 2010

Beware of the Byte

Recently our test department raised a bug against one our applications that occurred when trying to insert a record into a table.

The error message encountered was a fairly innocuous "ORA-01704: string literal too long".

Following the test case to the letter, I successfully generated the same error and located the table that the APEX form was inserting into. A quick check of the Data Dictionary confirmed that the column in question was of type VARCHAR2(10). At this stage, I though the obvious cause was that there was no limit on the APEX form item (a Text Area) of 10 characters. Having checked the item in question, not only was there a “maxWidth” value of 10, the text area had been created with a “Character Counter”. Strange then how a form item accepting 10 characters was erroring whilst inserting into a column of VARCHAR2(10).

A little while later...... (after some head scratching and several discussions with our DBA’s and a colleague) the problem was all too clear. Somewhere between Database Character Sets, VARCHAR2 column definitions and non ASCII characters lay the answer.

Please forgive the rather verbose narrative but allow me to delve a little deeper.

Firstly the facts:

1. The character set of our database is set to AL32UTF8

SELECT VALUE
FROM v$nls_parameters
WHERE parameter = 'NLS_CHARACTERSET';


2. Definition of table causing the error:

CREATE TABLE nls_test
(
col1 VARCHAR2(10 BYTE)
);


3. SQL Statement causing the error:

INSERT INTO nls_test VALUES ('““““““““““');

NB: 10 individual characters.

The character used in this insert is typical of a double quote produced by MS Word (it was in fact a copy and paste from a Word document into our Text Area that caused our error).

Explanation

The reason we encountered the error was all to do with the attempt to insert a 'Multi Byte' character (a double quote from word in our case) into our table as opposed to typical single byte characters (A-Z, 0-9 etc).

Performing a simple LENGTHB to return the number of bytes this character uses demonstrated this perfectly:

SELECT lengthb('“') from dual;

LENGTHB('“')
------------
3


Because our column definition is of type VARCHAR2(10 BYTE), we are only permitted to store values that do not exceed 10 bytes in length.

Beware, 1 character in our case definitely does not = 1 byte. As already proved, our single character (a Word double quote) occupies 3 bytes so the maximum number of this Multi Byte Character we could possible insert according to our table definition is worked out simply as:

10 bytes (Column Definition) / 3 (length in bytes of our character) = 3

So whilst the APEX form item does not distinguish between single and multi byte characters and will allow you to input the full 10 characters, Oracle Database will bounce it back in our case as the total bytes in our string is 30 hence the error.

One solution suggested was to alter the Data Type to be of type VARCHAR2(10 CHAR) instead of BYTE. This in theory would force the database to respect the actual number of characters entered and not worry too much about single vs. multi byte occupancy. This would allow us resolve our immediate issue of 10 multi byte characters inserting into our table however there are further considerations.

As it turns out, even when you define your columns to use CHAR over BYTE, Oracle still enforces a hard limit of up to 4000 BYTES (given a mixed string of single and multi byte characters, it implicitly works out the total bytes of the string).

So beware that even if your column for example accepts only 3000 CHAR and you supply 2001 multi byte characters in an insert statement, it may still fail as it will convert your sting into BYTES enforcing an upper limit of 4000 BYTES.

Sorry for the really long post but it was a much for my own documentation as anything else.

5 comments:

Dominic said...

Yep, I had the same issue:

http://www.oraclemusings.com/?p=107

David said...

Duncan,

Would changing to nvarchar2(10) have worked?

-David

Duncan Mein said...

David.

I don't think so. This is from the Oracle 10g documentation:

"NVARCHAR2

The maximum length of an NVARCHAR2 column is 4000 bytes. It can hold up to 4000 characters. The actual data is subject to the maximum byte limit of 4000. The two size constraints must be satisfied simultaneously at run time."

That suggests the same issue would be encountered but I shall try it out when I get home and verify.

Duncan Mein said...

Link to Oracle Documentation which I forgot to post.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#i3253

:)

DomBrooks said...

NVARCHAR2 wouldn't solve the max 4000 bytes problem but there's no NLS_LENGTH_SEMANTICS issue with NVARCHAR2/NCHAR - it's always CHAR, BYTE not possible I believe.