Finding the rightmost substring and trimming whitespaces
Masterkent (Developer Team)
Feb 27th, 2017 at 4:58pm
Finding the rightmost substring and trimming whitespace are operations that are often needed when parsing strings. I think they should be implemented natively.

Suggestion 1: add bool parameters bRightmost and bRightmostDivider to the functions Core.Object.InStr and Core.Object.Divide, respectively:

Code
native(126) static final function int    InStr  ( coerce string S, coerce string t, optional int Start, optional bool bRightmost ); 


Code
native(638) static final function bool Divide( coerce string Src, string Divider, out string LeftPart, out string RightPart, optional bool bRightmostDivider ); 


Possible underlying C++ implementation:

In FString:
Code (C++)
	// Precondition:
	//     appStrlen(**this) == Len() && appStrlen(*SubStr) == SubStr.Len()
	// Effects:
	//     if !Rightmost, returns either the position pos of the leftmost occurrence of SubStr in this string
	//     such that pos >= Start, or -1 if there is no such position;
	//     otherwise, returns either the position pos of the rightmost occurrence of SubStr in this string
	//     such that pos >= Start, or -1 if there is no such position.
	//
	//     Position of a substring in an enclosing string is the offset of the leftmost character of
	//     the substring from the leftmost character of the enclosing string.
	//
	INT InStr2(const FString &SubStr, INT Start = 0, UBOOL Rightmost = false) const
	{
		if (Start < 0)
			Start = 0;
		const INT StrLen = Len();
		const INT SubStrLen = SubStr.Len();
		if (StrLen - Start < SubStrLen)
			return -1;
		if (Rightmost)
		{
			if (SubStrLen == 0)
				return StrLen;
			const TCHAR *First = **this;
			const TCHAR *LeftBound = First + Start;
			for (const TCHAR *p = First + StrLen - SubStrLen; ; --p)
			{
				if (appStrncmp(p, &SubStr[0], SubStrLen) == 0)
					return static_cast<INT>(p - First);
				if (p == LeftBound)
					return -1;
			}
		}
		if (SubStrLen == 0)
			return Start;
		const TCHAR *First = **this;
		const TCHAR *p = appStrstr(First + Start, &SubStr[0]);
		return p ? static_cast<INT>(p - First) : -1;
	}


Implementation note: when appStrlen(**this) == Len() && appStrlen(*SubStr) == SubStr.Len(), we can use the fast appStrstr for the left-to-right search, or quickly determine the end of the enclosing string by calling Len and use appStrncmp for the right-to-left search. An implementation that allowed appStrlen(**this) != Len() || appStrlen(*SubStr) != SubStr.Len() would have to determine the effective string length based either on the position of the first null character found or on the result of Len(). In the first case, the entire enclosing string might have to be examined for the trailing null character even though the result could otherwise be determined much faster. In the second case, we would not be able to use appStrstr and appStrncmp, which can work faster than primitive loops.
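The contract described above (leftmost or rightmost match at a position not less than Start) can be tried outside the engine. The following is only a sketch under assumptions: it uses std::string in place of FString, and the name FindSubstr is hypothetical, not part of any engine header. It relies on std::string::find and std::string::rfind instead of appStrstr and appStrncmp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed FString::InStr2, written against
// std::string so that it compiles without the engine headers.
int FindSubstr(const std::string& Str, const std::string& SubStr,
               int Start = 0, bool Rightmost = false)
{
    if (Start < 0)
        Start = 0;
    const int StrLen = static_cast<int>(Str.size());
    const int SubStrLen = static_cast<int>(SubStr.size());
    if (StrLen - Start < SubStrLen)
        return -1; // no room for a match at or after Start

    const std::size_t Pos = Rightmost
        ? Str.rfind(SubStr)                                  // last match in the whole string
        : Str.find(SubStr, static_cast<std::size_t>(Start)); // first match at or after Start

    // The last match overall is the largest match position, so if it lies
    // before Start there is no match at or after Start either.
    if (Pos == std::string::npos || static_cast<int>(Pos) < Start)
        return -1;
    return static_cast<int>(Pos);
}
```

As in the code from the post, an empty SubStr yields Start for the left-to-right search and the string length for the right-to-left search.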

In FString:
Code (C++)
	UBOOL Divide(const FString& Divider, FString& LeftPart, FString& RightPart, UBOOL RightmostDivider = false) const
	{
		guard(FString::Divide);

		INT div_pos = InStr(Divider, RightmostDivider);
		if (div_pos < 0)
			return false;

		FString LeftPartTmp = Left(div_pos);
		FString RightPartTmp = Mid(div_pos + Divider.Len(), Len() - div_pos - Divider.Len());

		ExchangeString(LeftPart, LeftPartTmp);
		ExchangeString(RightPart, RightPartTmp);

		return true;
		unguard;
	} 


Implementation note: ExchangeString should be forward-declared. Using temporary strings allows the enclosing string to be passed as one of the output strings LeftPart or RightPart as well. Since FString has no move assignment operator, exchanging a temporary string with a parameter may work faster than copying the string with the copy assignment operator in expressions like LeftPart = Left(div_pos). Most compilers would not call the copy constructor when constructing LeftPartTmp and RightPartTmp, thanks to the return value optimization commonly applied in such cases.
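The aliasing point in this note can be checked with a small stand-alone sketch. The assumptions are: std::string instead of FString, std::string::swap in place of ExchangeString, and a free function named Divide that is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed FString::Divide, using std::string.
bool Divide(const std::string& Src, const std::string& Divider,
            std::string& LeftPart, std::string& RightPart,
            bool RightmostDivider = false)
{
    const std::size_t Pos = RightmostDivider ? Src.rfind(Divider) : Src.find(Divider);
    if (Pos == std::string::npos)
        return false; // outputs are left untouched on failure

    // Build both halves before touching the outputs: LeftPart or RightPart
    // may refer to the same object as Src.
    std::string LeftTmp = Src.substr(0, Pos);
    std::string RightTmp = Src.substr(Pos + Divider.size());

    // Swapping exchanges the internal buffers instead of copying characters,
    // which plays the role of the ExchangeString calls in the post.
    LeftPart.swap(LeftTmp);
    RightPart.swap(RightTmp);
    return true;
}
```

Calling Divide(S, "=", S, R) with S holding "key=value" is safe here, because S is only overwritten after both halves have been extracted.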

Suggestion 2: add function Trim to Core.Object:

Code
native static final function Trim(out string S); 


Possible underlying C++ implementation:

In the global scope:
Code (C++)
	inline UBOOL appIsWhiteSpaceChar(TCHAR ch)
	{
		return ch == ' ' || ('\t' <= ch && ch <= '\r');
	} 



In FString:
Code (C++)
	// Effects:
	//     removes all leading and trailing whitespace characters in the string
	void Trim()
	{
		if (!Len())
			return;
		const TCHAR *First = (TCHAR *)Data;
		const TCHAR *Ending = First + Len();
		while (First != Ending && appIsWhiteSpaceChar(*First))
			++First;
		while (First != Ending && appIsWhiteSpaceChar(*(Ending - 1)))
			--Ending;

		INT NewLen = static_cast<INT>(Ending - First);

		if (NewLen && First != Data)
			memmove(Data, First, NewLen * sizeof(TCHAR));

		((TCHAR *)Data)[NewLen] = 0;
		ArrayNum = NewLen + 1;
	} 

  
han (Global Moderator, Developer Team)
Reply #1 - Feb 27th, 2017 at 7:36pm
Why not make some explode function which returns a dynamic array or making an iterator which offers the same functionality?
  

HX on Mod DB. Revision on Steam.

Smirftsch (Forum Administrator)
Reply #2 - Feb 28th, 2017 at 7:35am
I don't think I could count anymore how many different places and implementations we have in there for finding whitespace. I wonder why nobody ever had the idea of an appIsWhiteSpaceChar :P
Perhaps we should even use ::isspace nowadays.

Personally I'd prefer a Trim version.
  

Sometimes you have to lose a fight to win the war.

Masterkent (Developer Team)
Reply #3 - Feb 28th, 2017 at 1:19pm
han wrote on Feb 27th, 2017 at 7:36pm:
Why not make some explode function which returns a dynamic array or making an iterator which offers the same functionality?

In addition to Object.Divide?

Smirftsch wrote on Feb 28th, 2017 at 7:35am:
Perhaps we should even use ::isspace nowadays.

Passing anything except narrow characters or EOF to isspace would result in undefined behavior (in particular, the program may crash or produce unexpected results). For wide characters iswspace could be used instead. Both isspace and iswspace use the current locale to determine the exact set of white-space characters. I don't like such a dependency on the locale for two reasons:

1) a locale-dependent implementation is a bit slower than an optimal locale-independent implementation;
2) locale-dependent behavior is less predictable and may lead to unwanted results (e.g., for a non-C locale, isspace is allowed to treat non-breaking spaces as white-space characters, and iswspace is allowed to do so for any locale, but I wouldn't want to trim non-breaking spaces; if a non-C locale is used somewhere in the program, then switching back and forth between C and non-C locales would be error-prone and would imply an additional performance penalty).
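To make the non-breaking space point concrete, here is a minimal sketch under assumptions: std::wstring stands in for FString, and the names IsWhiteSpaceChar and Trim are illustrative counterparts of appIsWhiteSpaceChar and FString::Trim from the first post. The predicate consults no locale, so U+00A0 is never treated as white space.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same character set as appIsWhiteSpaceChar in the first post:
// space plus '\t', '\n', '\v', '\f', '\r'. No locale is consulted.
inline bool IsWhiteSpaceChar(wchar_t ch)
{
    return ch == L' ' || (L'\t' <= ch && ch <= L'\r');
}

// Hypothetical std::wstring counterpart of the proposed FString::Trim:
// removes leading and trailing white-space characters in place.
void Trim(std::wstring& S)
{
    std::size_t First = 0;
    std::size_t Ending = S.size();
    while (First != Ending && IsWhiteSpaceChar(S[First]))
        ++First;
    while (First != Ending && IsWhiteSpaceChar(S[Ending - 1]))
        --Ending;
    S = S.substr(First, Ending - First);
}
```

With this predicate the result does not depend on any setlocale call made elsewhere in the program, which is the predictability argument made above.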
« Last Edit: Feb 28th, 2017 at 3:16pm by Masterkent »  

han (Global Moderator, Developer Team)
Reply #4 - Feb 28th, 2017 at 7:50pm
Masterkent wrote on Feb 28th, 2017 at 1:19pm:
han wrote on Feb 27th, 2017 at 7:36pm:
Why not make some explode function which returns a dynamic array or making an iterator which offers the same functionality?

In addition to Object.Divide?

No, instead of your suggested changes. But I refined the idea a bit: just the explode function, which returns a dynarray of strings, plus an additional iterator to iterate over the elements of a dynamic array, as this also makes these tasks very easy.

Also, one should start to work out some concepts for regexp support. I like Ruby's pattern matching operator, though that would not translate 1:1 to UnrealScript conceptually. But perhaps an iterator which iterates over matching groups would be a start.

These are far more flexible solutions compared to adding hardcoded functionality for every special string handling case.
  

[]KAOS[]Casey (Developer Team, Betatester)
Reply #5 - Mar 1st, 2017 at 7:12am
regex support certainly could make a lot of things easier
  

Masterkent (Developer Team)
Reply #6 - Mar 1st, 2017 at 11:44am
han wrote on Feb 28th, 2017 at 7:50pm:
Masterkent wrote on Feb 28th, 2017 at 1:19pm:
han wrote on Feb 27th, 2017 at 7:36pm:
Why not make some explode function which returns a dynamic array or making an iterator which offers the same functionality?

In addition to Object.Divide?

No, instead of your suggested changes. But I refined the idea a bit: just the explode function, which returns a dynarray of strings, plus an additional iterator to iterate over the elements of a dynamic array, as this also makes these tasks very easy.

How would you rewrite the following functions using your methods?

Code
// Get filename with extension from the Path
// Example:
//     Input:
//         Path == "../Maps/SomeCustomMap.unr"
//     Returned string:
//         "SomeCustomMap.unr"
static function string GetFilenameWithExt(string Path)
{
	local int DirSeparatorPos;

	DirSeparatorPos = InStr(Path, "\\", 0, true);
	DirSeparatorPos = Max(DirSeparatorPos, InStr(Path, "/", DirSeparatorPos + 1, true));

	return Mid(Path, DirSeparatorPos + 1);
} 


Code
// Get package name, full name of the outer object, and name of the object referred to by the full name ObjectFullName
// Example:
//     Input:
//         ObjectFullName == "SomePackage.SomeGroup.SomeObjectName"
//     Output:
//         PackageName == "SomePackage"
//         OuterObjectFullName == "SomePackage.SomeGroup"
//         ObjectName == "SomeObjectName"
static function SplitObjectName(
	string ObjectFullName,
	optional out string PackageName,
	optional out string OuterObjectFullName,
	optional out string ObjectName)
{
	local int i;

	i = InStr(ObjectFullName, ".");
	PackageName = Left(ObjectFullName, i);

	if (!Divide(ObjectFullName, ".", OuterObjectFullName, ObjectName, true))
	{
		OuterObjectFullName = "";
		ObjectName = ObjectFullName;
	}
} 


Quote:
Also, one should start to work out some concepts for regexp support.

Good luck with that.
  

han (Global Moderator, Developer Team)
Reply #7 - Mar 1st, 2017 at 1:03pm
The first one I previously implemented as:
Code
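// Returns the part of S that follows the first occurrence of M,
// or S unchanged if M does not occur in S.
static final function string LeftStrip( coerce String S, coerce String M )
{
	local int p;
	p = InStr(S,M);
	if (p == -1)
		return S;
	// Skip the whole match (Len(M) characters), so that a multi-character M also works
	return Right(S,Len(S)-p-Len(M));
}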
static final function String StripPath( coerce String Text )
{
	local int NewLen, OldLen;
	NewLen = Len(Text);
	OldLen = -1;
	while ( NewLen!=OldLen )
	{
		Text   = LeftStrip(LeftStrip(Text,"/"),"\\");
		OldLen = NewLen;
		NewLen = Len(Text);
	}
	return Text;
}
 



For Explode() this would be splitting by \ and returning the last array element. Possibly after replacing all / with \ beforehand.

For the second one, if the string is split at dots by Explode(), the PackageName and the ObjectName are the first and the last array entry, respectively. Rebuilding the OuterPathName can be done with a loop.
  

Masterkent (Developer Team)
Reply #8 - Mar 1st, 2017 at 5:06pm
han wrote on Mar 1st, 2017 at 1:03pm:
For Explode() this would be splitting by \ and returning the last array element.

I don't like the idea of creating an array of strings just to use only one element of it. I wouldn't even use Divide there, because it would fill a redundant string that I don't need. If UScript calls to native functions were optimized so that passing a string to a native function did not cause creation of an unnecessary copy of the string, my variant would have really low overhead.
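As an illustration of that overhead argument, a rightmost search needs one backward scan and builds only the result string, with no intermediate array. The sketch below is a hypothetical C++ counterpart of GetFilenameWithExt, assuming std::string and using std::string::find_last_of rather than the proposed InStr parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C++ counterpart of GetFilenameWithExt from the earlier post.
std::string GetFilenameWithExt(const std::string& Path)
{
    // Position of the last '/' or '\\', or npos when the path contains neither.
    const std::size_t Pos = Path.find_last_of("/\\");
    return Pos == std::string::npos ? Path : Path.substr(Pos + 1);
}
```

A split-based version would instead allocate one string per path component before discarding all but the last.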

"Explode" is a bad choice of function name, because it would conflict with Projectile.Explode (and, I think, it's a bit stupid name for a string operation, no matter if some popular existing implementations call it so).

Quote:
Rebuilding the OuterPathName can be done with a loop.

With Divide you don't have to worry about tricks that could be used to determine the left part of the division. This is probably the shortest implementation with your splitting function:

Code
static function SplitObjectName(
	string ObjectFullName,
	optional out string PackageName,
	optional out string OuterObjectFullName,
	optional out string ObjectName)
{
	local array<string> Names;

	Names = SplitStr(ObjectFullName, ".");
	PackageName = Array_Size(Names) > 1 ? Names[0] : "";
	ObjectName = Names[Array_Size(Names) - 1];
	OuterObjectFullName = Left(ObjectFullName, Len(ObjectFullName) - Len(ObjectName) - 1);
} 


I don't see noticeable benefits here; it's just another way to express the same thing with nearly the same readability and probably slightly worse performance.

Note also that, in general, left-to-right and right-to-left delimiter matching may split the string differently:

Code
// left-to-right search
Strings = SplitStrForwards("OK....", "...");
assert(Array_Size(Strings) == 2 && Strings[0] == "OK" && Strings[1] == ".");

// right-to-left search
Strings = SplitStrBackwards("OK....", "...");
assert(Array_Size(Strings) == 2 && Strings[0] == "OK." && Strings[1] == ""); 


which again raises the question of implementing right-to-left traversal.
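The difference between the two directions can be reproduced with a short stand-alone sketch. It assumes std::string and std::vector in place of UnrealScript strings and dynamic arrays; the two function bodies are hypothetical implementations written for this illustration, and both require a non-empty delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits S at every delimiter found while scanning from left to right.
std::vector<std::string> SplitStrForwards(const std::string& S, const std::string& Delim)
{
    assert(!Delim.empty());
    std::vector<std::string> Parts;
    std::size_t Begin = 0;
    for (;;)
    {
        const std::size_t Pos = S.find(Delim, Begin);
        if (Pos == std::string::npos)
            break;
        Parts.push_back(S.substr(Begin, Pos - Begin));
        Begin = Pos + Delim.size();
    }
    Parts.push_back(S.substr(Begin));
    return Parts;
}

// Splits S at every delimiter found while scanning from right to left.
std::vector<std::string> SplitStrBackwards(const std::string& S, const std::string& Delim)
{
    assert(!Delim.empty());
    std::vector<std::string> Parts;
    std::size_t End = S.size(); // one past the last character not yet consumed
    while (End >= Delim.size())
    {
        // Rightmost delimiter that ends at or before End.
        const std::size_t Pos = S.rfind(Delim, End - Delim.size());
        if (Pos == std::string::npos)
            break;
        Parts.insert(Parts.begin(), S.substr(Pos + Delim.size(), End - Pos - Delim.size()));
        End = Pos;
    }
    Parts.insert(Parts.begin(), S.substr(0, End));
    return Parts;
}
```

For inputs where delimiters cannot overlap, such as "a.b.c" split at ".", both functions return the same parts; they only diverge in cases like "OK...." split at "...".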
  